Thursday, August 18, 2011

Semantic Web - when machines talk

A difficult topic, I agree, but I assure you it is really interesting if you want a nice afternoon sleep ;). Just kidding: the topic is deep and interesting, and more than I can cover at length in one post. I will try to keep pace with everyone's understanding and detail the points as far as I am aware of them.
By way of preface, I came across this topic when one of my colleagues asked me about it and wanted to discuss it. I had to read a lot, as it was my first encounter with this area, and I will share the links that I found useful and understandable once I am done explaining my understanding here.
So, let's dive in...

What is this 'Semantic Web'?
In plain, simple terms, the Semantic Web is a 'web of data' that helps machines understand the meaning of information that exists on the web. You may argue that the World Wide Web (WWW) is that as well. Yes, it is. To elaborate: WWW data is complete in itself (confined within the boundaries of the application that generated it) but not linked, and more importantly this data (or rather, these documents) can only be interpreted by humans. The Semantic Web links one application's data with another application's data, so in theory we no longer have clusters of data generated by individual applications but one single web of data, or database, where machines are able to relate an A to a B and deduce whether A is part of B or not.
Ideally, this will be a movement from documents to data!

Confusing? I am sure it is... read on!

When we search for something on the internet, the systems just present us with search results without understanding their relevance or connection to a given situation. So you may search for 'available flights to Mumbai' and get a result about how a bird can fly, or about how an ostrich can't. That's the WWW as we know it today.
The Semantic Web (or more precisely, Semantic Web agents), on the other hand, will provide data based on relevance. So 'available flights to Mumbai' will return data like the cheapest flights that suit your time zone.
Imagine you have a meeting with one of your clients a couple of days from now. All you need to do is tell your Semantic Web agent to book a cab on such-and-such date for a 3 pm meeting. The agent will book the cab for you, making sure the cab agency is the one closest to your pick-up place. It will also make sure the cab arrives half an hour ahead of time, check your calendar for conflicts and let you know if there are any, update your financial accounts with the charges, book a return cab for you, and so on. All this happens without your intervention. To do it, however, the agent has to find and combine information from multiple sources, interpret it and take decisions.
This is what we expect the Semantic Web to bring us in future, once it is realized.

The Semantic Web is not separate from the WWW; it is part of it, or more precisely an extension of it. It is no longer just about links between documents on the web; it is about links between the data itself, so that machines are able to analyse the data themselves without any human intervention.
Understand that the Semantic Web is a vision to dissolve application boundaries and let there be one web containing all data.

Web 2.0 is not Semantic Web
So if Web 2.0 is not the Semantic Web, then what is it, and how is it different? Well, for one, Web 2.0 is not the Semantic Web because Web 2.0 is actually about people. It focuses on, or rather drives, the interaction between people on the web. It is more about people's ability to collaborate on data and share it on the web, using technologies like AJAX, XHTML, SOAP, etc. All that people care about is the final result, which is the social communication/interaction.

Relation to Web 3.0
'Semantic Web' is also used as a synonym for Web 3.0. But that remains a topic of debate, as some believe that Web 3.0 will be the next revolution and will be more graphical in nature, with computers generating more of the informational data.
As it is, the Semantic Web goes beyond humans. It's about machines interacting and analytically deducing data on their own. It is focused on machines: on the point when machines are able to find, collect, analyze and act on data without any human dependency. So the extension of Web documents to data and metadata will enable the whole Web to be processed by humans and, more interestingly, by machines independently of humans.
To accomplish the task of converting web documents to data, Web 3.0 introduces technologies like RDF. More precisely, RDF (Resource Description Framework, described later in this post), which works on web pages, applications and also databases, is used to turn basic Web data into structured data that applications/software can use further.

Semantic Web Technologies
Implementing the Semantic Web requires adding metadata (data that describes data) to information resources. When enough metadata is available, systems will be able to process the data effectively based on this information.
As an example, the metadata tags available in HTML files today help search engines work out rankings and collect information about the type of data a document may contain.
The first step required is to get the data into uniform formats, so machines don't end up with a case where in one place the first name is marked by an <fname> tag and in another by a <firstname> tag. The Semantic Web helps here by putting the data in the same format and making it understandable not only to humans but to machines as well.
This is where technologies like RDF, RDFS and OWL come into the picture, as they classify data from multiple domains based on its properties and its relation to other data.
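To make the idea of reading metadata tags concrete, here is a minimal sketch using Python's standard html.parser module. The sample page and its tag values are made up for illustration, and real search engines of course do far more than this.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags are of interest here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            self.metadata[name.lower()] = content


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html><head>
<title>Semantic Web - when machines talk</title>
<meta name="description" content="An introduction to the Semantic Web">
<meta name="keywords" content="semantic web, RDF, OWL">
</head><body>...</body></html>
"""

collector = MetaTagCollector()
collector.feed(SAMPLE_PAGE)
print(collector.metadata)
```

The program never 'understands' the page body; it only reads the small amount of data about the data that the author chose to publish.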
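The <fname> versus <firstname> problem can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the application names, the mapping table and the example.org vocabulary URI are all hypothetical.

```python
# Hypothetical shared vocabulary term; example.org is a placeholder domain.
FIRST_NAME = "http://example.org/vocab#firstName"

# Each application declares how its own field names map to the shared term.
FIELD_MAPPINGS = {
    "app_a": {"fname": FIRST_NAME},
    "app_b": {"firstname": FIRST_NAME},
}


def normalize(source, record):
    """Rewrite a record's keys to shared vocabulary terms where a mapping exists."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping.get(key, key): value for key, value in record.items()}


record_a = normalize("app_a", {"fname": "Asha"})
record_b = normalize("app_b", {"firstname": "Asha"})

# Both records now use the same key, so a machine can compare them directly.
print(record_a == record_b)
```

Once both applications agree on one identifier for 'first name', the records become comparable without a human reading either schema.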

Resource Description Framework (RDF)
RDF is a standard, usually written in XML, for describing resources that exist on the web (including intranets). It builds on XML and URI (Uniform Resource Identifier) technologies: a single URI identifies each resource, and further URIs identify the properties and values used in statements about it.
RDF statements are often referred to as 'triples'. A triple consists of:
- a subject (basically a resource)
- a predicate (a property)
- an object (property value)
For example:
The blogger site       is      
(subject)           (predicate)            (object)
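A triple maps naturally onto a three-field record. The sketch below stores a few triples in a plain Python list and looks them up by pattern. All URIs and values are hypothetical placeholders, and a real RDF store would be far more capable than a list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; every URI and value here is made up.
triples = [
    Triple("http://example.org/blog", "http://example.org/vocab#title",
           "Semantic Web - when machines talk"),
    Triple("http://example.org/blog", "http://example.org/vocab#author",
           "http://example.org/people/author"),
    Triple("http://example.org/people/author", "http://example.org/vocab#name",
           "A. Blogger"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that match the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is stated about the blog resource.
for triple in match(triples, subject="http://example.org/blog"):
    print(triple.predicate, "->", triple.obj)
```

Note how the object of one triple (the author URI) is the subject of another; that is what lets triples chain into a web of data.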

RDF is normally expressed in an XML format and sometimes graphically as well.

And then we could associate other triples with this triple as well. Once this definition is done, we could code this in RDF/XML as well.
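As a rough illustration of what coding a statement in RDF/XML can look like, here is a sketch that builds a small document with Python's standard xml.etree.ElementTree. The rdf:RDF, rdf:Description and rdf:about names come from the RDF/XML syntax; the 'ex' vocabulary, the subject URI and the title property are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/vocab#"  # hypothetical vocabulary

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)


def to_rdf_xml(subject, properties):
    """Serialize one subject and its literal-valued properties as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject}
    )
    for name, value in properties.items():
        # Each child element is a predicate; its text is the object.
        ET.SubElement(description, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = to_rdf_xml(
    "http://example.org/blog",
    {"title": "Semantic Web - when machines talk"},
)
print(rdf_xml)
```

The subject sits in the rdf:about attribute, the predicate is the element name and the object is the element text, so one rdf:Description element can carry several triples about the same resource.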
Creating a triple like this sets up a model in which the data is actually stored, so RDF describes the model and the syntax for defining a resource. What RDF doesn't do is specify the semantics (meaning) of the resource. For that we need a schema and a language. Welcome, RDFS and OWL!

Resource Description Framework Schema (RDFS) 
RDFS creates the much-needed vocabulary that describes groups of related RDF resources and the relations between these resources. A vocabulary defines the allowable properties that can be linked to the resources in a particular domain. RDFS lets us create classes of resources that share common properties.
Just as in the triple example above, RDFS triples consist of classes, their properties and the values that define the classes and their relation to resources.
So in an RDFS vocabulary, resources are defined as instances of classes. A class is also a resource, and any class can be a subclass of any other class. This hierarchical meaning (or semantic) information allows machines to infer the meaning of a resource based on its properties and property values.
RDFS is a simple vocabulary language, and OWL builds on it to be much richer.
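The class hierarchy idea can be sketched as follows. The rdf:type and rdfs:subClassOf names are the ones RDF and RDFS define; the 'ex:' classes and the instance are hypothetical, and the short prefixed strings stand in for full URIs to keep the example readable.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical statements, written as (subject, predicate, object).
statements = {
    ("ex:Blog", SUBCLASS_OF, "ex:Website"),
    ("ex:Website", SUBCLASS_OF, "ex:Document"),
    ("ex:myBlog", RDF_TYPE, "ex:Blog"),
}


def superclasses(cls, statements):
    """Collect every class reachable from cls by following subClassOf links."""
    found = set()
    pending = [cls]
    while pending:
        current = pending.pop()
        for s, p, o in statements:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                pending.append(o)
    return found


def types_of(resource, statements):
    """Return the stated classes of a resource plus all inferred superclasses."""
    direct = {o for s, p, o in statements if s == resource and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, statements)
    return inferred


print(sorted(types_of("ex:myBlog", statements)))
# ['ex:Blog', 'ex:Document', 'ex:Website']
```

Only one type was stated for ex:myBlog; the other two follow from the hierarchy, which is the kind of inference the paragraph above describes.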

Web Ontology Language (OWL)
Building upon RDF and RDFS, OWL describes the types of relationships that can be expressed in RDF, using an XML vocabulary to indicate the structure of and relations between different resources. This is what defines an ontology.
Semantic Web ontologies contain sets of inference rules from which machines are able to derive meaning and logical conclusions.
An ontology also includes a taxonomy, which is nothing but a system of classification. This classification expresses the hierarchy and the relations that exist between different resources, so we can use OWL to assign properties to classes of resources and allow the sub-classes of these classes to inherit these properties as well.
So, considering that an 'Apple' is a 'Fruit' and a 'Fruit' is 'Eatable', a Semantic Web agent could infer that an 'Apple' is 'Eatable'.
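The Apple inference can be imitated with a single rule applied until nothing new turns up. This is a toy sketch in plain Python, not an OWL reasoner; the function name and the pair-based representation are my own choices for the example.

```python
def infer_is_a(facts):
    """Apply 'if A is B and B is C, then A is C' until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                if b == c and (a, d) not in known:
                    known.add((a, d))
                    changed = True
    return known


# Stated facts, each read as "<first> is a <second>".
facts = {("Apple", "Fruit"), ("Fruit", "Eatable")}
inferred = infer_is_a(facts)

print(("Apple", "Eatable") in inferred)  # True
```

Nobody stated that an Apple is Eatable; the program derived it from the two facts it was given, which is the whole point of attaching inference rules to data.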
OWL itself has 3 sub-languages of increasing complexity, each one including the one before it, in the order below:
- OWL Lite
- OWL DL
- OWL Full

Will this be a reality?
Critics will always be there to question the basic feasibility of the idea. And in a way they are right to question it: people lie and may provide wrong metadata, and metadata is basically the backbone on which the idea works (or will work).
W3C will obviously argue with this! However, just to complete the argument, a problem remains with usage, as the Semantic Web idea (though initially conceived by Sir Tim Berners-Lee) is being developed by people who know logical computation and AI.
Looking at the technological work under way, we may soon see a Semantic Web agent at work. Fingers crossed!

More stuff to read on
Here is some more stuff to read. I personally referred to some of it in the build-up to writing this post.

Phew, yes, that was one confusing topic! I agree with that totally. If you have any questions, feel free to put them in the comments; I may be able to answer them to some extent and provide you with a link or two here.


1 comment:

  1. Nice Write up, but how can Web 3.0 be used to promote a blog?

    Any suggestions?

