This article is aimed more at making you aware of the technologies behind the Semantic Web than at providing a tutorial for building a Semantic Web application.
By Ankit Mathur
There are two aspects to the possible future of the Web: it is going to be more social and people-centric, and it is going to be more semantic in nature. The Web technologies that we have available today focus more on how pages and their content are displayed than on what that content means. The Semantic Web is a set of technologies that will make this content meaningful to machines, enabling them to process it and help us share, combine and analyse it effectively.
To make it clear, let me give you the example of Wikipedia. It has a wealth of information about rivers, but can I query it to find the longest river in India? Not directly: I would have to search for that information manually, because there is no straightforward way to get this data programmatically from Wikipedia's article text in its present form. Enter DBpedia, a project that extracts structured data from Wikipedia and publishes it as RDF, and you will be surprised to find that you can do much more than that. During the course of this article, we'll look at how this happens.
Why the Semantic Web?
But why go through all the hassle, you might ask? Well, natural language processing (NLP) is one field that could benefit from this. I'm not much of a fan of the Apple iPhone ecosystem, but we must agree that Siri is an impressive feature. Machines that can easily interact with humans and answer arbitrary queries will be the next leap in the evolution of computing.
Another area where semantic information comes in handy is search engines, which can use it to work out the relationships between Web pages and understand their context, and so return results that are more relevant to the user's query. Google already surfaces such semantic information, as you can see in Figures 1 and 2.
Figure 1
Figure 2
The Semantic Web stack is more of a theoretical topic, and you'll find fat books on it. Actually implementing it, however, is more challenging than it sounds, and putting it into practice can be overwhelming. We will look at the challenges it faces during the course of this article.
The various technologies that come under the umbrella of the “Semantic Web” have been under development, independently, for quite some time. Only after their recent standardisation have people realised that they are all part of a larger picture. The major technologies include:
- RDF (Resource Description Framework)
- RDFS (RDF Schema)
- OWL (Web Ontology Language)
- SPARQL (SPARQL Protocol and RDF Query Language)
These technologies collectively allow us to add semantic information to the vast number of existing Web pages. The recent aim has been to build upon the foundation of the Web as we know it rather than to replace it altogether, because the sheer volume of existing Web content would be difficult to replace.
Resource Description Framework
RDF is a specification for modelling and representing data. Each piece of information is represented as a subject-predicate-object statement, known as a triple. There are a variety of formats in which these triples can be serialised. Some of the popular ones are:
- RDF/XML
- RDFa
- N3
- Turtle
Again, these formats are only a subset of the many notations available, some of which are still under development. An example of how we would represent semantic data in RDF/XML, which is the most common format, is as follows:
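A minimal sketch of such a description, written in Python so that it can be run: it builds an RDF/XML document with the standard library's xml.etree.ElementTree. The article URL and title are hypothetical, and the use of the Dublin Core vocabulary (dc:title, dc:creator) is an assumption made for illustration, not necessarily what the original listing used.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the output uses rdf: and dc: rather than ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe_article(url, title, creator):
    """Return RDF/XML text describing one article with two triples."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about names the subject of every property nested inside.
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": url}
    )
    # Each child element is a predicate; its text is the object (a literal).
    ET.SubElement(description, f"{{{DC_NS}}}title").text = title
    ET.SubElement(description, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


# Hypothetical URL and title, used for illustration only.
xml_text = describe_article(
    "http://example.org/articles/semantic-web",
    "The Semantic Web",
    "Ankit Mathur",
)
print(xml_text)
```

The two property elements nested inside one rdf:Description correspond to two triples that share the same subject, the article's URL.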
You can see that it's easy to represent information about an article in many notations, though RDF/XML is a little more verbose than the others. But why did RDF come into the picture, when we could have done the same with XML and schemas?
Well, for one, XML is too flexible and schemas are too restrictive. When using XML schemas, we have to define the order and the possible values contained within the elements. Also, RDF can be used to annotate documents and is easily extensible, which is much harder to achieve with plain XML and schemas.
Resource Description Framework Schema
RDFS is used to describe the properties and classes that appear in RDF data, so it is an extension that builds upon RDF itself. Here, I will quote the ever-popular example of cats and animals, where we can describe a Tiger, which is a type of animal, and so on. This can be expressed in RDFS as follows:
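The idea can be sketched in Python with plain tuples standing in for triples. The ex: names below are hypothetical, and the two helper functions implement only a small part of what RDFS defines: following rdfs:subClassOf transitively and propagating rdf:type up the class hierarchy.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
RDFS_CLASS = "rdfs:Class"

# Schema and data as (subject, predicate, object) triples.
# The ex: names are hypothetical and exist only for this illustration.
triples = {
    ("ex:Animal", RDF_TYPE, RDFS_CLASS),
    ("ex:Cat", RDF_TYPE, RDFS_CLASS),
    ("ex:Cat", SUBCLASS_OF, "ex:Animal"),
    ("ex:Tiger", RDF_TYPE, RDFS_CLASS),
    ("ex:Tiger", SUBCLASS_OF, "ex:Cat"),
    ("ex:sherkhan", RDF_TYPE, "ex:Tiger"),
}


def superclasses(cls, graph):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, pending = set(), [cls]
    while pending:
        current = pending.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                pending.append(o)
    return found


def inferred_types(resource, graph):
    """Return the declared types of a resource plus all their superclasses."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, graph)
    return result


print(sorted(inferred_types("ex:sherkhan", triples)))
# prints ['ex:Animal', 'ex:Cat', 'ex:Tiger']
```

Only the statement that ex:sherkhan is a Tiger is stored; the fact that it is also a Cat and an Animal follows from the schema.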
Ontologies: Web Ontology Language
Ontologies have often been talked about in computer science books. OWL, or Web Ontology Language, deals with the representation of these ontologies. But what exactly are ontologies? An ontology describes a domain of information in terms of its classes, their properties and the relationships between them. Ontologies are used to give a meaning to each of the resources available on the Web, from which machine-processable information can be derived. OWL is an enhanced form of what RDFS provides, with more features and more complexity. The OWL family consists of more than one sub-language and syntax. The sub-languages defined in the original OWL specification are:
- OWL Lite
- OWL DL
- OWL Full
To get a detailed overview of what OWL entails, you can pick up a book on ontologies and go through it first.
SPARQL
SPARQL is a query language used to retrieve information that is stored as RDF. Its syntax resembles SQL, but instead of working on tables, a query describes graph patterns that are matched against triples. There are four query forms, depending on the usage:
- SELECT query
- DESCRIBE query
- ASK query
- CONSTRUCT query
In most cases, we have a SPARQL endpoint where SPARQL queries are processed, and results are returned in formats such as XML, JSON, RDF or even HTML. In many applications, most queries can be constructed automatically. An oft-quoted example of a SPARQL query, which lists all landlocked countries with a population greater than 15 million, is as follows:
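A sketch of this query, embedded in a short Python script that encodes it for an endpoint. The class and property names follow the commonly quoted DBpedia version of the example and should be treated as assumptions, since DBpedia's vocabulary has changed over time; the helper build_request_url is hypothetical and only constructs the URL, it does not send anything.

```python
from urllib.parse import urlencode

# The query itself is SPARQL. The type: and prop: terms are assumptions
# taken from the commonly quoted DBpedia example and may need adjusting.
QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX type: <http://dbpedia.org/class/yago/>
PREFIX prop: <http://dbpedia.org/property/>
SELECT ?country_name ?population
WHERE {
    ?country a type:LandlockedCountries ;
             rdfs:label ?country_name ;
             prop:populationEstimate ?population .
    FILTER (?population > 15000000)
}
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# DBpedia's public endpoint; no network request is made here.
url = build_request_url("https://dbpedia.org/sparql", QUERY)
print(url)
```

Fetching this URL with an Accept header such as application/sparql-results+json should return the matching rows as JSON, provided the endpoint is reachable and the vocabulary terms still exist.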
Common Vocabularies
There are a number of ontologies or vocabularies available online that describe common relationships among things, so that we do not need to describe them again and again. These are like a set of APIs that you can use. Some of the common vocabularies are FOAF (Friend of a Friend), SIOC (Semantically Interlinked Online Communities) and SKOS (Simple Knowledge Organisation System). As an example, let us look at FOAF, which allows you to describe people, the relationships between them and their surroundings.
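To make this concrete, here is a small Python sketch that describes two people with FOAF terms and prints the result as N-Triples, one statement per line. The foaf:Person, foaf:name and foaf:knows terms come from the FOAF vocabulary; the two people and their example.org URLs are hypothetical.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical people, used for illustration only.
alice = "http://example.org/people/alice#me"
bob = "http://example.org/people/bob#me"

triples = [
    (iri(alice), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(alice), iri(FOAF + "name"), literal("Alice")),
    (iri(bob), iri(RDF_TYPE), iri(FOAF + "Person")),
    (iri(bob), iri(FOAF + "name"), literal("Bob")),
    (iri(alice), iri(FOAF + "knows"), iri(bob)),
]


def to_ntriples(rows):
    """Serialise triples in N-Triples form: 'subject predicate object .'"""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


ntriples = to_ntriples(triples)
print(ntriples)
```

Because the terms are shared, any FOAF-aware tool can interpret the last statement as "Alice knows Bob" without a custom schema.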
These are just some of the technologies that you will encounter while thinking about the Semantic Web. While developing real-world applications, you might come across RDF triplestores, which allow you to store semantic data and query it like a database. There are also frameworks such as Apache Jena for Java, which allow you to develop Semantic Web applications on the server side. These usually provide an abstraction API over the technologies mentioned above, such as RDF, OWL and SPARQL.
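The core idea behind a triplestore can be sketched in a few lines of Python. The TripleStore class below is a hypothetical toy, not the API of Jena or any real store: it keeps triples in a set and answers pattern queries in which None acts as a wildcard. Real triplestores add indexing, persistence and a SPARQL engine on top of this idea. The ex: data is made up for illustration.

```python
class TripleStore:
    """Toy in-memory store for (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ex:Ganges", "rdf:type", "ex:River")
store.add("ex:Ganges", "ex:flowsThrough", "ex:India")
store.add("ex:Godavari", "rdf:type", "ex:River")
store.add("ex:Godavari", "ex:flowsThrough", "ex:India")
store.add("ex:Thames", "rdf:type", "ex:River")
store.add("ex:Thames", "ex:flowsThrough", "ex:England")

# "Which subjects flow through India?" expressed as a triple pattern.
rivers_in_india = [s for s, _, _ in store.match(p="ex:flowsThrough", o="ex:India")]
print(rivers_in_india)
# prints ['ex:Ganges', 'ex:Godavari']
```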
Source: OpenSourceForU