By Mark I Borkum and Jeremy G Frey
Semantic Web
The Semantic
Web is a collaborative movement that argues for the inclusion of
machine-processable data in Web documents. The goal of the Semantic Web
movement is to convert the information content of unstructured and
semi-structured Web documents into a “Web of data” for consumption by both
humans and machines. The activities of the Semantic Web movement are
coordinated by the World Wide Web Consortium (W3C), and include the
specification of new technologies and the exposition of best practice.
The
architecture of the Semantic Web, commonly referred to as the “layer cake”, is a stack of technologies, where successive levels build on the
capabilities and functionality of prior levels.
At the base of
the stack is the Uniform Resource Identifier (URI)—a string of characters that
is used to identify a Web resource. Such identification enables interaction
with representations of the Web resource over a network (typically the World
Wide Web) using specific protocols.
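As a minimal sketch of how a URI carries the information needed for interaction, the following splits a URI into its components using Python's standard library. The SKOS class URI is used purely as a convenient example; the source does not prescribe any particular tooling.

```python
from urllib.parse import urlsplit

# Example URI: the identifier of the SKOS "Concept" class.
uri = "http://www.w3.org/2004/02/skos/core#Concept"

parts = urlsplit(uri)

# The scheme names the protocol used to interact with the resource,
# the network location names the authority that serves it, and the
# path and fragment locate the resource within that authority.
print(parts.scheme)    # protocol, here "http"
print(parts.netloc)    # authority, here "www.w3.org"
print(parts.path)      # path component of the identifier
print(parts.fragment)  # fragment identifier within the document
```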
At the next
level of the stack is the Resource Description Framework (RDF), a family of specifications, which collectively
define a methodology for the modelling and representation of information
resources as structured data.
In RDF, the
fundamental unit of information is the subject-predicate-object tuple or
“triple”. Each triple encapsulates the assertion of a single proposition or
fact, where the “subject” denotes the source, the “object” denotes the target,
and the “predicate” denotes a verb that relates the source to the target.
In RDF, the
fundamental unit of communication (for the exchange of information) is the
unordered set of triples or “graph”. According to the RDF semantics, any
two graphs may be combined to yield a third graph. Using a combination of URIs
and RDF, it is possible to give identity and structure to data.
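The triple and graph model described above can be sketched in plain Python, under simplifying assumptions: triples are tuples, a graph is a set of tuples, and URIs and literal values are both ordinary strings. The `http://example.org/` URIs and the `objects` helper are hypothetical, and the sketch ignores blank nodes, which a full RDF merge would need to keep distinct.

```python
# Namespace of DCMI Metadata Terms; the example.org URIs are hypothetical.
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"

# Each triple is a (subject, predicate, object) tuple.
graph_a = {
    (EX + "paper1", DCT + "title", "An example publication"),
}
graph_b = {
    (EX + "paper1", DCT + "title", "An example publication"),
    (EX + "paper1", DCT + "creator", EX + "people/alice"),
}

# A graph is an unordered set of triples, so combining two graphs
# is a set union; the shared triple appears only once in the result.
merged = graph_a | graph_b


def objects(graph, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(len(merged))
print(objects(merged, EX + "paper1", DCT + "creator"))
```

Representing a graph as a set makes the "any two graphs yield a third" property fall out of set union directly, which is why no merge logic is needed here.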
However, using
these technologies alone, it is not possible to give semantics to data.
Accordingly, the Semantic Web stack includes two further technologies: RDF
Schema (RDFS) and the Web Ontology Language (OWL). RDFS is a self-hosted
extension of RDF that defines a vocabulary for the description of basic
entity-relationship models. RDFS provides metadata terms to create
hierarchies of entity types (referred to as “classes”) and to restrict the
domain and range of predicates. However, it does not incorporate any aspects of
set theory, and hence, cannot be used to describe certain types of models. OWL
is an extension of RDFS, based on the formalisation of description logics,
that provides additional metadata terms for the description of arbitrarily
complex entity-relationship models, referred to as “ontologies”.
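To illustrate what class hierarchies and domain restrictions add, the sketch below applies two RDFS inference rules to a small graph: the use of a predicate implies that the subject belongs to the predicate's domain, and an instance of a class is also an instance of its superclasses. The RDF and RDFS namespace URIs are the standard ones; the `http://example.org/` classes, the `hasSolvent` predicate and the `rdfs_closure` function are hypothetical, and this is not a complete RDFS reasoner.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SUBCLASS_OF = RDFS + "subClassOf"
DOMAIN = RDFS + "domain"
EX = "http://example.org/"  # hypothetical vocabulary

graph = {
    (EX + "Spectrum", SUBCLASS_OF, EX + "Dataset"),
    (EX + "Dataset", SUBCLASS_OF, EX + "Document"),
    (EX + "hasSolvent", DOMAIN, EX + "Spectrum"),
    (EX + "nmr1", EX + "hasSolvent", EX + "chloroform"),
}


def rdfs_closure(graph):
    """Apply the domain and subclass rules until no new triples appear."""
    closure = set(graph)
    while True:
        domains = {(s, o) for s, p, o in closure if p == DOMAIN}
        subclasses = {(s, o) for s, p, o in closure if p == SUBCLASS_OF}
        new = set()
        for s, p, o in closure:
            # Domain rule: using predicate p implies the subject has type c.
            new |= {(s, RDF_TYPE, c) for q, c in domains if q == p}
            # Subclass rule: an instance of a class is an instance
            # of each of its direct superclasses.
            if p == RDF_TYPE:
                new |= {(s, RDF_TYPE, sup) for sub, sup in subclasses if sub == o}
        if new <= closure:
            return closure
        closure |= new


closure = rdfs_closure(graph)
print((EX + "nmr1", RDF_TYPE, EX + "Document") in closure)
```

The original graph never states the type of `nmr1`; the types are derived from the schema alone, which is the sense in which RDFS gives semantics to data.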
Dublin Core
The Dublin
Core Metadata Initiative (DCMI) is a standards body that focuses on the
definition of specifications, vocabularies and best practice for the assertion
of metadata on the Web. The DCMI has standardised an abstract model for the
representation of metadata records, which is based on both RDF and RDFS.
The DCMI Metadata Terms is a specification of all metadata terms that are
maintained by the DCMI, which incorporates, and builds upon, fifteen legacy
metadata terms, defined by the Dublin Core Metadata Element Set, including:
“contributor”, “date”, “language”, “title” and “publisher”.
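As an illustration, a metadata record that uses some of these terms might be sketched as a set of triples. The `http://purl.org/dc/terms/` namespace is that of DCMI Metadata Terms; the resource URI, the literal values and the `describe` helper are hypothetical and chosen only for the example.

```python
DCT = "http://purl.org/dc/terms/"
paper = "http://example.org/papers/1"  # hypothetical resource URI

# Illustrative values only; literals are modelled as plain strings.
record = {
    (paper, DCT + "title", "An example publication"),
    (paper, DCT + "contributor", "Alice Example"),
    (paper, DCT + "contributor", "Bob Example"),
    (paper, DCT + "date", "2014"),
    (paper, DCT + "language", "en"),
}


def describe(graph, subject):
    """Group a resource's DCMI metadata by the local name of each term."""
    description = {}
    for s, p, o in graph:
        if s == subject and p.startswith(DCT):
            description.setdefault(p[len(DCT):], []).append(o)
    return {term: sorted(values) for term, values in description.items()}


print(describe(record, paper))
```

Because a term such as “contributor” may be asserted more than once, the helper collects values into lists rather than assuming one value per term.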
In the literature,
when authors use the term “Dublin Core”, they are most likely referring to the
more recent DCMI Metadata Terms specification. Our decision to use DCMI
Metadata Terms is motivated by the fact that, today, it is the de facto
standard for the assertion of metadata on the Web. Accordingly, metadata
that is asserted by our software systems using DCMI Metadata Terms can be
easily integrated with that of other software systems.
OAI-ORE
Resources that are disseminated on the Web do
not exist in isolation. Instead, some resources have meaningful relationships
to other resources. An example of a meaningful relationship is being “part of”
another resource, e.g., a supplementary dataset, figure or table is part of a
scientific publication. Another example is being “associated with” another
resource, e.g., a review is associated with a scientific publication. When
aggregated, these entities and their relationships form a “compound object”
that can be consumed and manipulated as a whole, instead of in separate parts,
by automated software systems.
The goal of the Open Archives Initiative Object
Reuse and Exchange (OAI-ORE) is “to define standards for the description and
exchange of aggregations of Web resources”. The OAI-ORE data model
addresses two issues: the assertion of identity for both aggregations and their
constituents, and the definition of a mechanism for the assertion of metadata
for either the aggregation or its constituents. Our decision to use OAI-ORE is
motivated by the fact that, like DCMI Metadata Terms, OAI-ORE is emerging as a
de facto standard for the implementation of digital repositories.
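The two issues that the data model addresses can be sketched with triples: the aggregation and each constituent receive their own URIs, and metadata can be attached to either. The `ore:aggregates` term and the `ore:Aggregation` class come from the OAI-ORE vocabulary; the `http://example.org/` resources and the `constituents` helper are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical repository

aggregation = EX + "aggregations/paper1"

graph = {
    # Identity: the aggregation and its constituents each have a URI.
    (aggregation, RDF_TYPE, ORE + "Aggregation"),
    (aggregation, ORE + "aggregates", EX + "paper1/article.pdf"),
    (aggregation, ORE + "aggregates", EX + "paper1/dataset.csv"),
    (aggregation, ORE + "aggregates", EX + "paper1/figure1.png"),
    # Metadata: asserted about the aggregation or about a constituent.
    (aggregation, DCT + "title", "Publication with supplementary material"),
    (EX + "paper1/dataset.csv", DCT + "title", "Supplementary dataset"),
}


def constituents(graph, aggregation):
    """List the resources that an aggregation aggregates."""
    return sorted(
        o for s, p, o in graph
        if s == aggregation and p == ORE + "aggregates"
    )


for resource in constituents(graph, aggregation):
    print(resource)
```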
SKOS
The goal of
the Simple Knowledge Organization System (SKOS) project is to enable the
publication of controlled vocabularies on the Semantic Web, including, but not
limited to, thesauri, taxonomies and classification schemes. As its name
suggests, SKOS is an organisation system that relies on informal methods,
including the use of natural language. The SKOS data model is based on RDF,
RDFS and OWL, and defines three main conceptual entities: concept, concept
scheme and collection. A concept is defined as a description of a single “unit
of thought”; a concept scheme is defined as an aggregation of one or more SKOS
concepts; and a collection is defined as a labelled and/or ordered group of
SKOS concepts.
In SKOS, two
types of semantic relationship link concepts: hierarchical and associative. A
hierarchical link between two concepts indicates that the domain is more
general (“broader”) than the codomain (“narrower”). An associative link between
two concepts indicates that the domain and codomain are “related” to each
other, but not by the concept of generality.
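A small sketch of the two kinds of link follows. It uses the SKOS convention that a triple `A skos:broader B` reads as "A has broader concept B", so the subject is the narrower concept. The concept URIs and the `broader_chain` helper are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concepts/"  # hypothetical concept scheme

graph = {
    # Hierarchical links: the subject has the object as a broader concept.
    (EX + "ethanol", SKOS + "broader", EX + "alcohol"),
    (EX + "alcohol", SKOS + "broader", EX + "organic-compound"),
    # Associative link: related, but not by generality.
    (EX + "ethanol", SKOS + "related", EX + "fermentation"),
}


def broader_chain(graph, concept):
    """Collect every concept reachable by following skos:broader links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == SKOS + "broader" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(sorted(broader_chain(graph, EX + "ethanol")))
```

The associative link is deliberately ignored by the traversal: “fermentation” is related to “ethanol” but is neither more nor less general.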
SKOS provides
a basic vocabulary of metadata terms, which may be used in order to associate
lexical labels with resources. Specifically, SKOS allows consumers to
distinguish between the “preferred”, “alternative” and “hidden” lexical labels
for a given resource. This functionality could be useful in the development of
a search engine, where “hidden” lexical labels may be used in order to correct
common spelling errors.
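The search engine scenario might look like the following sketch, in which a query is matched against all three kinds of label but only the preferred label is shown to the user. The `skos:prefLabel`, `skos:altLabel` and `skos:hiddenLabel` terms are part of SKOS; the concept URI, the labels and the `search` function are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
concept = "http://example.org/concepts/paracetamol"  # hypothetical

graph = {
    (concept, SKOS + "prefLabel", "paracetamol"),
    (concept, SKOS + "altLabel", "acetaminophen"),
    # A common misspelling, matched by search but never displayed.
    (concept, SKOS + "hiddenLabel", "paracetemol"),
}

LABEL_PREDICATES = {
    SKOS + "prefLabel",
    SKOS + "altLabel",
    SKOS + "hiddenLabel",
}


def search(graph, query):
    """Return (concept, preferred label) pairs for a case-insensitive match
    against any preferred, alternative or hidden label."""
    wanted = query.lower()
    matches = {
        s for s, p, o in graph
        if p in LABEL_PREDICATES and o.lower() == wanted
    }
    preferred = {s: o for s, p, o in graph if p == SKOS + "prefLabel"}
    return sorted((s, preferred.get(s)) for s in matches)


print(search(graph, "Paracetemol"))
```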
As with both
DCMI Metadata Terms and OAI-ORE, our decision to use SKOS is motivated by the
fact that it is emerging as a de facto standard. Moreover, given its
overall minimalism, and clarity of design, the SKOS data model is highly
extensible, e.g., the semantic relationships that are defined by the SKOS
specification may be specialised in order to accommodate non-standard use
cases, such as linking concepts according to the similarities of their
instances or the epistemic modalities of their definitions.
This article is an excerpt from a technical paper, titled "Usage and applications of Semantic Web techniques and technologies to support chemistry research", published in the Journal of Cheminformatics.
© 2014 Borkum
and Frey; licensee Chemistry Central Ltd.
This is an Open Access article
distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0)