THE PAPER | Semantic Web and Commonly-used Vocabularies

By Mark I Borkum and Jeremy G Frey

Semantic Web

The Semantic Web is a collaborative movement that argues for the inclusion of machine-processable data in Web documents. The goal of the movement is to convert the information content of unstructured and semi-structured Web documents into a “Web of data” for consumption by both humans and machines. Its activities are coordinated by the World Wide Web Consortium (W3C) and include the specification of new technologies and the exposition of best practice.

The architecture of the Semantic Web, commonly referred to as the “layer cake”, is a stack of technologies, where successive levels build on the capabilities and functionality of prior levels.

Semantic Web Layers

At the base of the stack is the Uniform Resource Identifier (URI)—a string of characters that is used to identify a Web resource. Such identification enables interaction with representations of the Web resource over a network (typically the World Wide Web) using specific protocols.

At the next level of the stack is the Resource Description Framework (RDF), a family of specifications that collectively define a methodology for the modelling and representation of information resources as structured data.

In RDF, the fundamental unit of information is the subject-predicate-object tuple or “triple”. Each triple encapsulates the assertion of a single proposition or fact, where the “subject” denotes the source, the “object” denotes the target, and the “predicate” denotes a verb that relates the source to the target.

In RDF, the fundamental unit of communication (for the exchange of information) is the unordered set of triples or “graph”. According to the RDF semantics, any two graphs may be combined to yield a third graph. Using a combination of URIs and RDF, it is possible to give identity and structure to data. 
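As a minimal sketch of these two ideas, a triple can be modelled as a three-field tuple and a graph as a Python set of such tuples, so that combining two graphs is a set union. The http://example.org/ namespace and the predicate names below are hypothetical, chosen only for illustration. The sketch also uses plain strings for every term, whereas RDF distinguishes URIs, literals and blank nodes, and merging graphs that contain blank nodes needs extra care.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object assertion."""
    subject: str
    predicate: str
    object: str


EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two graphs, each an unordered set of triples.
graph_a = {
    Triple(EX + "acetone", EX + "hasFormula", "C3H6O"),
    Triple(EX + "acetone", EX + "isA", EX + "Ketone"),
}
graph_b = {
    Triple(EX + "acetone", EX + "isA", EX + "Ketone"),  # also present in graph_a
    Triple(EX + "Ketone", EX + "isA", EX + "OrganicCompound"),
}

# Combining two graphs yields a third; a repeated triple is stored only once.
merged = graph_a | graph_b

for triple in sorted(merged):
    print(triple.subject, triple.predicate, triple.object)
```

Because a graph is a set, the order in which triples are asserted carries no meaning, and asserting the same fact twice adds nothing.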

However, these technologies alone cannot give semantics to data. Accordingly, the Semantic Web stack includes two further technologies: RDF Schema (RDFS) and the Web Ontology Language (OWL). RDFS is a self-hosted extension of RDF that defines a vocabulary for the description of basic entity-relationship models. RDFS provides metadata terms to create hierarchies of entity types (referred to as “classes”) and to restrict the domain and range of predicates. It does not, however, incorporate any aspects of set theory, and hence cannot be used to describe certain types of models. OWL is an extension of RDFS, based on the formalisation of description logics, that provides additional metadata terms for the description of arbitrarily complex entity-relationship models, referred to as “ontologies”.
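To illustrate what class hierarchies and domain restrictions add, the following sketch applies three RDFS-style inference rules to a small graph until no new triples appear: subclass transitivity, propagation of rdf:type along rdfs:subClassOf, and typing of a subject from the rdfs:domain of its predicate. The RDF and RDFS term URIs are the standard ones; the http://example.org/ classes, predicate and individuals are hypothetical, and the naive loop is an assumption made for readability, not a complete RDFS reasoner.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
EX = "http://example.org/"  # hypothetical namespace


def rdfs_closure(graph):
    """Return the graph plus the triples entailed by three RDFS-style rules."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == RDFS_SUBCLASS_OF:
                for s2, p2, o2 in inferred:
                    # Subclass transitivity: s < o and o < o2 gives s < o2.
                    if p2 == RDFS_SUBCLASS_OF and s2 == o:
                        new.add((s, RDFS_SUBCLASS_OF, o2))
                    # Type propagation: an instance of s is an instance of o.
                    if p2 == RDF_TYPE and o2 == s:
                        new.add((s2, RDF_TYPE, o))
            elif p == RDFS_DOMAIN:
                # Domain: any subject that uses predicate s is an instance of o.
                for s2, p2, _ in inferred:
                    if p2 == s:
                        new.add((s2, RDF_TYPE, o))
        if new <= inferred:  # fixed point reached, nothing left to add
            return inferred
        inferred |= new


graph = {
    (EX + "Ketone", RDFS_SUBCLASS_OF, EX + "OrganicCompound"),
    (EX + "OrganicCompound", RDFS_SUBCLASS_OF, EX + "Compound"),
    (EX + "hasMeltingPoint", RDFS_DOMAIN, EX + "Compound"),
    (EX + "acetone", RDF_TYPE, EX + "Ketone"),
    (EX + "benzene", EX + "hasMeltingPoint", "5.5"),
}

closure = rdfs_closure(graph)
```

In this sketch, `closure` contains facts that were never asserted directly, for example that the hypothetical `acetone` and `benzene` resources are both instances of `Compound`.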

Dublin Core

The Dublin Core Metadata Initiative (DCMI) is a standards body that focuses on the definition of specifications, vocabularies and best practice for the assertion of metadata on the Web. The DCMI has standardised an abstract model for the representation of metadata records, which is based on both RDF and RDFS. The DCMI Metadata Terms is a specification of all metadata terms that are maintained by the DCMI, which incorporates, and builds upon, fifteen legacy metadata terms, defined by the Dublin Core Metadata Element Set, including: “contributor”, “date”, “language”, “title” and “publisher”. 

In the literature, when authors use the term “Dublin Core”, they are most likely referring to the more recent DCMI Metadata Terms specification. Our decision to use DCMI Metadata Terms is motivated by the fact that, today, it is the de facto standard for the assertion of metadata on the Web. Accordingly, metadata that is asserted by our software systems using DCMI Metadata Terms can be easily integrated with that of other software systems.
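The integration argument can be sketched in a few lines. Assuming two systems describe the same resource with DCMI Metadata Terms, their records are sets of triples that share predicates, so they can be combined without any mapping step. The namespace http://purl.org/dc/terms/ is the DCMI Metadata Terms namespace; the resource URI, the `values` helper and the second record are hypothetical, and plain strings stand in for both resources and literals.

```python
DCTERMS = "http://purl.org/dc/terms/"
PAPER = "http://example.org/papers/semantic-web-chemistry"  # hypothetical URI

# Metadata asserted by one system.
our_record = {
    (PAPER, DCTERMS + "title",
     "Usage and applications of Semantic Web techniques and technologies "
     "to support chemistry research"),
    (PAPER, DCTERMS + "creator", "Mark I Borkum"),
    (PAPER, DCTERMS + "creator", "Jeremy G Frey"),
}

# Metadata asserted by another (hypothetical) system about the same resource.
other_record = {
    (PAPER, DCTERMS + "language", "en"),
    (PAPER, DCTERMS + "creator", "Jeremy G Frey"),
}


def values(graph, subject, term):
    """Return the sorted values of one DCMI term for one resource."""
    return sorted(o for s, p, o in graph if s == subject and p == DCTERMS + term)


# Both records use the same vocabulary, so integration is a set union.
combined = our_record | other_record

print(values(combined, PAPER, "creator"))
print(values(combined, PAPER, "language"))
```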

OAI-ORE

Resources that are disseminated on the Web do not exist in isolation. Instead, some resources have meaningful relationships to other resources. An example of a meaningful relationship is being “part of” another resource, e.g., a supplementary dataset, figure or table is part of a scientific publication. Another example is being “associated with” another resource, e.g., a review is associated with a scientific publication. When aggregated, these entities and their relationships form a “compound object” that can be consumed and manipulated as a whole, instead of in separate parts, by automated software systems.

The goal of the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) is “to define standards for the description and exchange of aggregations of Web resources”. The OAI-ORE data model addresses two issues: the assertion of identity for both aggregations and their constituents, and the definition of a mechanism for the assertion of metadata for either the aggregation or its constituents. Our decision to use OAI-ORE is motivated by the fact that, like DCMI Metadata Terms, OAI-ORE is emerging as a de facto standard for the implementation of digital repositories.
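A compound object of this kind might be sketched as follows, with triples again held in a plain Python set. The terms ore:ResourceMap, ore:Aggregation, ore:describes and ore:aggregates come from the OAI-ORE vocabulary; every http://example.org/ URI, the titles and the `constituents` helper are hypothetical. The aggregation has its own identity, distinct from its constituents, and metadata can be attached to either.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/publication/1/"  # hypothetical base URI

AGGREGATION = BASE + "aggregation"
RESOURCE_MAP = BASE + "resource-map"

graph = {
    # The resource map is the document that describes the aggregation.
    (RESOURCE_MAP, RDF_TYPE, ORE + "ResourceMap"),
    (RESOURCE_MAP, ORE + "describes", AGGREGATION),
    # The aggregation and its constituents each have their own URI.
    (AGGREGATION, RDF_TYPE, ORE + "Aggregation"),
    (AGGREGATION, ORE + "aggregates", BASE + "article.pdf"),
    (AGGREGATION, ORE + "aggregates", BASE + "dataset.csv"),
    (AGGREGATION, ORE + "aggregates", BASE + "figure1.png"),
    # Metadata may be asserted about the whole or about a part.
    (AGGREGATION, DCTERMS + "title", "Publication with supplementary material"),
    (BASE + "dataset.csv", DCTERMS + "title", "Supplementary dataset"),
}


def constituents(graph, aggregation):
    """Return the sorted URIs of the resources that an aggregation aggregates."""
    return sorted(
        o for s, p, o in graph
        if s == aggregation and p == ORE + "aggregates"
    )


for uri in constituents(graph, AGGREGATION):
    print(uri)
```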

SKOS

The goal of the Simple Knowledge Organization System (SKOS) project is to enable the publication of controlled vocabularies on the Semantic Web, including, but not limited to, thesauri, taxonomies and classification schemes. As its name suggests, SKOS is an organisation system that relies on informal methods, including the use of natural language. The SKOS data model is based on RDF, RDFS and OWL, and defines three main conceptual entities: concept, concept scheme and collection. A concept is defined as a description of a single “unit of thought”; a concept scheme is defined as an aggregation of one or more SKOS concepts; and a collection is defined as a labelled and/or ordered group of SKOS concepts.

In SKOS, two types of semantic relationship link concepts: hierarchical and associative. A hierarchical link between two concepts indicates that the domain is more general (“broader”) than the codomain (“narrower”). An associative link between two concepts indicates that the domain and codomain are “related” to each other, but not by the concept of generality.
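The two kinds of link can be sketched with the SKOS terms skos:broader and skos:related. In the triples below, `(A, skos:broader, B)` reads "A has broader concept B". The concepts in the http://example.org/ namespace and both helper functions are hypothetical. Note that skos:broader is not itself transitive; the first helper computes the chain of ancestors that SKOS expresses with skos:broaderTransitive.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BROADER = SKOS + "broader"
RELATED = SKOS + "related"
EX = "http://example.org/concepts/"  # hypothetical namespace

graph = {
    # Hierarchical links: the object is more general than the subject.
    (EX + "ethanol", BROADER, EX + "alcohol"),
    (EX + "alcohol", BROADER, EX + "organicCompound"),
    # Associative link: connected, but neither concept is more general.
    (EX + "ethanol", RELATED, EX + "fermentation"),
}


def all_broader(graph, concept):
    """Return every concept reachable from `concept` by following skos:broader."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == BROADER and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def related_to(graph, concept):
    """Return the concepts linked to `concept` by skos:related, in either direction."""
    outgoing = {o for s, p, o in graph if p == RELATED and s == concept}
    incoming = {s for s, p, o in graph if p == RELATED and o == concept}
    return outgoing | incoming


print(sorted(all_broader(graph, EX + "ethanol")))
print(sorted(related_to(graph, EX + "fermentation")))
```

The second helper looks in both directions because skos:related is symmetric, so a link asserted from one concept also holds from the other.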

SKOS provides a basic vocabulary of metadata terms, which may be used in order to associate lexical labels with resources. Specifically, SKOS allows consumers to distinguish between the “preferred”, “alternate” and “hidden” lexical labels for a given resource. This functionality could be useful in the development of a search engine, where “hidden” lexical labels may be used in order to correct common spelling errors.
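The search-engine use case might look like the sketch below. It indexes all three kinds of label (skos:prefLabel, skos:altLabel and skos:hiddenLabel) but only ever displays the preferred one, so a misspelt query still resolves to the right concept. The concept URI, the label values and the helper functions are hypothetical, and exact matching on lower-cased labels is a simplifying assumption.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
PREF_LABEL = SKOS + "prefLabel"
ALT_LABEL = SKOS + "altLabel"
HIDDEN_LABEL = SKOS + "hiddenLabel"
EX = "http://example.org/concepts/"  # hypothetical namespace

graph = {
    (EX + "sulfur", PREF_LABEL, "sulfur"),
    (EX + "sulfur", ALT_LABEL, "sulphur"),    # accepted variant spelling
    (EX + "sulfur", HIDDEN_LABEL, "sulfer"),  # common misspelling, never shown
}


def build_index(graph):
    """Map every lower-cased label, of any kind, to its concept."""
    return {
        o.lower(): s
        for s, p, o in graph
        if p in (PREF_LABEL, ALT_LABEL, HIDDEN_LABEL)
    }


def preferred_label(graph, concept):
    """Return the preferred label of a concept, or None if it has none."""
    for s, p, o in graph:
        if s == concept and p == PREF_LABEL:
            return o
    return None


def search(graph, query):
    """Resolve a query to a concept and return the label to display."""
    concept = build_index(graph).get(query.strip().lower())
    return None if concept is None else preferred_label(graph, concept)


print(search(graph, "Sulfer"))
```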

As with both DCMI Metadata Terms and OAI-ORE, our decision to use SKOS is motivated by the fact that it is emerging as a de facto standard. Moreover, given its overall minimalism and clarity of design, the SKOS data model is highly extensible, e.g., the semantic relationships that are defined by the SKOS specification may be specialised in order to accommodate non-standard use cases, such as linking concepts according to the similarities of their instances or the epistemic modalities of their definitions.

This article is an excerpt from a technical paper, titled "Usage and applications of Semantic Web techniques and technologies to support chemistry research", published in the Journal of Cheminformatics.

© 2014 Borkum and Frey; licensee Chemistry Central Ltd. 
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0)