By Rahul Guhathakurta
True “news” was never meant to be just a piece of information; it has always been shaped by readers’ affinity for particular causes and their consequences. Historically, news distributors such as Thomson Reuters, the Associated Press and the former ITAR-TASS were in the business of producing information that served a country’s objectives and, in doing so, portrayed the ideological make-up of its people. Governments and revolutionaries soon understood the power of news in its various forms, and it gradually became an integral part of domestic and international propaganda systems, directly serving a nation’s primary objectives through various formats of controlled information release.
With the start of the digital revolution through the Internet, the penetration and distribution of both controlled and uncontrolled content among the masses became more prominent. Put plainly, we are at the cusp of a smart content revolution. Earlier, the news syndicates had the upper hand, and we used to pay a hefty price to syndicate or source content. Now we have news search engines armed with crawler bots, which control the distribution of our content in particular regions of the world through embedded geo-targeting codes and search-engine-optimized keywords; these largely determine how many hits an article page receives from a particular city, region or country. With the rise of mobile Internet penetration, raw news is easily available over the web in the form of images and videos; the only cost we incur is in understanding that information, analyzing it, authenticating it and finally structuring it for end readers. The delivery of news and its allied content now happens subconsciously through social media networks and news aggregators under the concept of “virality factors”. Currently, the most important question is the survivability of news syndicates as organizations, which depends entirely on their capacity to deal with this heterogeneous data proficiently. It is, of course, time to reinvent the whole business wheel.
The Rise of Semantic Web:
The next-generation digital news industry is based on the assumption that consumers, who are mainly viewers today, will become participants [1]. This implies a need for interactive devices, content adaptation and management for new distribution channels. A new architecture and new structures for content management are therefore important for the digital news industry. The aim is to improve knowledge management and information retrieval.
Content generation routines, a creative process whose digitalization started a few years ago, involve people, and it is very important to track the impact that the change to digital news will have on them. In addition, the archive system can be used for search and retrieval functions.
The Concept of Obsoledge:
The shelf-life of knowledge was always short, and it is now getting shorter. Most endeavors focus on developing a learning curve by converting experts’ knowledge into a structured web of information. The outcome is innately exhausting and not at all enticing. More accessible technology platforms, such as smartphones on 3G/4G networks, are causing an explosion of information, which makes the shelf-life of knowledge shorter and shorter. In his book Revolutionary Wealth, Alvin Toffler coined the term “obsoledge” to refer to this growth of obsolete knowledge.
Fallouts:
Algorithm Based Content Curation, Just an Auxiliary Support:

OpenCalais is a Thomson Reuters tool, launched in 2008, based on Natural Language Processing (NLP) and machine learning algorithms. It can examine any news article, understand what it is about, and connect it to related media. This is more than a simple keyword search: OpenCalais extracts “named entities” and analyzes sentence structure to determine the topic of the article. It is able to understand facts and events. For example, when fed a short article about a refugee crisis forming near the Hungarian border, an OpenCalais demo tool recognized locations, facilities such as refugee camps and even an occupation such as “border police”. It also understood facts, synthesizing a subject-verb-object phrase to express that a UN refugee monitoring cell had released various press releases on Syrian refugees and their transit path throughout the European continent. OpenCalais has already been put to work at a wide range of news organizations, including The Nation, The New Republic, Slate, GulfNews and Aljazeera. Each site’s implementation is unique; for example, DailyMe uses semantic data to monitor each user’s reading habits and present personalized reading suggestions. According to Thomson Reuters, both The Nation and The New Republic saw immediate benefits from OpenCalais: the tool coincided with significant gains in time-on-site, and it automatically generates pages dedicated to a single topic, which had been a labor-intensive process for editors.
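To make the idea of entity extraction and automatic topic pages concrete, here is a minimal sketch in Python. It does not call OpenCalais and does not reproduce its algorithms; it assumes a tiny hand-written lookup table (the hypothetical `GAZETTEER`) in place of a trained NLP model, and the function names and sample articles are illustrative only.

```python
from collections import defaultdict

# Hypothetical gazetteer: surface phrase -> entity type. A real NLP service
# learns such associations from data; this table only illustrates the output.
GAZETTEER = {
    "hungarian border": "Location",
    "refugee camp": "Facility",
    "border police": "Occupation",
}

def extract_entities(text):
    """Return sorted (phrase, entity_type) pairs whose phrase occurs in text."""
    lowered = text.lower()
    return sorted((p, t) for p, t in GAZETTEER.items() if p in lowered)

def build_topic_pages(articles):
    """Map each entity phrase to the ids of the articles that mention it."""
    pages = defaultdict(list)
    for article_id, text in articles.items():
        for phrase, _ in extract_entities(text):
            pages[phrase].append(article_id)
    return dict(pages)

# Made-up sample articles, used only to exercise the sketch.
articles = {
    "a1": "Border police reported a new refugee camp near the Hungarian border.",
    "a2": "Aid workers expanded the refugee camp over the weekend.",
}
topic_pages = build_topic_pages(articles)
```

Each key of `topic_pages` plays the role of a single-topic page: it lists every article that mentions the entity, which is the grouping editors previously had to assemble by hand.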
Defining the Ontological Infrastructure:
As a result of applying the XML Semantics Reuse methodology, we have obtained a
set of ontologies that reuse the semantics of the underlying standards, as they are
formalised through the corresponding XML Schemas. All the ontologies related to
journalism standards, i.e. NewsCodes NITF and NewsML, are available from the
Semantic Newspaper site. The MPEG-7 Ontology is available from the MPEG-7
Ontology site.
The ontologies that are going to be used as the basis for the info-structure of the
semantic newspaper are:
- NewsCodes Subjects Ontology: an OWL ontology for the subjects’ part of the IPTC NewsCodes. It is a simple taxonomy of subjects but it is implemented with OWL in order to facilitate the integration of the subjects’ taxonomy in the global ontological framework.
- NITF 3.3 Ontology: an OWL ontology that captures the semantics of the XML Schema specification of the NITF standard. It contains some classes and many properties dealing with document structure, i.e. paragraphs, subheadlines, etc., but also some metadata properties about copyright, authorship, issue dates, etc.
- NewsML 1.2 Ontology: the OWL ontology resulting from mapping the NewsML 1.2 XML Schema. Basically, it includes a set of properties useful to define the news structure as a multimedia package, i.e. news envelope, components, items, etc.
- MPEG-7 Ontology: The XSD2OWL mapping has been applied to the MPEG-7 XML Schemas producing an ontology that has 2372 classes and 975 properties, which are targeted towards describing multimedia at all detail levels, from content based descriptors to semantic ones.
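As a rough illustration of why a subjects taxonomy is useful inside an ontological framework, the following Python sketch models a small broader/narrower hierarchy and checks whether an article filed under a narrow subject also answers a query for a broader one. The subject labels are hypothetical and are not actual IPTC NewsCodes identifiers; a real deployment would load the OWL ontology instead of a dictionary.

```python
# Hypothetical subject labels mapped to their broader subject (None = top level).
BROADER = {
    "soccer": "sport",
    "tennis": "sport",
    "sport": None,
    "election": "politics",
    "politics": None,
}

def ancestors(subject):
    """Walk up the taxonomy and list every broader subject, nearest first."""
    chain = []
    parent = BROADER.get(subject)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain

def is_about(article_subject, query_subject):
    """True if the article subject equals the query or is narrower than it."""
    return article_subject == query_subject or query_subject in ancestors(article_subject)
```

With this structure, a search for "sport" would also retrieve items tagged "soccer", while a search for "soccer" would not retrieve items tagged only with the broader "sport".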
In addition to content-based metadata, there is context-based metadata. This kind of metadata is higher level and, in this context, usually relates to journalism metadata. It is generated by the system users (journalists, photographers, cameramen, etc.). For instance, there are issue dates, news subjects, titles, authors, etc.
This kind of
metadata can come directly from semantic sources but, usually, it is going to
come from legacy XML sources based on the standards’ XML Schemas. Therefore, in
order to integrate them, they will pass through the XML2RDF component. This
component, in conjunction with the ontologies previously mapped from the
corresponding XML Schemas, generates the RDF metadata that can then be integrated
in the common RDF framework.
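The mapping step can be pictured with a small Python sketch that turns a legacy XML record into subject-predicate-object triples. This is not the XML2RDF component itself: the XML element names, the `NS` namespace and the `xml_to_triples` function are assumptions made for illustration, and the sample does not follow the real NITF or NewsML schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified news XML; element names are illustrative
# and do not reproduce the actual NITF or NewsML schemas.
SAMPLE = """
<news id="n1">
  <title>Refugees reach the border</title>
  <author>Jane Doe</author>
  <issued>2015-10-01</issued>
</news>
"""

# Assumed namespace for the generated classes and properties.
NS = "http://example.org/news#"

def xml_to_triples(xml_text):
    """Map the root element to a typed resource and each child to a property."""
    root = ET.fromstring(xml_text)
    subject = NS + root.attrib["id"]
    triples = [(subject, "rdf:type", NS + root.tag)]
    for child in root:
        triples.append((subject, NS + child.tag, (child.text or "").strip()))
    return triples

triples = xml_to_triples(SAMPLE)
```

In this sketch each XML element name simply becomes a property name. In the approach described above, the ontologies derived from the XML Schemas supply those class and property definitions.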
This framework has the persistence support of an RDF store, where metadata and ontologies reside. Once all metadata has been put together, the semantic integration can take place.
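A minimal in-memory stand-in for such a store, written in Python, shows what integration buys: once news metadata and multimedia metadata sit in the same set of triples, one pattern query spans both. The `TripleStore` class and the prefixed names (`news:`, `video:`, `mpeg7:depicts`, `subject:`) are hypothetical; a production system would use a real RDF store and its query language.

```python
class TripleStore:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

store = TripleStore()
# Metadata from different sources, using made-up identifiers.
store.add("news:n1", "news:subject", "subject:sport")
store.add("video:v7", "mpeg7:depicts", "subject:sport")
store.add("news:n2", "news:subject", "subject:politics")

# One query returns both the article and the video about the same subject.
about_sport = [s for s, _, _ in store.match(o="subject:sport")]
```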

In early 2010, Evri.com started as a search engine and eventually migrated into content curation services through its patented semantic technologies. Backed by Microsoft co-founder Paul Allen, the company did not survive for more than two years and wound up its commercial activities by the end of 2012. Evri.com had incorporated the semantic indexing technology from its acquisition of Radar Networks into its product line. Unfortunately, Radar’s existing product, Twine, was shut down in May 2010. End readers in the market are like multiple noses: the question is not how you hold these noses but how long you can hold them.
Understanding The Copyright Clauses with respect to Semantic Contents:
User-generated semantic content can be represented through a tag ontology, as discussed above, which represents tagging data at a semantic level using Semantic Web technologies such as the OpenCalais tool. A Social Semantic Cloud of Tags can improve expressive knowledge representation, and multiple ontologies can help describe copyright metadata through extended properties that enable the republishing, exchange and reuse of tagging data. This provides a way to reduce the risk of copyright infringement when tags are shared across multiple information layers over the Internet and on multiple devices.
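One way to picture copyright metadata attached to tagging data is the Python sketch below. It assumes each tagging (who tagged which resource with which tag) carries two extended properties, here called `allows_republish` and `allows_reuse`. These names, and the rule that both must be set before a tagging is shared, are illustrative assumptions rather than part of any published tag ontology.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tagging:
    """One tagging act: a tagger attaches a tag to a resource."""
    tagger: str
    resource: str
    tag: str
    # Hypothetical copyright properties attached to the tagging itself.
    allows_republish: bool = False
    allows_reuse: bool = False

def shareable(taggings):
    """Keep only taggings whose metadata permits both republishing and reuse."""
    return [t for t in taggings if t.allows_republish and t.allows_reuse]

# Made-up sample data.
taggings = [
    Tagging("alice", "news:n1", "refugees", allows_republish=True, allows_reuse=True),
    Tagging("bob", "news:n1", "border", allows_republish=True),
]
safe = shareable(taggings)
```

Filtering before exchange is what reduces the infringement risk: only taggings that explicitly grant the needed permissions leave the system.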
The copyright domain is a complex one, and conceptualizing it is a very challenging task. The conceptualization process, as shown in the pattern description, is divided into two phases. The first concentrates on the static aspects of the domain, which are divided into two sub-models because of their complexity.
First, there is the creation sub-model. This model is the basis for building the conceptual models of the remaining parts. It defines the different forms a creation can take, which are classified following the three main points of view proposed by many of the ontologies discussed earlier, e.g. the Suggested Upper Merged Ontology:
• Abstract: Work.
• Object: Manifestation, Fixation and Instance.
• Process: Performance and Communication.
Apart from identifying the key concepts in the creation sub-model, it also includes some
relations among them and a set of constraints on how they are interrelated.
More details for this point and the following steps in the conceptualization
process are available from
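The classification in the creation sub-model can be written down as a small lookup structure. The Python sketch below records only the three points of view and the six forms named above; the `View` enum and `forms_of` helper are illustrative names, and the relations and constraints between concepts are not modeled.

```python
from enum import Enum

class View(Enum):
    """The three points of view used to classify the forms of a creation."""
    ABSTRACT = "Abstract"
    OBJECT = "Object"
    PROCESS = "Process"

# Each form a creation can take, mapped to its point of view.
CREATION_FORMS = {
    "Work": View.ABSTRACT,
    "Manifestation": View.OBJECT,
    "Fixation": View.OBJECT,
    "Instance": View.OBJECT,
    "Performance": View.PROCESS,
    "Communication": View.PROCESS,
}

def forms_of(view):
    """List, in alphabetical order, the creation forms under one point of view."""
    return sorted(name for name, v in CREATION_FORMS.items() if v is view)
```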
Second, there is the rights sub-model, which is also part of the static model. It follows the World Intellectual Property Organization (WIPO) recommendations in order to define the rights hierarchy. The most relevant rights in the Digital Rights Management (DRM) context are economic rights, as they relate to the productive and commercial aspects of copyright. All the specific rights in copyright law are modeled as concepts. For the economic aspects of copyright there are the following rights: Reproduction, Distribution, Public Performance, Fixation, Communication and Transformation.
Each right governs a set of actions, i.e. things that the actors participating in the copyright life cycle can perform on the entities in the creation model. Therefore, it is time to move to the dynamic aspects of the domain. The model for the dynamic part is called the Action Model, and it is built on top of the two previous ones. Actions correspond to the primitive actions that can be performed on the concepts defined in the creation sub-model and that are regulated by the rights in the rights sub-model. For the economic rights, these are the actions:
- Reproduction Right: reproduce, commonly speaking copy.
- Distribution Right: distribute. More specifically sell, rent and lend.
- Public Performance Right: perform; it is regulated by copyright when it is a public performance and not a private one.
- Fixation Right: fix, or record.
- Communication Right: communicate when the subject is an object, or retransmit when communicating a performance or previous communication, e.g. a re-broadcast. Other related actions, which depend on the intended audience, are broadcast or make available.
- Transformation Right: derive. Some specializations are adapt and translate.
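The relationship between rights and actions listed above can be sketched as a lookup table plus a permission check. The Python below is a simplified illustration: the function names are hypothetical, and it ignores conditions mentioned in the list, such as the distinction between public and private performance.

```python
# Economic rights and the actions each one governs, taken from the list above.
RIGHT_ACTIONS = {
    "Reproduction": {"reproduce"},
    "Distribution": {"distribute", "sell", "rent", "lend"},
    "Public Performance": {"perform"},
    "Fixation": {"fix", "record"},
    "Communication": {"communicate", "retransmit", "broadcast", "make available"},
    "Transformation": {"derive", "adapt", "translate"},
}

def governing_right(action):
    """Return the right that regulates an action, or None if it is not listed."""
    for right, actions in RIGHT_ACTIONS.items():
        if action in actions:
            return right
    return None

def is_permitted(action, granted_rights):
    """An action is permitted only if the right that governs it was granted."""
    right = governing_right(action)
    return right is not None and right in granted_rights
```

For example, a licensee who holds only the Transformation right may translate a work but may not broadcast it, because broadcasting falls under the Communication right.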
Conclusion:
At IndraStra, we find this area an alluring one in which to explore “Smart News Generation and Distribution”. We expect that applying upcoming advances in Semantic Web technology to news content curation will deliver clear advantages for automating content processes as well as search and retrieval. But as a publisher, we believe in deploying these new-school methods without sacrificing old-school ideologies. In the end, content is both king and queen, and a fierce battle will be fought over who publishes and who curates best. The final outcome will be a bloodbath if there is no synergy between the multiple players, with a primary focus on how readership habits evolve based on various affinity factors.
About The Author:
Rahul Guhathakurta is the Founder and Curator of IndraStra.com and can be reached at his LinkedIn Profile. Thomson Reuters ResearcherID : K-4094-2015
References:
1. Ontological Infrastructure for a Semantic Newspaper - Roberto García, Ferran Perdrix, Rosa Gil, Departament d'Informàtica i Enginyeria Industrial, Universitat de Lleida, Jaume II 69, E-25001 Lleida, Spain - http://image.ntua.gr/swamm2006/resources/paper07.pdf
2. An experience with Semantic Web technologies in the news domain - Luis Sánchez-Fernández, Norberto Fernández-García, Ansgar Bernardi, Lars Zapf, Anselmo Peñas, Manuel Fuentes, Carlos III - http://ceur-ws.org/Vol-155/paper2.pdf
3. Contracting and Copyright issues in Composite Semantic Services - Christian Baumann, SAP AG , SAP Research / Email: ch.baumann@sap.com - http://link.springer.com/chapter/10.1007%2F978-3-540-88564-1_59#page-2
AIDN: 001-10-2015-0348
Image Attributes: Cover Art - Simplistic example of the sort of semantic net used in Semantic Web technology, February 3, 2010, Wikimedia Commons [Link]