Twitter is by far the platform of greatest interest in terms of event detection. Of all the uses of event detection technology, building situational awareness of rapidly developing and chaotic events – especially emergencies – is perhaps of most clear application to counter-terrorism.
By Jamie Bartlett and Carl Miller
(via www.demos.co.uk)
Image Attribute: Twitter Art by FavsCo / Creative Commons
Emerging events are often reported on
Twitter (and often spike shortly thereafter as ‘Twitcidents’) as they occur.
Social media
users (especially Twitter users) can play a number of different roles in
exchanging information that can be used to detect events. They can generate
information about events first-hand. They can request information about events.
They can ‘broker’ information by responding to information requests, checking
information and adding information from other sources. And they can
propagate information that already exists within the social media stream.
Multimedia
content embedded on social media platforms can add useful information – audio,
pictures and video – which can help to characterize events. One crucial area of
development has been to combine different types of social media information
across different platforms. One study used YouTube, Flickr and Facebook,
including pictures, user-provided annotations and automatically generated
information, to detect events and identify their type and scale.
Due to the
user-generated nature of social media there is a pervasive concern with the
quality and credibility of information being exchanged. Given the immediacy and
easy propagation of information on Twitter, plausible misinformation has the
potential to spread very quickly, causing a statistically significant change in
the text stream. Confirming the validity of the positive system response is a
crucial step before any action is to be taken on the basis of that output. A
vital requirement of event detection technology is the ability to verify the
credibility of information announcing or describing an event. Some promising
work has been done to statistically identify first-hand tweets that report a
previously unseen story; however it is unclear how that system would perform
with the relatively small amounts of data available in an emergency scenario.
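As a rough illustration of what ‘a statistically significant change in the text stream’ can mean in practice, the Python sketch below flags minutes in which the number of tweets mentioning a keyword rises far above a trailing baseline. The function name, window size, threshold and sample counts are all hypothetical and are not taken from any of the systems reviewed here. A detector of this kind responds to volume alone, so a fast-spreading false rumour would trigger it just as readily as a real event.

```python
from statistics import mean, stdev


def detect_spikes(counts, window=6, threshold=3.0):
    """Return the indices where a count exceeds the mean of the preceding
    `window` counts by more than `threshold` sample standard deviations."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has sigma == 0; fall back to 1.0
        # so the z-score below is still defined.
        if sigma == 0:
            sigma = 1.0
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Hypothetical per-minute counts of tweets mentioning one keyword.
counts = [12, 15, 11, 14, 13, 12, 14, 95, 110, 13]
print(detect_spikes(counts))
```

With this sample data only the first minute of the burst is flagged: once the spike enters the trailing window it inflates the baseline, so a more robust baseline (a median, for example) would be less affected.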
Generally
speaking, untrue stories tend to be short lived due to some Twitter users
acting as information brokers, who actively check and debunk information that
they have found to be false or unreliable. One study, for instance, found that
false rumours are questioned more on Twitter by other users than true
reportage. Using topically agnostic features from the tweet stream itself has
shown an accuracy of about 85 per cent on the detection of newsworthy events.
One 2010
paper, ‘Twitter under crisis’, asked whether it was possible to determine
‘confirmed truth’ tweets from ‘false rumour’ tweets in the immediate aftermath
of the Chilean earthquake. The research found that Twitter did tend toward
weeding out falsehoods: 95 per cent of ‘confirmed truth’ tweets were
‘affirmed’ by users, while only 0.3 per cent were ‘denied’. By contrast, around
50 per cent of false rumour tweets were ‘denied’ by users. Nevertheless, the
research may have suffered a number of flaws. It is known, for example, that the
mainstream media still drives traffic – and that tweets including URL links
tend to be most re-tweeted, suggesting that many users may have simply been
following mainstream media sources. Moreover, in emergency response, there
tend to be more URL shares (approximately 40 per cent compared to an average
of 25 per cent) and fewer ‘conversation threads’.
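The affirm/deny comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes each reply to a claim has already been labelled ‘affirm’ or ‘deny’ (by hand or by a classifier), and the function names, the 25 per cent review threshold and the sample tallies are hypothetical rather than drawn from the study.

```python
from collections import Counter


def denial_share(reactions):
    """Fraction of labelled reactions to a claim that deny it."""
    tally = Counter(reactions)
    total = sum(tally.values())
    return tally["deny"] / total if total else 0.0


def flag_for_review(reactions, deny_threshold=0.25):
    """Flag a claim when the share of denials reaches the threshold.

    The threshold is an assumed value chosen for illustration only."""
    return denial_share(reactions) >= deny_threshold


# Hypothetical labelled reactions to two claims.
confirmed_story = ["affirm"] * 19 + ["deny"] * 1
suspected_rumour = ["affirm"] * 5 + ["deny"] * 5

for name, reactions in [("confirmed", confirmed_story),
                        ("rumour", suspected_rumour)]:
    print(name, denial_share(reactions), flag_for_review(reactions))
```

A high denial share is at best a prompt for human verification, since, as noted above, user reactions may simply mirror mainstream media coverage.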
One important
factor, especially important for situational awareness, is the ability to
identify the geo-spatial characteristics of an event. Many of the techniques
described above to infer the location data of social media content are also
used in the field of event detection.
However, the
reliability of event detection and situational awareness techniques may be
context- or even event-specific. These techniques appear especially useful in emergency
response where a large number of people have a motive to produce accurate
information. By contrast, one recent (unpublished) thesis analysed the extent
to which useful real time information about English Defence League protests
could be gleaned from Twitter.
In the
build-up to three demonstrations for which data were collected in 2011, most
tweets were negative; and very few were geolocated to the event venue. A very
large number of tweets were re-tweets (49 per cent compared to 24 per cent
during a control period), and on further analysis, a significant proportion of
the re-tweets were negative, inaccurate rumors. Moreover, a very large
proportion of tweets (50 per cent) came from a very small number (5 per cent)
of – usually negative – commentators. The recent crowd-sourced effort on Reddit to
positively identify the suspects in the Boston terror attacks was
also less successful – although it is not clear whether and how information
gained through the exercise was of use or value to the police.
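The two summary measures quoted for the protest data, the share of re-tweets and the share of tweets coming from a small group of very active accounts, are simple to compute once tweets have been collected. The Python sketch below assumes each tweet is a dictionary with hypothetical `author` and `is_retweet` fields; the field names, function names and sample data are illustrative and not taken from the thesis.

```python
import math
from collections import Counter


def retweet_share(tweets):
    """Fraction of tweets that are re-tweets."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if t["is_retweet"]) / len(tweets)


def top_author_share(tweets, top_fraction=0.05):
    """Fraction of tweets posted by the most active `top_fraction` of authors."""
    per_author = Counter(t["author"] for t in tweets)
    if not per_author:
        return 0.0
    # Always count at least one author, even in very small samples.
    n_top = max(1, math.ceil(len(per_author) * top_fraction))
    top_total = sum(count for _, count in per_author.most_common(n_top))
    return top_total / len(tweets)


# Hypothetical sample: one account re-tweets 19 times,
# and 19 other accounts each post one original tweet.
sample = [{"author": "a0", "is_retweet": True} for _ in range(19)]
sample += [{"author": f"a{i}", "is_retweet": False} for i in range(1, 20)]

print(retweet_share(sample), top_author_share(sample))
```

Comparing these figures against a control period, as the thesis did, indicates whether a stream is dominated by a handful of commentators rather than by many independent observers.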
This is only
one of a number of difficulties relating to the validity and reliability of
data sets. There are now, for example, systematic, highly organised operations
to create fake reviews, although other researchers are using natural language
processing to distinguish fake reviews from real ones – including verifying an IP
address to determine frequency. Of course, at very large scale, data can
be widely skewed by automated information bots. Facebook recently revealed that
seven per cent of its overall users are fakes and dupes.
The validity
of large scale data sets partly rests not on every single data point being
accurate, but on the fact that, when aggregated and combined, large
scale data sets can produce valid and robust results – or at least results more
robust than any single, even expert, observation. This is the principle that
‘the wisdom of the crowds’ produces more accurate descriptions than any single
observer when certain conditions are met: diversity of opinion, independence, decentralization, and aggregation. Social media, as a social network, does not
always meet these conditions. One recent study of 140,000 Facebook profiles
looked at the first three months of use and found that new members were closely
monitoring and adapting to how their friends behaved, suggesting that social
learning and social comparison are important influences on behavior. The
2011 London riots were widely discussed – and perhaps partly organised – via
social media networks. It does not appear that Twitter was able to ‘dispel’
misinformation quickly. Indeed, rumors spread rapidly, and although some disagreement was found, it occurred within different, sealed networks.
This article is an excerpt from a technical paper titled "THE STATE OF
THE ART: A LITERATURE REVIEW OF SOCIAL MEDIA INTELLIGENCE CAPABILITIES FOR
COUNTER-TERRORISM", published by Demos under a Creative Commons license.