By ITU/UNESCO Team
Among the several demand-side challenges to expanding the Internet and the web to reach the next four billion people, one major barrier is the representation and use of the world’s languages online. Connecting everyone requires increasing the online presence of many of the world’s languages, especially in regions and countries with high linguistic diversity (such as Africa, India and South-East Asia).
Today, only a fraction of the world’s languages are present on the Internet: an estimated 5% of them (by number of languages). Linguistic diversity is an essential component of humanity’s living heritage, and of social inclusion, empowerment and sustainable development. Language plays a vital role in constructing and expressing individual and collective identity. It is an intangible and invaluable resource that is hard to acquire but, once acquired, easy and rewarding to share.
More than a third of the approximately 7,100 languages spoken today are in danger of disappearing. Ethnologue reports that 1,519 languages are ‘in trouble’ and 915 are classified as ‘dying’, with languages being lost at a rate of six per year. At this rate, many languages will disappear in the near future, while others will lose their influence and relevance at global, national and local levels. UNESCO is revising its Atlas of the World’s Languages in Danger, which already covers nearly 2,500 endangered languages, so that it encompasses all of the world’s known languages. The new version will use ICTs to expand the knowledge base on the world’s languages and will provide a feature for monitoring ‘language vitality’ status.
Recent research by UNESCO found evidence that more than 300 languages are currently represented on the Internet. Measuring linguistic diversity online, whether by counting websites and languages or by assessing the range of information sources, is very difficult. Several initiatives to measure it are under way, but there is no single reliable, standard method, and measurement often has to be done at the regional or even national level because it requires local knowledge.
Indeed, Internet content continues to be dominated by a few major languages, most significantly English. According to W3Techs’ survey of the 10 million most popular websites, 55.2% are in English, while Russian, German, Japanese, Spanish and French are each used by between 4.0% and 5.8% of websites.
Many national languages (such as Hindi and Swahili) are each used by less than 0.1% of these websites, and most of the world’s languages do not appear in W3Techs’ data at all. The vast majority of languages lack an online presence commensurate with their real-world speaker base.
A related factor is the number of languages in which existing online services are available (Figure 1). These services may be ‘multinational’, but it is not clear that many of them are ‘multilingual’ relative to the total ‘language universe’ of between 7,102[26] and 9,000[27] languages in existence. By this measure of multilingualism, Wikipedia has consistently performed well in terms of number of languages over recent years, partly because it relies on user-generated content. However, growth in the number of languages supported by some of the main online services is not keeping pace with growth in Internet usage (Figure 1).
Figure 1: Multinational online services, but are they multilingual? Number of languages in which major online services and websites are available. Source: ITU, from various sources including Ethnologue.
Facebook (2015) measured the supply of local-language content using, as a proxy variable, the number of languages with more than 100,000 Wikipedia articles. Facebook found that only 53% of the world’s population has access to significant Wikipedia knowledge (and, by extension, online content) in their primary language, and that making the Internet relevant to 80% of the world requires content in at least 92 languages (compared with the 52 languages that currently have more than 100,000 articles[29]).
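The two figures above rest on simple arithmetic: the share of people whose primary language passes the 100,000-article threshold, and the number of languages needed to reach a target share of the population. The Python sketch below shows one plausible way to compute both. It assumes a hypothetical input format (language mapped to primary speakers and article count), a hypothetical greedy ranking by speaker base, and toy numbers; the helper names `coverage_share` and `languages_needed` are illustrative, and this is not Facebook’s published method or data.

```python
from typing import Dict, Tuple

# Hypothetical input format: language -> (primary speakers, Wikipedia article count).
LanguageStats = Dict[str, Tuple[int, int]]


def coverage_share(languages: LanguageStats, threshold: int = 100_000) -> float:
    """Share of all speakers whose primary language has more than `threshold` articles."""
    total = sum(speakers for speakers, _ in languages.values())
    covered = sum(
        speakers for speakers, articles in languages.values() if articles > threshold
    )
    return covered / total


def languages_needed(languages: LanguageStats, target: float) -> int:
    """Smallest number of languages whose combined speakers reach `target` share.

    Taking languages in descending order of speakers (a greedy choice assumed
    here for illustration) minimises how many are needed to reach the target.
    """
    total = sum(speakers for speakers, _ in languages.values())
    running = 0
    ranked = sorted((speakers for speakers, _ in languages.values()), reverse=True)
    for count, speakers in enumerate(ranked, start=1):
        running += speakers
        if running / total >= target:
            return count
    return len(languages)  # target not reachable (e.g. target > 1)


# Toy data, hypothetical (speakers in millions); not real Facebook or Wikipedia figures.
toy = {
    "lang_a": (500, 2_000_000),
    "lang_b": (300, 150_000),
    "lang_c": (150, 20_000),
    "lang_d": (50, 5_000),
}

print(coverage_share(toy))         # 0.8
print(languages_needed(toy, 0.9))  # 3
```

With real inputs, the first function corresponds to the “53% of the world’s population” type of figure and the second to the “at least 92 languages for 80%” type of figure, under the assumptions stated above.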
Another major issue relates to Internationalized Domain Names (IDNs). Historically, domain names could only include a limited set of characters: the Latin letters “a” to “z”, the digits “0” to “9” and the hyphen “-”. Many of today’s 3.2 billion Internet users cannot read or understand Latin script, which makes the names and words in domain names either meaningless or difficult to recall. A multilingual domain name environment can help ensure that every end-user has the same right to access content in their own language and to experience the Internet without constraints.
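To make the historical character restriction concrete, the Python sketch below checks whether every label of a domain name uses only letters, digits and hyphens (the “LDH” rule, with labels of 1 to 63 characters that do not start or end with a hyphen), and converts a Unicode domain name into the ASCII-compatible “xn--” form used in the DNS. It relies on Python’s built-in `idna` codec, which implements the older IDNA 2003 rules (the third-party `idna` package implements IDNA 2008). The helper names `is_ldh` and `to_ascii` and the example domain `bücher.example` are illustrative assumptions, not part of the report.

```python
import re

# LDH rule for one DNS label: ASCII a-z, 0-9 and "-", 1 to 63 characters,
# not starting or ending with a hyphen.
LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_ldh(domain: str) -> bool:
    """Return True if every dot-separated label uses only LDH characters."""
    return all(LDH_LABEL.fullmatch(label) for label in domain.lower().split("."))


def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible form (A-labels)
    using Python's built-in IDNA 2003 codec (Punycode with an "xn--" prefix)."""
    return domain.encode("idna").decode("ascii")


print(is_ldh("bücher.example"))    # False: "ü" is outside the LDH set
print(to_ascii("bücher.example"))  # xn--bcher-kva.example
print(is_ldh(to_ascii("bücher.example")))  # True
```

The Unicode name fails the LDH check while its encoded form passes, which illustrates how IDNs let users type and read names in their own script while the underlying DNS still uses only the historical character set.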
The UNESCO/EURid World Report on Internationalized Domain Names[30] finds that:
- IDNs help enhance linguistic diversity in cyberspace;
- The IDN market is more balanced in favour of emerging economies; and
- IDNs are accurate predictors of the language of web content.
This article is an excerpt from a technical report titled “The State of Broadband 2015”, published by ITU and UNESCO. The publication is available in Open Access under the Attribution-ShareAlike 3.0 IGO (CC BY-SA 3.0 IGO) license (http://creativecommons.org/licenses/by-sa/3.0/igo/).