Digital India: Caught between languages

31 Jul 2014

I was kindly invited by Nikhil Pahwa of Medianama to an invite-only event (#NAMAIndic) discussing the future of Indic languages in the digital medium. A lot of thoughts were shared by the audience, ranging from genuinely insightful to a bit off the mark, IMO. Overall though, there was much more of the former than the latter, and it made for some really great post-event reflection.

I think it is time for me to gather my thoughts on the topic, which until now were scattered around my mind. Writing a piece like this gives me an opportunity to put them together in a somewhat cohesive form. So here it is.

What’s happened in the last few years?

The last few years have shown a big increase in online content in Indic languages. Everyone at the event, from news publishers to e-commerce sites to advertisers, agreed that Indic language content in the digital medium has grown quite a bit in that time. This is still a far cry from the consumption of English content, but the trend is unmistakable.

To really see where we are, and where we could be heading, it is necessary to know where we come from. Let’s look at the factors which have led to the increase of Indic language content in the digital medium in the last 3-4 years.

  • CSS Web Fonts: This is one of the biggest improvements in how Indian language content is displayed on the web. I was the first person to show how we could increase the readability and legibility of Indic fonts using CSS Web Fonts in web pages back in 2009, and I went around developer conferences back in the day evangelizing it. Since then, a lot of sites with Indic content have switched to CSS Web Fonts to improve the reading experience. One thing to note is that CSS Web Fonts require nothing of the user; the browser takes care of everything. This is unlike earlier solutions, which required people to install fonts on their computers just to see local language content. Many sites also used ‘.EOT’ fonts, which only displayed properly in IE and literally displayed gibberish in every other browser. CSS Web Fonts made this practice virtually obsolete!
  • Better support for Unicode by browsers and awareness by developers: There are still some stark differences in Unicode support between browsers, but over the years browsers have upped their support for Unicode. That said, Unicode support for scripts like Devanagari has been there for quite some time, so I think a greater influence has been web developers’ better knowledge and appreciation of Unicode over the years. Joel Spolsky’s classic text on Unicode was one of the first great posts on it.
  • Mobile Phones: In India, if you could afford a PC, the assumption was that you could probably understand English as well, at the very least, if not speak it fluently. Mobile phones turned this entire assumption on its head. Now pretty much everyone, regardless of how affluent or educated (in English) they were, had a device to access content. Since at least 80% of India now has a mobile phone, and only around 10-15% of India knows English, people needed a way to communicate in local languages, and the demand just grew stronger.
  • Social Media: I’ll cover why social media has boosted Indic content digitally later on in this post, but suffice it to say that it has played an undeniably important role.
  • Internet speed and growth (and the rise of video): Broadband speeds have increased, but more importantly, mobile data plans have become more affordable. Combined with 3G and now LTE, high speed video consumption is also on the rise. Video is the fastest growing content on the web, and since most Indians want to consume videos in their local language, this has also increased the demand for Indic language content in digital mediums.
  • Elections 2014: I’m going out on a limb here, but I think the 2014 general elections in India were akin to a nitrous boost for Indic language content on digital media. This actually ties in with the previous three points. Political parties used Indic language messages to really hammer home their political messages on social media channels.
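
The Unicode point in the list above can be made concrete with a minimal Python sketch. It assumes the Hindi word for "Hindi" as sample text (my choice for illustration) and uses only the standard library. It shows that each Devanagari code point in this word takes three bytes in UTF-8, and that decoding those bytes with the wrong encoding is one way to end up with gibberish on screen.

```python
import unicodedata

# Sample text (an assumption for illustration): the word "Hindi" in
# Devanagari, written with escapes so the code points are explicit.
word = "\u0939\u093f\u0928\u094d\u0926\u0940"


def describe(text):
    """Return (code point, Unicode name) for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Encode the word as UTF-8 bytes, as a web server would send it.
encoded = word.encode("utf-8")

# Decoding UTF-8 bytes with a legacy single-byte encoding does not fail,
# it silently produces unreadable text (mojibake).
mojibake = encoded.decode("latin-1")

for code_point, name in describe(word):
    print(code_point, name)
print(len(word), "code points,", len(encoded), "bytes in UTF-8")
print("round trip ok:", encoded.decode("utf-8") == word)
```

The practical takeaway is the usual one: declare UTF-8 consistently (in the file, the HTTP headers and the page itself) so that the bytes are decoded the same way they were encoded.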

The difference between English and Indic Language content in a social context

Actually, this could just be about local language content rather than Indic language content, since I assume the same thing might be true in countries other than India too. My point is this: In India, English has always been a language used mostly in business and academic environments, whereas Indic languages are used more in social and informal environments. Of course, I’m not the only one who thinks along these lines, and at the recent #NAMAIndic event, I saw more people present the same views. In a country as linguistically diverse as India, English is also often the common ground for communicating with a large group who may not know (or may refuse to speak) any common language apart from English.

The reason social media has boosted Indic language content in the digital medium is that a lot of people use social media simply to communicate with others at an informal level. The best way to do that is generally in your local Indic language. This is not always the case, but quite often so. A very large majority of jokes forwarded on social media are written in local languages like Hindi, Punjabi, etc. The 2014 general elections also saw political parties spreading messages online in local languages, as it conveyed their message in a more informal way and felt more visceral (keep in mind that messages like these were sent around even on places like Reddit, where pretty much everyone knew that the audience knew English).

What’s going on today?

The middle class continues to grow. The thing to note in particular is the rise of what are commonly known as ‘B’ or ‘C’ towns (or ‘Tier 2 and 3 cities’). You can see this everywhere, really. A few years ago, most Indian television programs used to focus on people from big cities, but present day Indian television seems to focus on smaller towns and even rural areas. The same trend can be seen in advertisements, especially for mobile network operators. While I won’t call it saturation, it is getting harder to acquire consumers in the metros, and the next frontier for the next 100 million customers seems to be these smaller places. The people in these places are also getting hungry for better products and services.

I heard an interesting story from a person at a prominent e-commerce site. When they first introduced the iPad on their store, they waited excitedly for the first orders. The first order came from Jalandhar (a relatively small city, though one with a lot of rich people). The next order was from a small hamlet in Assam, and the next from a little known place in southern India. The team expected orders from the big cities, but here were people ordering from really small places. If you think about it, it really does make sense. Big cities already have plenty of places to buy stuff like iPads, but these small places don’t. However, there is a lot of ‘new money’ in smaller towns, and the demand for things like this has increased. While some of these buyers might be comfortable in English, most of them are not, or have a strong preference for their local language, which makes sense. If you really want to reach them, offering localized content might be a good idea.

Consumption, Creation and CTRs

The digital story is still mainly a consumption driven story, and consumption is increasing! For example, Malayala Manorama is claiming 56% YoY growth. Numbers for Jagran are probably similar if not better, according to the people I talked to. Consumption of non-news, non-entertainment content has especially grown, which is the sign of a much more stable and mature readership. Content creation, however, is a different story. Most Indic content is either created by paid staff at publishing houses, or forwarded on to other people (Facebook posts, WhatsApp jokes, etc.). I think a major factor is the lack of well-known, effective tools for writing content in Indic languages (even though there are a lot of startups in this space in India). For example, you can use Swype to type Kannada easily, but not a lot of people know about it. In social channels you can already see people wanting to write in Indic languages but seemingly not being able to (for example, people writing Hindi sentences using English letters). This is somewhat disheartening. Wikipedia editions in certain local languages have a serious lack of active editors.

One of the good things is that CTRs (click-through rates) for Indic language ads online are generally 2x to 4x those of English language ads, according to the ad execs present at the #NAMAIndic event. This is a really encouraging stat. However, the conversion rate for Indic language ads is quite low. This could be because the landing page the user reaches after clicking an Indic language ad is in English, thus confusing, annoying or intimidating them. Depending on what kind of site the user is led to, it could also be because of the low number of people in India willing to engage in online transactions.
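
The arithmetic behind this is worth spelling out. The sketch below uses made-up numbers: the impressions, CTRs and conversion rates are all hypothetical, chosen only so that the Indic CTR is 3x the English one, which sits inside the 2x to 4x range quoted above. It shows how a higher CTR can still produce fewer orders if the conversion rate after the click is low enough.

```python
def orders(impressions, ctr, conversion_rate):
    """Expected orders = impressions x click-through rate x conversion rate."""
    clicks = impressions * ctr
    return clicks * conversion_rate


def break_even_conversion(base_ctr, base_conversion, other_ctr):
    """Conversion rate the other campaign needs to match the base campaign."""
    return base_conversion * base_ctr / other_ctr


# All figures below are hypothetical, for illustration only.
IMPRESSIONS = 100_000
ENGLISH_CTR, ENGLISH_CONVERSION = 0.01, 0.05
INDIC_CTR, INDIC_CONVERSION = 0.03, 0.01  # 3x the CTR, lower conversion

english_orders = orders(IMPRESSIONS, ENGLISH_CTR, ENGLISH_CONVERSION)
indic_orders = orders(IMPRESSIONS, INDIC_CTR, INDIC_CONVERSION)
needed = break_even_conversion(ENGLISH_CTR, ENGLISH_CONVERSION, INDIC_CTR)

print(f"English ad orders: {english_orders:.0f}")
print(f"Indic ad orders:   {indic_orders:.0f}")
print(f"Indic conversion rate needed to break even: {needed:.2%}")
```

With these assumed figures the Indic ad gets three times the clicks but fewer orders, and it would only need a conversion rate of about a third of the English one to pull level. That is why fixing the landing page experience looks like the cheaper lever.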

Social Media and Local Languages are good friends

As I said above, English has primarily been used in academic and business settings in India (exceptions exist of course, but this is mostly the trend) or as a bridge language, whereas local languages are the best fit for informal, casual conversation.

Since social media like Facebook, WhatsApp, etc. are also about communicating in an informal, casual tone with your friends, it is natural that people have been using Indic languages more and more in those apps. This also spills over to video, which most of India also prefers in Indic languages over English (take a look at what happened to Star TV back in the day). A more current example is YouTube, where English is only the fourth most popular language videos are watched in, after Telugu, Tamil and Hindi.

Udayavani has said that up to 40% of its traffic actually comes via Facebook, which is a huge portion. Other regional language publications also agree that a major portion of their traffic comes via Facebook. This could be because local language content is more easily shared and viewed in a social setting, or because of the way Google indexes local Indic content, or maybe because people simply don’t seem to search in Indic languages but are ready to visit a site in a local language if a friend suggests it.

The future of Indic languages: My wishlist

The event concluded with a few people stating their wishlists for what various stakeholders could do to improve the condition of Indic languages on digital platforms. Some were looking to advertisers to lead the way, some had high hopes of the government, and so on. However, I think we should look at all the segments involved and see how each can contribute to strengthening Indic languages online. Here is my wishlist …

  • Web Developers: The community as a whole should lay a greater emphasis on the significance of localization and i18n. A part of responsive design is (or at least, should be) about making sure your web site is responsive to changes in text length when switching from one language to the next.
  • Designers, Font Foundries … and CDAC: Appealing design is at the heart of a great web experience, and great typography is at the heart of great design. Most web pages with Indic content look pretty average, to be honest. We need designers to take a greater interest in how their pages look and interact. We also need font foundries to design great Indic language font faces for better typography. Print and other offline mediums are different from digital ones. A font might look good or be legible in print, but not online. We need more Indic fonts designed with an eye towards readability and legibility in a digital medium, especially on the web. Also, one often overlooked factor, especially critical to web fonts, is file size. Some Asian fonts, for example, can be up to 1 MB in size, which is extremely bad when used as a web font. Another factor is the license of those fonts, and this is where CDAC and the government in general fare pretty badly. CDAC has made some fonts for Indic languages, but the licenses for those are so restrictive that they are useless for promoting good typography for Indic languages. So in short, we need a good number of beautiful, pleasantly readable Indic fonts with small file sizes and less restrictive licenses. Latin based languages have plenty of them, and we should have them for our languages too. The recently designed font ‘Ek Mukta’ by Dr. Girish Dalvi is a step in the right direction.
  • The W3C and other standards bodies: TDIL and, by extension, W3C India have been doing work on analysing Indic fonts on the web and preparing a report on how to improve them. They have also sought my help, and from time to time I have provided advice to them on it as well. The problem right now is that if some Indic languages are not displaying well on the web, especially with certain CSS properties, it is tricky to determine whether something is wrong in that particular rendering engine or in the CSS specification itself. Unfortunately, as I have often discussed with W3C India, most of these issues seem to be rendering engine specific, so we need to file bugs for each and every rendering engine and push each of them to fix the issue. Not an easy task. However, sometimes it turns out that the problem is on the CSS specification side, and that is where we need to focus on providing better recommendations to the CSS working group.
  • Browser Makers: As said above, a lot of the issues with Indic fonts not displaying properly seem to be specific to rendering engines. Some of them are open source or have an open process or model (Gecko, Blink and WebKit) and some not so much (Trident, from IE). At this level, you can only file bugs, comment occasionally and hope someone can fix it soon. Most people working regularly on rendering engines are already swamped with many other tasks, so unfortunately Indic language support is always a lesser priority. The best approach, at least in the case of open source rendering engines, might be to land a patch yourself, but the expertise required to do so is non-trivial, to say the least.
  • The Indian Government: A lot of people in the Indic computing domain have complained that even though the Indian Government has spent a lot of money, time and manpower making some excellent technologies for Indic computing, it makes it extremely difficult for commercial vendors to use them, even if the vendors are willing to pay. I heard a person give the analogy of a kid going into a shiny toy store and being asked to look but not touch. Some in the industry are of the opinion that the time has come to put regulation or laws in place to ensure that every mobile device sold in India has native support for Indic languages by default (just as in countries like China, Pakistan and Bangladesh, to name a few). From my interactions with the government, I can say that moves towards such a thing have already started, but a technical analysis of its feasibility, given the size of our country and our linguistic diversity, is a complex task when considering things like limited hardware and the technical standards behind things like SMS. Some of the CDACs and IITs are already looking into the problem.
  • Network Operators: SMS is still big, but I honestly think its days as a communication platform between friends and family are numbered. In the future, I expect SMS to be used only by companies sending you alerts (like your bank SMS’ing you about a transaction on your credit card) and the social use of SMS to die down in favour of messaging applications like WhatsApp, Line, Facebook, etc. As we know, these social networks really do encourage sharing of content in local languages. We can already see the trend today. What this means is that network operators have an excellent opportunity to use this trend and offer attractive mobile data plans to users. I know quite a few people who have a data plan just for these messaging services. Once people have a mobile data plan and start to use services like WhatsApp, they slowly start to warm up to other kinds of online content as well (especially if they have a smartphone). There is a fair amount of video content online (like on YouTube) that is in Indic languages. The growth of mobile data in India will be beneficial not just to network operators, but also to Indic language computing. Opera has been pushing for more access to mobile data plans at the grassroots level (for example internet.org and countless other efforts over the years), as it’s a win-win for everyone involved.
  • Advertisers and Market Research Firms: Indic language ads provide a high CTR, and I think it is in advertisers’ interest to look into how best to exploit this fact. On the other hand, advertisers would like more insight into the consumption patterns of people consuming or producing Indic language content digitally. This is where market research firms need to step in. It would be great to get an idea of how different (or indeed similar) people who produce and consume Indic language content are from people in India who primarily deal with English. A comprehensive study of CTRs (so that we don’t just rely on anecdotal evidence), usage habits (like how much they consume and produce on mobiles vs PCs), which kinds of sites are most viewed, why exactly conversions are low despite high CTRs, etc. would be something that everyone interested in this domain, but especially advertisers, would be keenly interested in.

The one word which everyone in the Indic computing domain keeps talking about is ‘Parity’: parity with English language content and services, so that the remaining 85% of India gets an opportunity to share and create in their own language online.