Wednesday, November 23, 2011

Writing Hindi for Internet / Emails

Writing in Hindi for the internet has been very painful, so much so that Hindi speakers and writers have been left far behind in adopting the internet as a medium of communication.

Most of these writers are comfortable typing on the Remington-style keyboard. This style of typing was adopted by the Kruti Dev and Devlys fonts. But these fonts are basically sets of modified glyphs that render properly in print. They provide a key mapping of Hindi glyphs onto the English keyboard.

E.g. when you type 'f' on your English keyboard, the matra 'ि' is rendered pictorially, and when you type 'o' the consonant 'व' is rendered pictorially. If these two keys are pressed one after the other (fo), वि gets displayed. But note that this is all pictorial representation: the underlying text is still the English letters, and it looks like Hindi only because you have the font installed locally.

However, as many Hindi typists have found out, the problem comes when you want to send text written with one of these fonts over the internet. The recipient's message then shows the underlying English text and becomes meaningless. I.e. if you wanted to send 'विच्छिन्न', what gets sent is fofPNUu, since that is the key sequence typed on an English keyboard to produce the word with one of those fonts.

The solution to the above problem is to use characters in the Unicode range U+0900 to U+097F (the Devanagari block), which is the worldwide standard for encoding Hindi characters.
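To make the range concrete, here is a small Python sketch that lists the code points behind the example word and checks that each one falls in U+0900 to U+097F. The helper names (`in_devanagari_block`, `describe`) are made up for this illustration, and the word is written with escape sequences so the listing does not depend on how the page renders Devanagari.

```python
import unicodedata

# The example word विच्छिन्न, spelled out as explicit code points.
WORD = "\u0935\u093f\u091a\u094d\u091b\u093f\u0928\u094d\u0928"


def in_devanagari_block(ch: str) -> bool:
    """Return True if ch lies in the Devanagari block, U+0900 to U+097F."""
    return 0x0900 <= ord(ch) <= 0x097F


def describe(text: str) -> list:
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for line in describe(WORD):
        print(line)
    # Every code point of the Hindi word is inside the block ...
    print(all(in_devanagari_block(ch) for ch in WORD))
    # ... while the legacy-font key sequence is plain ASCII.
    print(any(in_devanagari_block(ch) for ch in "fofPNUu"))
```

The contrast in the last two lines is the whole problem in miniature: text typed with a legacy font contains no Hindi code points at all.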

But just using the corresponding characters does not solve the problem. If we were to do that with 'विच्छिन्न' and simply put the corresponding Unicode characters in the order they were typed, we would get an incorrectly composed word. It turns out that, in addition to assigning characters in this range, the Unicode Consortium also prescribes a particular sequence for composing them. I.e. if one wants to get वि, one should enter the code point for व first and then the code point for ि. This is the reverse of what the typists are used to.
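The reordering for the simple case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `typed_to_logical` is invented here, the input is assumed to be in typewriter order (ि before its consonant), and only a single consonant after the matra is handled, not conjuncts.

```python
VOWEL_SIGN_I = "\u093f"  # the matra ि


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants, क (U+0915) to ह (U+0939)."""
    return 0x0915 <= ord(ch) <= 0x0939


def typed_to_logical(text: str) -> str:
    """Move each ि that was typed before a consonant to after that consonant.

    Assumes the input is in typewriter order; text that is already in
    Unicode order should not be passed through this function.
    """
    out = []
    i = 0
    while i < len(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == VOWEL_SIGN_I and nxt and is_consonant(nxt):
            out.append(nxt)           # consonant first ...
            out.append(VOWEL_SIGN_I)  # ... then the vowel sign
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    typed = VOWEL_SIGN_I + "\u0935"  # ि then व, as typed with 'f' then 'o'
    print(typed_to_logical(typed) == "\u0935" + VOWEL_SIGN_I)
```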

There are other such prescribed sequences, especially for half-consonants. A half-consonant could be typed with a single keystroke on a Hindi typewriter, so it was atomic. With Unicode it is no longer atomic: one needs to compose it from two characters, one for the full consonant and a second for the half-character marker (halant). Composing these half-consonants with ि is a sure recipe for problems, since the typist enters ि before the whole cluster, whereas Unicode expects the code point for ि only after the complete sequence of consonants and halants that it belongs to.
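As a rough illustration of how these pieces are composed, the Python sketch below builds a half-consonant from a full consonant plus halant, and then a cluster carrying ि. It assumes the Unicode logical order in which the vowel sign follows the whole consonant cluster; the helper name `half` is made up for this example.

```python
HALANT = "\u094d"        # ् (named VIRAMA in Unicode)
VOWEL_SIGN_I = "\u093f"  # ि

NA = "\u0928"   # न
CA = "\u091a"   # च
CHA = "\u091b"  # छ


def half(consonant: str) -> str:
    """A half-consonant is the full consonant followed by halant."""
    return consonant + HALANT


# न्न: half न followed by full न, three code points in total.
conjunct_nna = half(NA) + NA

# च्छि: half च, full छ, and only then the vowel sign ि,
# even though the typist pressed the key for ि first.
cluster_cchi = half(CA) + CHA + VOWEL_SIGN_I

if __name__ == "__main__":
    print([f"U+{ord(ch):04X}" for ch in conjunct_nna])
    print([f"U+{ord(ch):04X}" for ch in cluster_cchi])
```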

As a result of such mismatches, Hindi typists have had an uphill struggle trying to communicate on the internet.

Now a bit of history on how the situation came to this: the Remington-style mechanical typewriters were the first to make a keyboard layout popular on a mass scale. Originally these were created for English, with QWERTY as the keyboard layout. The idea behind the layout was to prevent the mechanical arms from jamming into each other when the typist pressed keys in quick succession, and so to allow fast typing.

With time, the same idea was adopted for Hindi typewriters, resulting in the Remington-style Hindi layout.

When adapted for computers, the English keyboard kept its QWERTY layout because it was already in widespread use.

Similarly, when a Hindi keyboard layout was adopted for use with computers, the Kruti Dev style became very popular. The layout was already popular with the typewriter user community, and it provided almost the same experience for offline word-processing tasks on computers.

On the other hand, the Unicode standard for Hindi came with its own recommendations on how characters and words should be composed. These recommendations differ widely from the way Remington typewriters have been used to compose words in Hindi.

To address the differences between the composition methods recommended by Unicode and those used on Remington-style typewriters, a few other keyboard layouts have been proposed. The InScript keyboard layout is the most notable of these, since it is a *standard* keyboard layout promoted by a few standards bodies in India.

But the InScript keyboard layout has not seen wide adoption, since it forces users who are already comfortable with one keyboard layout to change their habits. Touch typing is a hard skill to learn, and once learnt it is very hard to relearn.

Now, knowing that the Remington-style keyboard is the most popular layout, we tried to solve all the problems caused by character sequence mismatches between the Remington-style layout and the Unicode standard. Our tool works through all the known composition sequence mismatches between the Remington-style key sequences and the order the Unicode standard expects, and gives users a seamless typing experience, as if they were typing in their own familiar environment.
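To give a feel for what such a conversion involves, here is a minimal Python sketch. It is not the tool's actual code. The key map is hypothetical and partial, inferred only from the single example in this post (fofPNUu for विच्छिन्न), and the only mismatch it handles is ि typed before a consonant or conjunct. The names `KEYMAP`, `map_keys`, `reorder` and `convert` are invented for this illustration.

```python
HALANT = "\u094d"        # ्
VOWEL_SIGN_I = "\u093f"  # ि

# Hypothetical, partial key map inferred from one example (an assumption):
# fofPNUu -> विच्छिन्न. A real layout has many more keys.
KEYMAP = {
    "f": VOWEL_SIGN_I,        # ि
    "o": "\u0935",            # व
    "P": "\u091a" + HALANT,   # half च
    "N": "\u091b",            # छ
    "U": "\u0928" + HALANT,   # half न
    "u": "\u0928",            # न
}


def is_consonant(ch: str) -> bool:
    """True for the basic Devanagari consonants, क (U+0915) to ह (U+0939)."""
    return 0x0915 <= ord(ch) <= 0x0939


def map_keys(keys: str) -> str:
    """Replace each key with its Unicode text, still in typed order.

    Raises KeyError for keys missing from the partial map.
    """
    return "".join(KEYMAP[k] for k in keys)


def reorder(text: str) -> str:
    """Move each ि typed before a cluster to the end of that cluster.

    A cluster here is a consonant followed by any number of
    (halant, consonant) pairs, e.g. च ् छ.
    """
    out = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == VOWEL_SIGN_I and i + 1 < n and is_consonant(text[i + 1]):
            j = i + 2
            # Extend the cluster across halant + consonant pairs.
            while j + 1 < n and text[j] == HALANT and is_consonant(text[j + 1]):
                j += 2
            out.append(text[i + 1:j])  # the whole cluster first
            out.append(VOWEL_SIGN_I)   # then the vowel sign
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def convert(keys: str) -> str:
    """Typed key sequence -> Unicode text in the order Unicode expects."""
    return reorder(map_keys(keys))


if __name__ == "__main__":
    expected = "\u0935\u093f\u091a\u094d\u091b\u093f\u0928\u094d\u0928"  # विच्छिन्न
    print(convert("fofPNUu") == expected)
```

Keeping the key mapping and the reordering as two separate steps means the layout table can grow without touching the composition rules, and the other way round.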

Hope this proves to be a small step in helping Hindi become more popular on the internet.

[Note: In some places I have had to use images for 'chhoti-e' and for incorrectly composed Hindi words, since the corresponding Unicode sequences do not render correctly in the browser.]
