Thai vs Indic Encoding Scheme

I am writing this page after hearing, at several seminars and on mailing lists, a lot of criticism of the Thai (and Lao) encoding scheme for differing from that of other Indic scripts. Although I can see some valid points in those arguments, I feel they are overstated, and the issue needs some clarification.

Most often, the criticism concerns Thai "visual" order versus Indic "logical" order. In Thai, characters are encoded from left to right in visual order, so a leading vowel is encoded before the consonant that follows it. Under the Indic encoding scheme, the consonant is always encoded first, followed by the vowel. Both approaches have benefits and drawbacks. They are most often compared in discussions of string collation, where Thai sorting compares the consonant before the leading vowel. This has occasionally been overstated as requiring "phonetic analysis", and from there the myths have spread. As you shall see, it is actually not that bad, and I still think the current encoding scheme is appropriate for Thai.
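To make the collation point concrete, here is a minimal Python sketch. It assumes the only adjustment needed is to move each leading vowel (U+0E40 SARA E through U+0E44 SARA AI MAIMALAI) after the character that follows it, then compare by code point. The names `LEADING_VOWELS`, `collation_key` and `thai_sort` are hypothetical. This is not a full Thai collation: tone marks, other diacritics and dictionary tie-breaking rules are ignored.

```python
# Thai leading vowels U+0E40..U+0E44 are stored before the consonant they
# follow in pronunciation ("visual order").
LEADING_VOWELS = {chr(c) for c in range(0x0E40, 0x0E45)}


def collation_key(word: str) -> str:
    """Return a sort key with each leading vowel moved after the next character.

    Hypothetical helper: a simple local swap, not a complete Thai collation.
    """
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS:
            # Swap vowel and consonant so the consonant is compared first.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return "".join(chars)


def thai_sort(words):
    """Sort words by code point after reordering leading vowels."""
    return sorted(words, key=collation_key)


if __name__ == "__main__":
    words = ["เก", "กา", "ข"]
    print(sorted(words))     # plain code-point order
    print(thai_sort(words))  # consonant-first order
```

Plain `sorted()` puts เก after ข because U+0E40 is larger than every Thai consonant (U+0E01..U+0E2E). With the swapped key, เก sorts among the ก words instead. The adjustment is a local reordering of two adjacent characters, not phonetic analysis.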

History of Thai Script

King Ramkhamhaeng's script

Later changes

Character Encoding in Computer

Backgrounds

Consequence

→ Indic-style encoding not preferred.

Issues

Input Method

Collation

Word break/line wrapping

Conclusion

Disclaimer

Resources

