Thai script stacks characters on multiple levels. The consonants and some leading or following vowels sit on the baseline. An upper or lower vowel may then combine above or below the consonant, and the stack may be topped by a tone mark or diacritic.
In summary, a Thai grapheme cluster can be one of the forms:
where:
The first form is simple: just an individual character. It is the second form that the shaping rules must deal with.
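As a rough illustration of the stacking described above, the following sketch splits a string into stacks. It assumes that a stack is one base character followed by any run of nonspacing combining marks (Unicode general category `Mn`). This is a simplification for illustration, not the exact cluster forms defined in this document, and the function name `split_stacks` is hypothetical.

```python
import unicodedata


def split_stacks(text):
    """Split text into stacks: a base character plus its trailing combining marks.

    Assumption: every character whose general category is Mn (nonspacing mark)
    attaches to the nearest preceding base character.
    """
    stacks = []
    for ch in text:
        if stacks and unicodedata.category(ch) == "Mn":
            # Above/below vowels, tone marks and diacritics join the open stack.
            stacks[-1] += ch
        else:
            # Consonants and leading or following vowels start a new stack.
            stacks.append(ch)
    return stacks


# KO KAI + SARA I + MAI EK, followed by NGO NGU
word = "\u0E01\u0E34\u0E48\u0E07"
for stack in split_stacks(word):
    print([f"U+{ord(c):04X}" for c in stack])
```

Stacks of length one correspond to the simple case; longer stacks are the ones that need shaping.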
Primitive typesetting, as used in mechanical typewriters and framebuffer consoles, puts characters at fixed vertical levels to prevent overlapping:
These are also the default glyph positions in most Thai fonts, so that bare combining without shaping still yields readable, though suboptimal, rendered text.
Some Thai consonants have an extra ascender or descender which can overlap the combining character, so some rearrangement is needed to avoid this. Based on the rearrangement operations they require, Thai consonants can be classified into 4 classes:
The general rules are:
This can be summarized in the following table:
base \ comb | AV* | BV | T** | {AV}T** |
---|---|---|---|---|
NC | - | - | SD(c) | - |
AC | SL(c) | - | SDL(c) | SL(c) |
RC | - | RD(b) | SD(c) | - |
DC | - | SD(c) | SD(c) | - |
* MAITAIKHU (U+0E47), NIKHAHIT (U+0E4D) and YAMAKKAN (U+0E4E) are treated as AV here.
** THANTHAKHAT (U+0E4C) is treated as T here.
where c in the parameter means the combining mark, and b means the base consonant.
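The table above lends itself to a direct lookup. The sketch below transcribes it into a dictionary and picks the column from the combining mark and the mark before it. The function names and the code point sets are assumptions made for illustration: the sets follow the Unicode Thai block together with the two footnotes above, and the base class is passed in by the caller rather than derived from the consonant.

```python
# Combining marks grouped by the table's columns (footnotes included).
ABOVE_VOWELS = {0x0E31, 0x0E34, 0x0E35, 0x0E36, 0x0E37, 0x0E47, 0x0E4D, 0x0E4E}
BELOW_VOWELS = {0x0E38, 0x0E39, 0x0E3A}
TONES = {0x0E48, 0x0E49, 0x0E4A, 0x0E4B, 0x0E4C}

# Rows are base classes, columns are combining contexts; "-" becomes None.
ACTIONS = {
    "NC": {"AV": None, "BV": None, "T": "SD(c)", "{AV}T": None},
    "AC": {"AV": "SL(c)", "BV": None, "T": "SDL(c)", "{AV}T": "SL(c)"},
    "RC": {"AV": None, "BV": "RD(b)", "T": "SD(c)", "{AV}T": None},
    "DC": {"AV": None, "BV": "SD(c)", "T": "SD(c)", "{AV}T": None},
}


def mark_column(mark, prev_mark=None):
    """Return the table column for a combining mark, or None if it has none."""
    cp = ord(mark)
    if cp in ABOVE_VOWELS:
        return "AV"
    if cp in BELOW_VOWELS:
        return "BV"
    if cp in TONES:
        # A tone stacked on an above vowel uses the {AV}T column.
        if prev_mark is not None and ord(prev_mark) in ABOVE_VOWELS:
            return "{AV}T"
        return "T"
    return None


def shaping_action(base_class, mark, prev_mark=None):
    """Look up the action for a mark on a base of the given class (None = no change)."""
    column = mark_column(mark, prev_mark)
    if column is None:
        return None
    return ACTIONS[base_class][column]


print(shaping_action("AC", "\u0E48"))            # tone directly on an AC base
print(shaping_action("AC", "\u0E48", "\u0E34"))  # tone on top of SARA I
```

Keeping the table as data makes it easy to compare against the source table row by row.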
Before OpenType, vendors developed several ad hoc shaping solutions. All of them were logically the same positioning-by-substitution technique with the same extra glyph sets, but unfortunately with different code point assignments. There were two major extensions, one for Microsoft Windows and the other for Apple MacOS. This divergence prevented fonts from being interchanged between the two platforms, as fonts were tied to vendors' rendering engines.
To support those legacy fonts, which still dominate the market at present, rendering engines should know how to access those private glyphs and substitute them properly.
Each vendor extension came in 2 sets, one for 8-bit pre-Unicode fonts and the other for the Unicode Private Use Area (PUA). 8-bit fonts are rare nowadays, but they are listed here for historical reference.
Glyph | 8-bit Windows | Windows PUA | 8-bit Mac | Mac PUA |
---|---|---|---|---|
Low MAI EK | 0x8B | U+F70A | 0x88 | U+F88B |
Low MAI THO | 0x8C | U+F70B | 0x89 | U+F88E |
Low MAI TRI | 0x8D | U+F70C | 0x8A | U+F891 |
Low MAI CHATTAWA | 0x8E | U+F70D | 0x8B | U+F894 |
Low THANTHAKHAT | 0x8F | U+F70E | 0x8C | U+F897 |
Low-left MAI EK | 0x86 | U+F705 | 0x83 | U+F88C |
Low-left MAI THO | 0x87 | U+F706 | 0x84 | U+F88F |
Low-left MAI TRI | 0x88 | U+F707 | 0x85 | U+F892 |
Low-left MAI CHATTAWA | 0x89 | U+F708 | 0x86 | U+F895 |
Low-left THANTHAKHAT | 0x8A | U+F709 | 0x87 | U+F898 |
Left MAI EK | 0x9B | U+F713 | 0x98 | U+F88A |
Left MAI THO | 0x9C | U+F714 | 0x99 | U+F88D |
Left MAI TRI | 0x9D | U+F715 | 0x9A | U+F890 |
Left MAI CHATTAWA | 0x9E | U+F716 | 0x9B | U+F893 |
Left THANTHAKHAT | 0x9F | U+F717 | 0x9C | U+F896 |
Left MAI HAN-AKAT | 0x98 | U+F710 | 0x92 | U+F884 |
Left SARA I | 0x81 | U+F701 | 0x94 | U+F885 |
Left SARA II | 0x82 | U+F702 | 0x95 | U+F886 |
Left SARA UE | 0x83 | U+F703 | 0x96 | U+F887 |
Left SARA UEE | 0x84 | U+F704 | 0x97 | U+F888 |
Left MAITAIKHU | 0x9A | U+F712 | 0x93 | U+F889 |
Left NIKHAHIT | 0x99 | U+F711 | 0x8F | U+F899 |
Low SARA U | 0xFC | U+F718 | 0xFC | U+F89B |
Low SARA UU | 0xFD | U+F719 | 0xFD | U+F89C |
Low PHINTHU | 0xFE | U+F71A | 0xFE | U+F89D |
Desc-less YO YING | 0x90 | U+F70F | 0x90 | U+F89A |
Desc-less THO THAN | 0x80 | U+F700 | 0x80 | U+F89E |
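Positioning by substitution with these glyphs amounts to a table lookup. The sketch below covers only the Windows PUA column of the table above. The variant names ("low", "low-left", "left", "desc-less") are taken from the glyph labels in the table, while the function name `substitute` and the dictionary layout are assumptions made for illustration, not any rendering engine's actual API.

```python
# Windows PUA variants, keyed by variant name and then by Unicode code point.
WINDOWS_PUA = {
    "low": {
        0x0E48: 0xF70A,  # MAI EK
        0x0E49: 0xF70B,  # MAI THO
        0x0E4A: 0xF70C,  # MAI TRI
        0x0E4B: 0xF70D,  # MAI CHATTAWA
        0x0E4C: 0xF70E,  # THANTHAKHAT
        0x0E38: 0xF718,  # SARA U
        0x0E39: 0xF719,  # SARA UU
        0x0E3A: 0xF71A,  # PHINTHU
    },
    "low-left": {
        0x0E48: 0xF705,  # MAI EK
        0x0E49: 0xF706,  # MAI THO
        0x0E4A: 0xF707,  # MAI TRI
        0x0E4B: 0xF708,  # MAI CHATTAWA
        0x0E4C: 0xF709,  # THANTHAKHAT
    },
    "left": {
        0x0E48: 0xF713,  # MAI EK
        0x0E49: 0xF714,  # MAI THO
        0x0E4A: 0xF715,  # MAI TRI
        0x0E4B: 0xF716,  # MAI CHATTAWA
        0x0E4C: 0xF717,  # THANTHAKHAT
        0x0E31: 0xF710,  # MAI HAN-AKAT
        0x0E34: 0xF701,  # SARA I
        0x0E35: 0xF702,  # SARA II
        0x0E36: 0xF703,  # SARA UE
        0x0E37: 0xF704,  # SARA UEE
        0x0E47: 0xF712,  # MAITAIKHU
        0x0E4D: 0xF711,  # NIKHAHIT
    },
    "desc-less": {
        0x0E0D: 0xF70F,  # YO YING
        0x0E10: 0xF700,  # THO THAN
    },
}


def substitute(char, variant, table=WINDOWS_PUA):
    """Return the PUA variant of char, or char unchanged if the table has none."""
    pua = table.get(variant, {}).get(ord(char))
    return chr(pua) if pua is not None else char


# MAI EK lowered onto a consonant with no above vowel
print(f"U+{ord(substitute(chr(0x0E48), 'low')):04X}")
```

A Mac PUA table could be passed through the `table` parameter in the same shape. Falling back to the original character keeps the text readable when a font lacks a variant.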
Modern fonts should no longer rely on PUA, as modern software begins to support OpenType more widely.
Please refer to Spec for Thai OpenType Font Creation for the details.
Copyright © 2012 by Theppitak Karoonboonyanan. All rights reserved.