This page discusses how text rendering engines should handle Tai Tham Mai Kang Lai.
By definition, Mai Kang Lai (U+1A58) is a combining character representing final NGA that is subjoined by the base consonant it is placed on.
However, different schools practice this differently, which results in different shifting (Lai (ไหล), as its name implies) behaviors.
According to a discussion thread, there are at least 3 different behaviors:
1. Always Shifted. Mai Kang Lai is strictly placed on the second consonant, without exception. This seems to be the traditional practice. It is followed in Lao Tham and traditional Lanna.
The book ค่าวซอเรื่องจันทฆา, printed in B.E. 2480
Illustration from การอ่านจารึกสมัยต่างๆ (Thai Palaeography) FL 344 by ศ.ธวัช ปุณโณทก, Ramkhamhaeng University, p. 162
2. Never Shifted or Pseudo-Shifted. Mai Kang Lai is placed on the first consonant, possibly displaced to the right, but logically it stays next to the first consonant. This can be distinguished from the shifted case in words where a leading vowel comes next to Mai Kang Lai, such as สงฺโฆ and องฺเชิญ.
This is found in Khuen and some Lanna books.
3. Conditionally Shifted. Mai Kang Lai is shifted to the second consonant, except when some sign prevents it, such as MEDIAL RA or an above vowel. Examples of such exceptions are รงฺสี and สงฺกรานต์.
This is found in some Lanna books, namely the Maefahluang dictionary of Northern Thai. The Lanna books cited for behavior 1 do not provide examples to clarify these cases, though.
But a Lao Tham counter-example (by อ.ยุทธพงศ์ มาตย์วิเศษ above) confirms that in Lao Tham Mai Kang Lai is still shifted even in these cases.
In summary, Lao and Khuen are clear-cut, while Lanna is fuzzy. Mai Kang Lai is always shifted in Lao Tham and never shifted in Khuen. For Lanna, it can be any of the three, depending on style.
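The three behaviors can be sketched as a small decision function. This is only an illustrative model, not an algorithm taken from any rendering engine: the names `Behavior` and `mai_kang_lai_host` are hypothetical, and the sketch assumes that the signs which block shifting under behavior 3 are the ones attached to the second consonant.

```python
from enum import Enum


class Behavior(Enum):
    ALWAYS_SHIFTED = 1         # Lao Tham, traditional Lanna
    NEVER_SHIFTED = 2          # Khuen, some Lanna books
    CONDITIONALLY_SHIFTED = 3  # some Lanna books


def mai_kang_lai_host(behavior, second_has_blocker):
    """Return which consonant carries Mai Kang Lai: 1 (first) or 2 (second).

    second_has_blocker is True when the second consonant bears a sign that
    prevents shifting under behavior 3, such as MEDIAL RA or an above vowel
    (assumption: the blocking sign sits on the second consonant).
    """
    if behavior is Behavior.NEVER_SHIFTED:
        return 1
    if behavior is Behavior.CONDITIONALLY_SHIFTED and second_has_blocker:
        return 1
    return 2
```

Under this model, behaviors 1 and 3 give the same answer unless a blocking sign is present, which is why the exception cases are the ones that tell the two schools apart.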
There are two major proposed approaches to the problem:
Use different encodings. The shifted version should be encoded by using SAKOT (U+1A60) between Mai Kang Lai and the following base consonant. For example, สงฺโฆ (Sangkho) should be encoded as <HIGH SA, MAI KANG LAI, SAKOT, LOW KHA, VOWEL E, VOWEL AA>. For the non-shifted version, just omit SAKOT.
The argument for this scheme is that the shifted and non-shifted versions are different spelling forms of the same word.
Possible solutions:
Problem: According to ISO/IEC JTC1/SC2/WG2 N3207R (PDF), the spelling of the word ทั้งหลาย (Tang lai) with Mai Kang Lai, i.e. <LOW TA, MAI KANG LAI, SAKOT, LA, VOWEL AA, SAKOT, LOW YA>, necessarily has Mai Kang Lai followed by SAKOT + LA for the subjoined LA. The encoding scheme above would reinterpret that SAKOT and cause LA to take its full base form, which is wrong.
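The conflict can be made concrete with a short sketch. It builds the sequences quoted above from their Unicode character names and then applies a hypothetical reader, `read_scheme1`, that treats SAKOT directly after Mai Kang Lai as the shift marker. Both the helper `tt` and the reader are illustrative assumptions, not part of any existing engine.

```python
import unicodedata


def tt(*names):
    """Build a string from Tai Tham character names (block prefix omitted)."""
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


MAI_KANG_LAI = tt("SIGN MAI KANG LAI")  # U+1A58
SAKOT = tt("SIGN SAKOT")                # U+1A60

# Sangkho under the "different encodings" scheme: SAKOT means "shifted".
sangkho_shifted = tt("LETTER HIGH SA", "SIGN MAI KANG LAI", "SIGN SAKOT",
                     "LETTER LOW KHA", "VOWEL SIGN E", "VOWEL SIGN AA")
sangkho_unshifted = tt("LETTER HIGH SA", "SIGN MAI KANG LAI",
                       "LETTER LOW KHA", "VOWEL SIGN E", "VOWEL SIGN AA")

# Tang lai as spelled in N3207R: here the first SAKOT subjoins LA.
tang_lai = tt("LETTER LOW TA", "SIGN MAI KANG LAI", "SIGN SAKOT",
              "LETTER LA", "VOWEL SIGN AA", "SIGN SAKOT", "LETTER LOW YA")


def read_scheme1(text):
    """Interpret SAKOT right after Mai Kang Lai as a shift marker.

    Returns (shifted, text with the marker SAKOT consumed).
    """
    marker = MAI_KANG_LAI + SAKOT
    return marker in text, text.replace(marker, MAI_KANG_LAI)


shifted, remainder = read_scheme1(tang_lai)
# The SAKOT that was meant to subjoin LA has been consumed as a marker,
# so in `remainder` LA directly follows Mai Kang Lai as a plain base letter.
```

The reader cannot tell the two uses of SAKOT apart, so the same code point sequence would have to mean "shifted Mai Kang Lai" in one word and "subjoined LA" in the other.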
Use the same encoding. Both the shifted and non-shifted versions should be encoded the same way, leaving the font to determine how to render them.
The argument for this scheme is that the sign placement does not change the meaning of the word. It is rather a matter of style.
Possible solutions:
'rphf' like Myanmar Kinzi. Fonts for Lao Tham and the shifting school of Lanna should provide the 'rphf' feature to signal the rendering engine to reorder Mai Kang Lai.
Problem: The complexity lies in the difference between behaviors 1 and 3. GSUB rules in the fonts might not be capable of handling it, and in the case of 'rphf', the rendering engine will not have sufficient information to choose between behaviors 1 and 3.
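The 'rphf' idea and its limitation can be illustrated with a simplified simulation. This is not how any particular shaping engine is implemented: `reorder_mai_kang_lai` is a hypothetical function, the font's contribution is reduced to a single on/off flag, base consonants are approximated as characters whose Unicode name starts with "TAI THAM LETTER", and the code point spelling used for รงฺสี is an assumption.

```python
import unicodedata

MAI_KANG_LAI = unicodedata.lookup("TAI THAM SIGN MAI KANG LAI")


def is_base(ch):
    # Rough approximation of "base consonant" for this sketch only.
    return unicodedata.name(ch, "").startswith("TAI THAM LETTER")


def reorder_mai_kang_lai(text, font_has_rphf):
    """Move each Mai Kang Lai to just after the base letter that follows it.

    Without the (modelled) 'rphf' feature the text is left in logical order.
    """
    if not font_has_rphf:
        return text
    out = list(text)
    i = 0
    while i < len(out) - 1:
        if out[i] == MAI_KANG_LAI and is_base(out[i + 1]):
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the mark that was just moved
        else:
            i += 1
    return "".join(out)


def tt(*names):
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


# Same encoding for every school: no SAKOT after Mai Kang Lai.
sangkho = tt("LETTER HIGH SA", "SIGN MAI KANG LAI", "LETTER LOW KHA",
             "VOWEL SIGN E", "VOWEL SIGN AA")
# Assumed spelling of an exception word with an above vowel on the
# second consonant.
rangsi = tt("LETTER RA", "SIGN MAI KANG LAI", "LETTER HIGH SA",
            "VOWEL SIGN II")

shifted_sangkho = reorder_mai_kang_lai(sangkho, font_has_rphf=True)
shifted_rangsi = reorder_mai_kang_lai(rangsi, font_has_rphf=True)
```

With only an on/off signal, the exception word is reordered exactly like the ordinary one, which matches behavior 1; in this model nothing lets the engine fall back to the unshifted form that behavior 3 expects.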