Mai Kang Lai in Tai Tham

This page is for discussing how text rendering engines should handle Tai Tham Mai Kang Lai.

The Problem

By definition, Mai Kang Lai (U+1A58) is a combining character representing final NGA that is subjoined by the base consonant it is placed on.

However, different schools practice this differently, which leads to different shifting behaviors (Lai (ไหล), as the name implies).

According to a discussion thread, there are at least three different behaviors:

  1. Always Shifted. Mai Kang Lai is strictly placed on the second consonant without exception. This seems to be the traditional writing system.

    It is practiced by Lao Tham and traditional Lanna.

  2. Never Shifted or Pseudo-Shifted. Mai Kang Lai is placed on the first consonant. It may be shifted to the right visually, but logically it stays next to the first consonant. This can be distinguished from the shifted case by words with a leading vowel next to Mai Kang Lai, such as สงฺโฆ and องฺเชิญ.

    This is found in Khuen and some Lanna books.

  3. Conditionally Shifted. Mai Kang Lai is shifted to the second consonant, except when some sign prevents it, such as MEDIAL RA or an above vowel. Examples of such exceptions are รงฺสี and สงฺกรานต์.

    This is found in some Lanna books, namely the Maefahluang dictionary of Northern Thai. The Lanna books that follow behavior 1 do not provide examples to clarify these cases, though.

    But a Lao Tham counter-example (by อ.ยุทธพงศ์ มาตย์วิเศษ above) confirms that Mai Kang Lai is still shifted even in these cases.

In summary, Lao and Khuen are clear-cut, while Lanna is fuzzy. Mai Kang Lai is always shifted in Lao Tham and never shifted in Khuen. For Lanna, it can be any of the three, depending on style.
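The three behaviors can be sketched as a small decision function. This is only an illustrative model, not code from any rendering engine: the names `Behavior` and `mai_kang_lai_host` are hypothetical, and the single boolean `shift_blocked` is an assumed simplification of "some sign such as MEDIAL RA or an above vowel prevents the shift".

```python
from enum import Enum


class Behavior(Enum):
    """The three behaviors listed above (names are hypothetical)."""
    ALWAYS_SHIFTED = 1         # Lao Tham, traditional Lanna
    NEVER_SHIFTED = 2          # Khuen, some Lanna books
    CONDITIONALLY_SHIFTED = 3  # some Lanna books


def mai_kang_lai_host(behavior: Behavior, shift_blocked: bool) -> str:
    """Return which consonant carries Mai Kang Lai: 'first' or 'second'.

    shift_blocked is an assumed flag meaning that a sign such as
    MEDIAL RA or an above vowel prevents the shift.
    """
    if behavior is Behavior.ALWAYS_SHIFTED:
        return "second"
    if behavior is Behavior.NEVER_SHIFTED:
        return "first"
    # Conditionally shifted: shift unless a blocking sign is present.
    return "first" if shift_blocked else "second"
```

Behaviors 1 and 3 only disagree when `shift_blocked` is true, which is the case the Lanna sources leave unclear.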

Proposed Solutions

There are two major proposed approaches to the problem:

  1. Use different encodings. The shifted version should be encoded with SAKOT (U+1A60) between Mai Kang Lai and the following base consonant. For example, สงฺโฆ (Sangkho) should be encoded as <HIGH SA, MAI KANG LAI, SAKOT, LOW KHA, VOWEL E, VOWEL AA>. For the non-shifted version, just omit SAKOT.

    The argument for this scheme is that the two versions are different spelling forms of the same word.

    Possible solutions:

    Problem: According to ISO/IEC JTC1/SC2/WG2 N3207R (PDF), the spelling of the word ทั้งหลาย (Tang lai) with Mai Kang Lai, i.e. <LOW TA, MAI KANG LAI, SAKOT, LA, VOWEL AA, SAKOT, LOW YA>, must have Mai Kang Lai followed by SAKOT + LA to produce the subjoined LA. Under the encoding scheme above, omitting SAKOT would render LA in its full base form, which is wrong.

  2. Use the same encoding. Both the shifted and non-shifted versions should be encoded the same way, and the font determines how to render them.

    The argument for this scheme is that the sign placement does not change the meaning of the word. It is rather a matter of style.

    Possible solutions:

    Problem: The complexity lies in the difference between behaviors 1 and 3. GSUB rules in fonts might not be capable of handling it, and in the case of 'rphf', the rendering engine will not have sufficient information to choose between behaviors 1 and 3.
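To make the first approach and its problem concrete, the sequences quoted above can be built by Unicode character name. This is a minimal sketch: the helpers `tham` and `has_shift_marker` are hypothetical names, and the character names are assumed to be the Unicode names behind the short labels used on this page (for example, "VOWEL E" taken as TAI THAM VOWEL SIGN E).

```python
import unicodedata

MAI_KANG_LAI = "\u1A58"  # code point as given on this page
SAKOT = "\u1A60"         # code point as given on this page


def tham(*names: str) -> str:
    """Build a string from Tai Tham character names (hypothetical helper)."""
    return "".join(unicodedata.lookup("TAI THAM " + n) for n in names)


# Approach 1, shifted form of Sangkho: SAKOT follows Mai Kang Lai.
sangkho_shifted = tham(
    "LETTER HIGH SA", "SIGN MAI KANG LAI", "SIGN SAKOT",
    "LETTER LOW KHA", "VOWEL SIGN E", "VOWEL SIGN AA",
)
# Approach 1, non-shifted form: the same sequence without that SAKOT.
sangkho_unshifted = sangkho_shifted.replace(MAI_KANG_LAI + SAKOT, MAI_KANG_LAI)

# Tang lai as quoted from N3207R: SAKOT here is needed to subjoin LA.
tang_lai = tham(
    "LETTER LOW TA", "SIGN MAI KANG LAI", "SIGN SAKOT",
    "LETTER LA", "VOWEL SIGN AA", "SIGN SAKOT", "LETTER LOW YA",
)


def has_shift_marker(text: str) -> bool:
    """True if Mai Kang Lai is directly followed by SAKOT."""
    return MAI_KANG_LAI + SAKOT in text


def code_points(text: str) -> str:
    """Format a string as a list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)
```

Under approach 1, `has_shift_marker` would be the only signal for the shifted form, yet it is also true for `tang_lai`, where the SAKOT exists to subjoin LA. That overlap is the conflict described in the problem note for the first approach.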