Encoding Tai Noi Script

Introduction

The Tai Noi script, also called Lao Buhan (literally "Ancient Lao") in Lao PDR, is an old script that has evolved into the contemporary Lao script. It was formerly used in all non-religious documents in Lao PDR and North-Eastern (Esaan) Thailand, and is being revitalized for contemporary use in the Esaan provinces.

Old palm-leaf and paper documents, including historical records, literature, and pharmaceutical information, have been conserved for study and continuation. This body of material also makes the script worth computerizing.

Characters

The Tai Noi script covers almost all contemporary Lao characters, plus some now-obsolete characters. It is therefore reasonable to extend the Lao Unicode range (U+0E80..U+0EFF) to cover it.

In principle, the Lao letters in Unicode map one-to-one to those of Thai, with some slots left unassigned for characters that are absent in Lao. It is tempting to use these slots for the Tai Noi additions, but this should be avoided: Lao did have characters in these slots at some period, and they may be needed again for future requirements. Furthermore, leaving them intact allows implementations to be borrowed easily between the Lao and Thai scripts. Hence, we should use other free slots instead, such as U+0EBA, U+0EBE..U+0EBF, U+0EC5, U+0EC7, U+0ECE..U+0ECF, U+0EDA..U+0EDB, U+0EE0..U+0EFF.
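
As a rough illustration, the candidate slots listed above can be expanded into individual code points to see how much room they offer. This is only a sketch: the helper names (expand, fmt) are made up for this example, and the snippet does not check whether a slot is actually unassigned, which depends on the Unicode version in use.

```python
# Candidate free slots in the Lao block, copied from the list in the text.
# Each entry is an inclusive (first, last) pair of code points.
CANDIDATE_RANGES = [
    (0x0EBA, 0x0EBA),
    (0x0EBE, 0x0EBF),
    (0x0EC5, 0x0EC5),
    (0x0EC7, 0x0EC7),
    (0x0ECE, 0x0ECF),
    (0x0EDA, 0x0EDB),
    (0x0EE0, 0x0EFF),
]


def expand(ranges):
    """Return every code point covered by the inclusive (first, last) pairs."""
    return [cp for first, last in ranges for cp in range(first, last + 1)]


def fmt(cp):
    """Format a code point in the usual U+XXXX notation."""
    return f"U+{cp:04X}"


slots = expand(CANDIDATE_RANGES)
print(len(slots), "candidate slots")
print(" ".join(fmt(cp) for cp in slots))
```

Most of the room is in U+0EE0..U+0EFF (32 slots); the remaining slots are scattered singly or in pairs, which matters when a group of related characters should sit together.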

Based on studies of Tai Noi tutorials and palm-leaf documents, the additional characters required can be summarized as follows:

Encoding Proposal

SARA OY

An appropriate slot may be U+0EBE, which is next to SEMIVOWEL SIGN NYO (U+0EBD).

Conjuncts and Subjoins

Possibilities are:

  1. Use a virama, probably U+0EBA (corresponding to Thai PHINTHU (U+0E3A)), both for joining conjuncts and for marking subjoins.

    Rationale: It flexibly accommodates new conjuncts and subjoins that may be found later. Only one character needs to be assigned.

    Problems: It introduces virama logic into the Lao Unicode block, and migrating legacy systems to it may take time.

  2. Assign individual characters, probably using the available slots in the range U+0EE0..U+0EFF, or by allocating a new supplementary block.

    Rationale: It is easy to implement and is consistent with the two existing conjuncts HO NO and HO MO. The existing Thai-Lao specifications in Unicode can be applied immediately.

    Problems: The exhaustive list of conjuncts and subjoins used in ancient documents is not known. Additional ones may be discovered as more evidence emerges, and the reserved slots may prove insufficient.

  3. Assign code points to conjuncts and use a virama for Tham subjoins, mixing the two solutions above.

    Rationale: The number of conjuncts is relatively low, while Tham-borrowed subjoins are quite arbitrary. So, it is more feasible to fix the list of conjuncts, while leaving the list of Tham-borrowed subjoins open. Assigning code points to conjuncts also makes it consistent with the existing conjuncts HO NO and HO MO.

    Problems: Same as the virama solution above.

  4. Use ZWJ to join conjuncts. ZERO WIDTH JOINER (U+200D) can be used to join two characters into a connected form, by which Tai Noi conjuncts can be created. Meanwhile, Tham-borrowed subjoins can be addressed with either choice above.

    Rationale: No new character is needed to define conjuncts.

    Problems: Inconsistent with the existing conjuncts HO NO and HO MO.
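
To make the trade-offs concrete, the sketch below shows how one conjunct would look as a code point sequence under the virama, atomic and ZWJ options. It uses HO NO only because that conjunct already has a code point (U+0EDC), so all three forms can be compared. The function names are invented for this example, and using U+0EBA as a virama is the assumption made in option 1, not an established behaviour.

```python
# Existing code points from the Unicode Lao block.
HO_SUNG = "\u0EAB"  # LAO LETTER HO SUNG
NO = "\u0E99"       # LAO LETTER NO
HO_NO = "\u0EDC"    # LAO HO NO, an existing atomic conjunct
ZWJ = "\u200D"      # ZERO WIDTH JOINER

# ASSUMPTION: option 1 would place the virama at U+0EBA.
VIRAMA = "\u0EBA"


def encode_conjunct(first, second, option):
    """Build the character sequence for a two-letter conjunct.

    option is "virama" (option 1) or "zwj" (option 4). Option 2 needs no
    builder: each conjunct is a single, separately assigned code point.
    """
    if option == "virama":
        return first + VIRAMA + second
    if option == "zwj":
        return first + ZWJ + second
    raise ValueError(f"unknown option: {option}")


def codepoints(text):
    """List the code points of a string in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print("virama:", codepoints(encode_conjunct(HO_SUNG, NO, "virama")))
print("atomic:", codepoints(HO_NO))
print("zwj:   ", codepoints(encode_conjunct(HO_SUNG, NO, "zwj")))
```

The virama and ZWJ forms take three code points where the atomic form takes one, and each would coexist with the already-encoded U+0EDC. This is the inconsistency noted for option 4, and it would apply to option 1 as well if conjuncts were built with the virama.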