I want to develop my Rust library more interactively, with a Lisp-style read-eval-print loop (REPL). So I wrote this post to show, step by step, how I used Evcxr, yet another REPL for Rust, and inf-evcxr, yet another Evcxr wrapper for Emacs.
I cloned the wordcut-engine project and opened README.md in Emacs.
I ran inf-evcxr, which created a new inf-evcxr window.
I added the project as a dependency by running inf-evcxr-add-dep-current-project.
I marked a region over the use declarations and ran inf-evcxr-eval-region.
I marked a region over the initialization code block and ran inf-evcxr-eval-region.
I moved the cursor to let txt = "หมากินไก่"; and I ran inf-evcxr-eval-line.
I moved the cursor to wordcut.put_delimiters(txt, "|"). Then I ran inf-evcxr-eval-line. The REPL printed the result of the word tokenizer (wordcut::put_delimiters).
I did the same on wordcut.build_path(txt, &txt.chars().collect::<Vec<_>>()). The REPL printed the graph that is used for word tokenization.
I could make the REPL show a prettier result by evaluating dbg!(wordcut.build_path(txt, &txt.chars().collect::<Vec<_>>()));.
I redefined txt to กากินกิน, and ran the word tokenizer again.
This allows me to experiment with and manually test functions by varying the input variables to different strings and checking the results in simple strings or even internal structures, without recompiling the program or reloading the data. This is helpful for manual testing and debugging.
Compile-time type checking is a great way to catch errors early, but it is not a guarantee of correctness. Even simple subroutines can be incorrect. For example, is_old_enough is a subroutine for checking whether a person is at least 21 years old.
Adding an equals sign (=) to the code changes the behavior of the subroutine, even though the code is still type-safe. A similar bug was found in Servo, although the type involved was an integer.
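As a minimal sketch of this point in Rust: the Person struct, its field names, and the function name is_old_enough_buggy are assumptions made for illustration. Both functions pass the type checker, and only the comparison operator differs.

```rust
// Hypothetical Person type; the field names are assumptions for illustration.
struct Person {
    name: String,
    age: u32,
}

/// Intended behavior: true when the person is at least 21 years old.
fn is_old_enough(person: &Person) -> bool {
    person.age > 20
}

/// Same signature, one extra `=`: a 20-year-old is now accepted too.
/// The compiler has no way to know that this is not what we meant.
fn is_old_enough_buggy(person: &Person) -> bool {
    person.age >= 20
}
```

Only a test with a 20-year-old person tells the two versions apart, which is why testing the subroutine itself matters.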
Testing the entire program manually or programmatically is essential, but it can be difficult to catch all errors, especially those hidden in the details. Testing subroutines is important because it allows testers to focus on small, well-defined units of code. This makes it easier to identify and fix errors. Here are three prerequisites for testing subroutines:
Defining subroutines
An input environment for testing
Result validation
Defining subroutines
Some programming languages encourage programmers to define subroutines more than others. This is because some languages have features that make it easier and more natural to define and use subroutines.
Defining subroutines in BASIC programming language
In the 1970s, to define a subroutine in BASIC, you would assign it a line number and use the RETURN statement.
1000 PRINT "SUBROUTINE"
1100 RETURN
We can call a subroutine in a program using the GOSUB command, followed by the line number of the subroutine.
GOSUB 1000
Defining a subroutine in BASIC is as simple as using the GOTO statement, but with the added convenience of being able to return to the calling code.
Defining subroutines in Common Lisp
In Common Lisp, a function is a subroutine that always returns a value when it is called with a specific set of inputs. This Common Lisp code processes a-person, which is a member of the list people one-by-one using the DOLIST command. If a-person is at least 21 years old, the program will print it out.
We can create a new function from the part (> (person-age a-person) 20) by using the DEFUN command, with a function name - old-enough?, and an input variable, which is a-person.
Common Lisp encourages programmers to create subroutines by making it easy to copy and paste parts of code, which are also known as expressions, or forms.
Defining subroutines in Java
Here is a Java version of a print-a-person-if-at-least-21 program. Java uses the for loop instead of the Common Lisp DOLIST command.
Unlike Common Lisp, Java requires type annotations for functions. The function is_old_enough is annotated as a function that takes a Person as input and returns a boolean. Moreover, in Java, programmers must decide whether a function belongs to a class or an object by using the static keyword. Programmers also use the private and public keywords to control access to functions. Java functions always require a return statement, similar to BASIC subroutines, except for functions that do not return any value.
Java encourages programmers to create subroutines, but with more annotations, it is not as encouraging as Common Lisp.
Defining subroutines in Crystal: Static typing doesn't mean more annotations.
My explanation of Java, a statically typed programming language, may have led to the misconception that statically typed languages require more annotations. Crystal, another statically typed programming language, is a counterexample. Here is a Crystal version of the print-a-person-if-at-least-21 program. Instead of the DOLIST command, Crystal uses the EACH command.
To create a function, we can copy the expression a_person.age > 20, and paste it into DEF ... END block, without any type annotations or any RETURN statement.
def old_enough?(a_person)
  a_person.age > 20
end
We can substitute the expression a_person.age > 20 with a function call old_enough?(a_person).
Surprisingly, the Rust version of is_old_enough looks similar to the Crystal version, but with type annotations. Type annotation in Rust is more complicated than in Java because Rust has references and programmers need to think about the lifetime of variables. Type annotations and lifetimes could make it more difficult for programmers to write subroutines in Rust.
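A minimal sketch of what this can look like in Rust, assuming a hypothetical Person struct with name and age fields. The function older is not from the original post; it is included only to show a case where the lifetime has to be written out, because the returned reference could come from either argument.

```rust
// Hypothetical Person type; the field names are assumptions for illustration.
struct Person {
    name: String,
    age: u32,
}

// The body is as short as the Crystal version, but the parameter and
// return types must be spelled out. `&Person` borrows the person, so the
// caller keeps ownership.
fn is_old_enough(a_person: &Person) -> bool {
    a_person.age > 20
}

// Hypothetical helper: with two reference parameters and a reference
// result, the compiler asks for an explicit lifetime (`'a`).
fn older<'a>(a: &'a Person, b: &'a Person) -> &'a Person {
    if a.age >= b.age {
        a
    } else {
        b
    }
}
```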
Type annotations make definitions precise and easier to read, but they require more work, can be distracting, and do not encourage a programmer to create subroutines.
Preparing an environment for calling a subroutine
Some programming language features and software designs can make preparing the environment for calling a subroutine difficult. Moreover, maintaining the code that prepares the environment can require unnecessary work if that code is tightly coupled to data structures, which change often.
Preparing an environment in Common Lisp and JavaScript
The variable a-person is an environment for calling the function old-enough?. We create a data structure from a struct in Common Lisp by calling a function make-*. In this example, we call a function make-person.
(make-person :name "A" :age 30)
Moreover, we can make a data structure from a struct using #S syntax, which is in the same form as it is printed.
#S(PERSON :NAME "A" :AGE 30)
This #S syntax is very useful when we have existing data structures, because it allows us to use printed data structures to prepare the environment later. This is especially helpful when we want to build long or complex data structures, such as a list of 1,000 people.
In JavaScript, we can prepare data structures in a similar way to Common Lisp, but without specifying the types of the data.
{"name":"A","age":30}
Like Common Lisp, JavaScript can dump data structures to JSON format using the JSON.stringify() command.
It is easy to prepare a data structure as an environment for calling Common Lisp and JavaScript functions, especially because we can reuse the format in which a data structure was dumped from memory.
Preparing an environment in Java and Rust
In Java, we create a data structure by instantiating a class using the new keyword. The arguments, which are the input values for creating an object, are sent in a strict order without any keywords, such as :name and :age seen in the Common Lisp example. This style should be fine when the number of arguments does not exceed three.
var a_person = new Person("A", 30);
We can call the function is_old_enough, which in Java is a class method.
is_old_enough(a_person)
Alternatively, we can define the function is_old_enough as an object method, and then call it with this syntax.
a_person.is_old_enough()
Still, the method for preparing the person data structure remains the same. So class methods are not necessarily easier to test than object methods.
In Rust, we create a data structure with a syntax similar to a JavaScript object literal. However, Rust needs one more step, which is converting &str to String using the function to_string.
Person { name: "A".to_string(), age: 30 }
Neither Java nor Rust can use the printed format to create a data structure directly. However, we can use a JSON library to dump and load data.
So, preparing an environment in Java and Rust is not as convenient as in Common Lisp or JavaScript, since we cannot copy a printed data structure and use it directly in the program without the help of an additional library.
The difficulty in preparing the environment is caused by the software design.
Sometimes preparing the environment is difficult because of the software design. To create a Person object in this example, we must pass in the person's name and a service that can return their age.
So, we cannot prepare a person data structure with a specific age without creating a service, which is only remotely related to testing the function is_old_enough.
Using basic data structure
Instead of defining a class or a struct, we can use a list for representing personal data.
'(:name "A" :age 30)
Using a list removes unnecessary restrictions on creating a person, even though our design is primarily to get a person from a service. Here is an example of calling a function to obtain a person data structure from a service.
(get-person "A" service)
In JavaScript, we can create an object, which is idiomatic for JavaScript, instead of a list.
{"name":"A","age":30}
In Java, we can use a HashMap, although creating a HashMap in Java is not as concise as creating a list in Common Lisp.
However, using a list or other basic data structure also has a downside, which will be explained later.
Modifying the data structure affects the code for preparing an environment.
Suppose we add a reward field to the Person struct.
struct Person {
    name: String,
    age: u32,
    reward: u32,
}
This code for creating a Person data structure would no longer compile.
Person { name: "A".to_string(), age: 10 }
We have to create the data structure by passing a reward value as well.
Person { name: "A".to_string(), age: 10, reward: 800 }
It may seem trivial, but I've never enjoyed fixing repetitive code in tests.
Use default values for values we don't care about.
In Rust, we can create a data structure with default values and then assign only the values that we care about.
let mut a_person = Person::default();
a_person.age = 30;
Before we use the function default, we put #[derive(Default)] before the struct definition.
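Putting the pieces together, here is a minimal sketch, assuming the three-field Person struct from above. The helper person_aged is hypothetical; it uses struct update syntax so that only the field the test cares about is spelled out, and adding another field to Person later would not break it.

```rust
// Deriving Default gives "" for the String and 0 for the integers.
#[derive(Debug, Default)]
struct Person {
    name: String,
    age: u32,
    reward: u32,
}

fn is_old_enough(a_person: &Person) -> bool {
    a_person.age > 20
}

// Hypothetical test helper: set the age, take everything else from Default.
fn person_aged(age: u32) -> Person {
    Person {
        age,
        ..Default::default()
    }
}
```

A test can then call is_old_enough(&person_aged(30)) without mentioning name or reward at all.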
Alternatively, we can use a list instead of a specific struct and put only :age in the list, leaving out the other values. We can still run the test.
(setq a-person '(:age 30))
(old-enough? a-person)
Using basic data structures has some downsides. Lists and hash tables do not perform as well as structs: accessing a struct member is very fast because the position of each member in memory is calculated arithmetically. Moreover, when everything is a list, a compiler cannot help check types, since all the types are the same. A programmer may have no idea what the data structure looks like just by reading a function definition. Still, we can alleviate these problems by using a runtime schema such as JSON Schema.
Preparing an environment for async function and database connection is not convenient
Some subroutines need an established database connection. Others need a running async event loop before they can be tested, for example, async functions in Rust. Preparing a fake database and wiring everything together before testing is inconvenient, especially for a function like is_old_enough; this can be fixed by improving the software design. Testing async functions becomes easier with a tool such as tokio::test.
Testing a subroutine in the production environment
Testing in the production environment is not preferable, but sometimes it is necessary, especially when we cannot reproduce the problem somewhere else.
Common Lisp can run a read-eval-print loop (REPL) alongside the production system, so we can always test subroutines. Many languages come with a REPL, but we have to make sure that libraries and frameworks play well with the REPL. In the Common Lisp community, libraries and frameworks are usually REPL-friendly.
Result validation
After running a subroutine, we usually want to validate the result either manually or programmatically.
Programmatic validation
Most data comparison functions check if the data is the same object in memory, which is not what we want in this case. The code below does not return true even if the content of the data structures is the same because the EQ function does not compare the content.
When testing, we usually want to compare data structures for content equality. In Common Lisp, we can use the EQUALP function to do this, instead of the EQ function.
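The same distinction exists in Rust, and a minimal sketch may help for comparison. The Person struct and the two function names are assumptions for illustration. The == operator from #[derive(PartialEq)] compares field contents, which is the kind of check a test usually wants, while std::ptr::eq asks whether two references point at the same object in memory, which is closer in spirit to EQ.

```rust
// Hypothetical Person type; PartialEq is derived so `==` compares fields.
#[derive(Debug, PartialEq)]
struct Person {
    name: String,
    age: u32,
}

// Content equality: true when every field is equal.
fn same_content(a: &Person, b: &Person) -> bool {
    a == b
}

// Identity: true only when both references point at the same object.
fn same_object(a: &Person, b: &Person) -> bool {
    std::ptr::eq(a, b)
}
```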
Manually validating a complex data structure can be difficult, so there are many tools that can display data in a structured view. In Common Lisp, we can use Emacs inspectors like slime-inspect and sly-inspect, or we can use Clouseau, which is part of McCLIM. For other programming languages, I typically convert data structures to JSON and view them in Firefox.
Public Id As String
Public Title As String
Public Author As String
Public Sub PrintObject()
Print Id, Title, Author
End
Static Public Sub Info()
Print "หนังสือเป็นสื่อ"
End
A license grants, at the very least, the freedoms that free software should have: permission to use, copy, and distribute. For example, part of the Expat/MIT license reads: «Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:» Note that the words use, copy, modify, publish, and distribute all appear.
People are even less willing to do work that gives their competitors an advantage. With free software, a competitor who modifies a program and then sells it or uses it in their business without releasing the source code to the public can be seen as taking unfair advantage of others. Some licenses, such as the GNU General Public License (GPL), were created to address this issue. Part of the GPL reads: “You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions:” and those conditions require releasing the modified source code. Examples of software under the GPL include Linux, WordPress, VLC, and Blender, all of which are popular; Linux in particular is developed jointly by many private companies.
However, now that software is often used over a network, some claim that they are not distributing the program and therefore need not distribute the source code. Hence the GNU Affero General Public License (AGPL), part of which reads: “Notwithstanding any other provision of this License, if you modify the Program, your modified version must prominently offer all users interacting with it remotely through a computer network (if your version supports such interaction) an opportunity to receive the Corresponding Source of your version by providing access to the Corresponding Source from a network server at no charge, through some standard or customary means of facilitating copying of software.” This is a condition that the source code must be distributed when a modified program is made available for use over a computer network. Examples of software under the AGPL include Mastodon, Nextcloud, and OnlyOffice, all of which are programs used over a network.
Patents
Thailand does not have software patents yet, but not everyone lives in Thailand or will stay in Thailand all the time, so patents must be considered too: even without infringing copyright, infringing a patent can still cost money. For this reason, version 3 of the GPL also includes text concerning patent termination and damages: «A contributor's “essential patent claims” are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License» Likewise, version 2 of the Apache License has similar language: “Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work”
Works that are not programs
The licenses most commonly used specifically for documents, images, videos, and music are the Creative Commons licenses, abbreviated CC, with optional conditions to choose from: BY means crediting the owner of the work; SA means that if the work is modified, the modified work must be released under the same license; NC means no commercial use; ND means no derivatives. As you can see, the NC and ND conditions conflict with the principles of free software. Projects that use Creative Commons licenses include Wikipedia.
Although some Creative Commons licenses are incompatible with free software, they grant rights to the public in general, meaning that everyone receives the rights. This differs from the terms of many social media services, where users may use the service only if they accept terms that give the platform the right to use, process, modify, and publish their work, or even to license it onward to others.
Several friends have been discussing artificial intelligence law as it relates to copyrighted works and generative AI (prompted by the popularity of the photo-editing app Loopsie) and talking about the progress of the draft AI Act in the European Union. So I would like to describe the overall process of lawmaking in the European Union, along with the source of each EU institution's powers, to help explain how the drafts, of which there are currently three*, came about (which may relate to the position each draft takes).
* The initial proposal from the European Commission (21 April 2021), the provisional position draft from the Council of the European Union (6 December 2022), and the position draft of the European Parliament (14 June 2023).
The Council of the European Union represents the governments of the member states (usually the ministers of the ministry relevant to the matter under consideration, hence its other name, the Council of Ministers).
The European Council consists of the heads of each member state.
—
The European Council*,** has no direct power to make laws, but it plays an important role in setting policy direction, which in turn influences the decisions of the other institutions.
(* Because the European Council has no legislative power, the word Council in the context of considering legislation most likely refers to the Council of the European Union.
** There is another body with a similar name, the Council of Europe, which is not an institution of the European Union.)
(* This three-way consultation process is not set out in the EU treaties. It is an informal working method that arose for certain laws that are complex or broad in scope, where positions need to be discussed informally more often than usual so that work proceeds faster. However, because these informal consultations may take place without having to be disclosed to the public [as the process is informal], the method has been criticized as risking a lack of transparency. The European Parliament later issued Parliament Rules of Procedure so that the process remains transparent and reflects its role as the representative of European citizens.)
As you can see, the process of considering and adopting a law must be carried out jointly by the Parliament and the Council, as representatives of the Union and of the member states. If you look at the official titles of laws, you will see that they are named (Regulation/Directive) “of the European Parliament and of the Council”; the two go together.
As for recommendations, member states may choose whether or not to follow them. Many recommendations come from the Commission, but other institutions such as the Parliament, the Council, and the European Central Bank can issue recommendations as well.
Finally, there is the opinion type. As the name suggests, it is not quite advice on what should or should not be done, or by what steps (as with a recommendation), but a statement of views on a given matter. Because it does not really have the character of law, the main institutions, namely the Commission, the Council, and the Parliament, can each issue opinions on their own (without involving another institution). The Committee of the Regions and the European Economic and Social Committee can issue opinions too.
This draft is a “provisional position” from the Council of the European Union, issued before the Council's actual first reading takes place. It is nicknamed the “General approach”.
The term generative AI appears in two places in the Council's draft: Recital 6 and Article 3 (1).
At present, the EU's draft AI Act has just passed the Parliament's first reading.
Next comes the Council's first reading. If the Council accepts the entire draft without any amendment, the law can be adopted right away.
As you can see, even though the Council's formal first reading has not yet been reached, the Council has already issued a provisional position. This influences how the Parliament forms its position (bringing the Parliament's position closer to the Council's stance) and is useful for the informal three-way talks, which may mean the process need not go all the way to Conciliation.
My programs have defects because some functions destroy or mutate shared data. I avoid mutating shared data in Common Lisp; for example, I build new lists with CONS. However, I made a mistake by using SORT because I wasn't aware that SORT is destructive. Sometimes I still forget that SORT mutates its input data. The function sort-snode below sorts a part of a tree.
Since it is based on SORT, it changes the input data, which is snode. After I found that the output tree didn't look like what I expected, it took me days to see where my mistake was.
My workaround is running COPY-LIST before SORT.
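For comparison only, here is the same copy-then-sort idea as a minimal Rust sketch. It is not a translation of sort-snode, and the function name sorted_copy is an assumption. In Rust the in-place sort needs a mutable binding, so the function takes a shared slice, copies it, and sorts the copy.

```rust
// Hypothetical helper: return a sorted copy and leave the input untouched.
fn sorted_copy(items: &[i32]) -> Vec<i32> {
    // `to_vec` plays the role of COPY-LIST here.
    let mut copy = items.to_vec();
    // `sort` mutates `copy` in place; `items` is only borrowed immutably.
    copy.sort();
    copy
}
```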
Submitted to the AI Incident Database on 25 October 2022 (my first time!). Based on a report by Thai PBS on 4 October 2019, with information from additional sources. Appeared in the database on 26 October 2022. I will keep the extended report here for archival purposes. For a concise report and citation, please link to Incident 375 in the AI Incident Database.
—
Many Thais could not register for the government cash handout scheme because the app that manages the government wallet failed to recognize their faces during the authentication process. People entitled to the handout had to wait in very long queues at their local ATMs to get authenticated instead.
The handout is limited to 10 million recipients in the first round, and a recipient has to register to claim it. Thailand has a population of almost 70 million. Registration involves an authentication process in which the citizen photographs their identification card and the face of the card holder. If this is not successful, the citizen can do it at a supported ATM. There are about 3,000 ATMs nationwide that support the process.
On social media, internet users shared the problems they had, screenshots of messages from the app, and photo-taking techniques that might please the facial recognition. The tips include applying face powder, putting the hair up, taking off eyeglasses, taking the photo in daytime sunlight, looking straight at the camera, making the face and hair match the photo on the ID card, and avoiding shadows on the ID card.
Elderly people are one of the groups that suffer the most from the facial recognition issue, as their current faces can differ more from the photos on their ID cards. Under the Thai citizen ID card law, people who are 70 years old or older no longer need to renew their cards, which normally must be renewed every eight years. Because of this, the face on the ID card and the actual face can be very different.
Background
The cash handout program, called “Chim, Shop, Chai” (ชิมช้อปใช้, roughly translated as “eat, buy, spend”), aims to promote domestic tourism.
The same government wallet, “G Wallet”, would later be used for many other rounds of cash handout and co-pay programs during the COVID-19 pandemic, such as “Khon La Khrueng” (คนละครึ่ง, “each pay half”), where the same facial recognition issue still occurred.
The total number of individuals registered for these financial support programs is around 26.5 million. The wallet itself is inside the “Paotang” (เป๋าตัง) super app, developed and managed by Krungthai Bank, a state enterprise. Paotang had 34 million active users in June 2022.
Suriyawongkul, Arthit. (2019-09-29) Incident Number 375. in Lam, K. (ed.) Artificial Intelligence Incident Database. Responsible AI Collaborative. Retrieved on October 26, 2022 from incidentdatabase.ai/cite/375.
AI for All, a program of artificial intelligence and robotics projects for everyone, is supported by the Office of National Higher Education Science Research and Innovation Policy Council (NXPO, สอวช.). It includes a section of articles and monthly news summaries, some of which cover concerns and governance frameworks that arise when artificial intelligence systems are used in society.
Many teams don't test their functions separately. They run the whole project to see the result. When something goes wrong, they check the log and use a debugger. They are the majority, at least from my experience. Static type checking and type annotations are efficient for these teams because type annotations give a rough idea about data for each function. They can't look at testing data, which doesn't exist.
Still, I wonder if forcing type annotation is practical. Reading a long function is difficult. By splitting a long function, I found that type annotation can be distracting because instead of focusing on logic, I have to think about type annotation; sometimes, the size of type annotation is about half the function's size. Maybe forcing type annotation only on functions, which an outsider from another module can call, like in OCaml, is practical. I haven't coded in OCaml beyond some toy programs. So I don't know if it is really practical as I imagine.
My computer has many cores but doesn't have enough RAM to hold the whole data set. So I usually need to process data from a stream in parallel, for example, reading a Thai text line by line and dispatching the lines to processors running on each core.
In Clojure, I can use pmap because pmap works on a lazy sequence. However, using Lparallel's pmap in Common Lisp with lazy sequences wasn't covered in its examples. So I used Bordeaux threads and ChanL channels instead. It worked. Still, I had to write repetitive code to handle threads and channels whenever I wanted to process a stream in parallel. It not only looked messy but also came with many bugs.
So I created a small library called stream-par-procs to wrap thread and channel management. As shown in the diagram, the reader reads a line from the stream, the system dispatches the line to one of the processors, and finally, the collector creates the final result.
I only need to define a function to run in each processor, another function to run in the collector, and other functions for initializing states. The library hides details such as joining the collector thread once every processor has sent END-OF-STREAM.
In brief, stream-par-procs makes processing a stream in parallel in Common Lisp more convenient, and hopefully less buggy, by reusing thread and channel management code.
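The reader, processors, and collector arrangement described above can be sketched with the Rust standard library alone. This is not the stream-par-procs API; the function process_stream, its parameters, and the channel capacity of 64 are all assumptions for illustration. Dropping the sending side of a channel stands in for END-OF-STREAM, and results arrive in no particular order.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::Mutex;
use std::thread;

/// Hypothetical sketch: read `lines`, run `process` on `n_procs` threads,
/// and fold the outputs with `collect` on the calling thread.
fn process_stream<I, T, R, P, C>(lines: I, n_procs: usize, process: P, init: R, collect: C) -> R
where
    I: Iterator<Item = String> + Send,
    T: Send,
    P: Fn(String) -> T + Sync,
    C: FnMut(R, T) -> R,
{
    assert!(n_procs > 0, "need at least one processor");
    // Bounded channels, so a fast reader cannot fill up the RAM.
    let (line_tx, line_rx) = sync_channel::<String>(64);
    let (out_tx, out_rx) = sync_channel::<T>(64);
    // std's Receiver has a single consumer, so processors share it via a Mutex.
    let line_rx = Mutex::new(line_rx);

    thread::scope(|scope| {
        // Reader: push lines; dropping `line_tx` at the end signals end of stream.
        scope.spawn(move || {
            for line in lines {
                if line_tx.send(line).is_err() {
                    break;
                }
            }
        });

        // Processors: take a line, process it, forward the result.
        for _ in 0..n_procs {
            let out_tx = out_tx.clone();
            let (line_rx, process) = (&line_rx, &process);
            scope.spawn(move || loop {
                // The lock is released at the end of this statement.
                let next = line_rx.lock().unwrap().recv();
                match next {
                    Ok(line) => {
                        if out_tx.send(process(line)).is_err() {
                            break;
                        }
                    }
                    Err(_) => break, // reader finished
                }
            });
        }
        // Drop the original sender so the collector stops when all processors stop.
        drop(out_tx);

        // Collector: fold results until every processor has hung up.
        out_rx.into_iter().fold(init, collect)
    })
}
```

For example, process_stream(lines, 4, |line| line.len(), 0, |acc, n| acc + n) would add up the lengths of all lines using four processor threads.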
Fetching data from REST APIs or a plain HTTP server is usually a part of my work. Many problems may arise during fetching data; for example, a server goes offline, my storage is out of space, an API gives the wrong URL, a server sends data in an invalid format, etc. To illustrate the situation, I can oversimplify my task to the function below:
def fetch(uri):
    requests.get(uri)
A fetcher usually faces one of these problems. I don't want to rerun a fetcher after running it for an hour or a day. Maybe I can write my fetcher to resume working after it stops at any state, but it will take time, and my fetcher will be more complex. If I code in Python, the fetch function can raise many exceptions, and my fetcher will stop because my code didn't handle any exceptions.
I see two types of environments that will help me fix the problem.
(1) A type checker may help me handle as many types of exceptions as possible upfront. Java definitely can do this because, in Java, I have to declare the list of exceptions as part of the fetch function's signature, and javac checks whether the caller handles these exceptions. Rust doesn't use Java-like exceptions, but the Result type encodes the types of errors, so the Rust compiler can check whether a caller handles them.
(2) The Common Lisp runtime starts a debugger when an unhandled error occurs, so I can choose what to do next. For example, I can empty my trash folder if my storage is out of space. I still don't know if I can handle the storage space problem later with the Erlang runtime.
Python doesn't seem to belong to either (1) or (2). I tried mypy, a static type checker for Python, but it didn't warn me about anything. And I don't know how to retry or do something else with Python's exception system other than letting the program stop.
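To make the Rust part of (1) concrete, here is a minimal sketch with no real networking. The FetchError and Action enums, next_action, and fetch_with_retry are all hypothetical names, and the actual fetch is passed in as a closure. The match in next_action has no catch-all arm, so adding a new error variant later makes the compiler point at every place that has not decided what to do with it.

```rust
// Hypothetical error type listing the failures mentioned above.
#[derive(Debug, PartialEq)]
enum FetchError {
    ServerOffline,
    StorageFull,
    InvalidFormat,
}

// Hypothetical decisions a fetcher could take after a failure.
#[derive(Debug, PartialEq)]
enum Action {
    RetryLater,
    FreeSpace,
    SkipUri,
}

// Exhaustive match: forgetting a variant is a compile error.
fn next_action(error: &FetchError) -> Action {
    match error {
        FetchError::ServerOffline => Action::RetryLater,
        FetchError::StorageFull => Action::FreeSpace,
        FetchError::InvalidFormat => Action::SkipUri,
    }
}

// Call `fetch` up to `max_attempts` times (at least once) and give up
// early on errors that retrying cannot fix.
fn fetch_with_retry<F>(uri: &str, max_attempts: u32, mut fetch: F) -> Result<String, FetchError>
where
    F: FnMut(&str) -> Result<String, FetchError>,
{
    let mut attempt = 1;
    loop {
        match fetch(uri) {
            Ok(body) => return Ok(body),
            Err(error) => {
                // A real fetcher would wait or free disk space here before retrying.
                if attempt >= max_attempts || next_action(&error) == Action::SkipUri {
                    return Err(error);
                }
                attempt += 1;
            }
        }
    }
}
```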
I ran the average-1000-etipitaka-flexi-streams on SBCL 2.2.5-1.1-suse on my laptop with Celeron N4500. It took 1.591 seconds.
Then I changed the file to my-data.ndjson.zst, whose average line length is 515.5 bytes. Running average-1000-ndjson-flexi-streams took 4.411 seconds.
So I also tested with my customized utf8-input-stream. Running average-1000-etipitaka-utf8-input-stream, and average-1000-ndjson-utf8-input-stream took 0.019 seconds, and 0.043 seconds respectively, which means utf8-input-stream is 83X faster for short lines, and 102X faster for long lines, than Flexi-streams in these tests.
I have a big line-separated JSON file, compressed in Zstd format. I will call this file data.ndjson.zst.
I want my program to read the file line by line. I can use https://github.com/glv2/cl-zstd to make a binary stream from the file. Still, I cannot call the function read-line on a binary stream. So I need to wrap the binary stream with Flexi-streams.
Printing each line is just a stand-in here, so you can replace (print line) with your own processing, for example, parsing a JSON line and extracting a book title.
I want to use Emacs 28.1 on an aging OS where I mustn't modify /usr. I tried Docker, but I want to use local commands too, and SBCL didn't work properly on the old Docker. So I installed Emacs from a source tarball into my home directory.
However, Emacs failed to verify TLS certs. So I installed GNUTLS, Nettle, Idn, and Unistring.
#!/bin/bash
export PKG_CONFIG_PATH=$HOME/lib64/pkgconfig:$HOME/lib/pkgconfig
export LD_RUN_PATH=$HOME/lib:$HOME/lib64
export LD_LIBRARY_PATH=$HOME/lib:$HOME/lib64
export LDFLAGS="-L$HOME/lib -L$HOME/lib64"

rm -rf libunistring-1.0 libidn2-2.3.2 nettle-3.6 gmp-6.2.1 emacs-28.1

curl https://ftp.gnu.org/gnu/libunistring/libunistring-1.0.tar.gz | tar -xzvf - && \
    pushd libunistring-1.0 && \
    ./configure --prefix=$HOME && \
    make -j`nproc` && \
    make install && \
    popd || exit 1

curl https://ftp.gnu.org/gnu/libidn/libidn2-2.3.2.tar.gz | tar -xzvf - && \
    pushd libidn2-2.3.2 && \
    ./configure --prefix=$HOME && \
    make -j`nproc` && \
    make install && \
    popd || exit 1

curl https://ftp.gnu.org/gnu/nettle/nettle-3.6.tar.gz | tar xzvf - && \
    pushd nettle-3.6 && \
    ./configure --prefix=$HOME --enable-mini-gmp && \
    make -j`nproc` && \
    make install && \
    popd || exit 1

curl https://www.gnupg.org/ftp/gcrypt/gnutls/v3.7/gnutls-3.7.6.tar.xz | tar xJvf - && \
    pushd gnutls-3.7.6 && \
    ./configure --prefix=$HOME && \
    make -j`nproc` && \
    make install && \
    cd .. || exit 1

curl http://ftp.gnu.org/pub/gnu/emacs/emacs-28.1.tar.gz | tar -xzvf - && \
    pushd emacs-28.1 && \
    ./configure --prefix=$HOME --with-x-toolkit=no --with-xpm=no --with-jpeg=no --with-jpeg=no --with-gif=no --with-tiff=no --with-png=no && \
    make -j`nproc` && \
    make install && \
    cd .. || exit 1

echo
echo
echo "Emacs 28.1 must be ready!"
echo
The worst thing I did in Rust was using unbounded asynchronous channels. It took days to figure out that my program had a backpressure problem and that the channels ate up all the RAM.
So, since then, I have always used sync_channel or bounded channels from Crossbeam.
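A minimal sketch of why the bounded variant helps, using only std::sync::mpsc::sync_channel. The function names and the capacities are assumptions for illustration. The first function shows that a bounded channel refuses new items once it holds `bound` of them; the second shows the usual blocking send, where a fast producer simply waits for the consumer instead of queueing without limit.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::thread;

// Count how many items a bounded channel accepts when nobody is receiving.
fn fill_until_full(bound: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(bound);
    let mut accepted = 0;
    loop {
        match tx.try_send(accepted) {
            Ok(()) => accepted += 1,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => return accepted,
        }
    }
}

// Producer and consumer joined by a bounded channel: `send` blocks while
// the buffer is full, so at most `bound` items wait in memory at a time.
fn sum_through_bounded_channel(n: u64, bound: usize) -> u64 {
    let (tx, rx) = sync_channel::<u64>(bound);
    let producer = thread::spawn(move || {
        for i in 1..=n {
            tx.send(i).unwrap();
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    let total: u64 = rx.iter().sum();
    producer.join().unwrap();
    total
}
```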
I was in a world where the code describing type signatures was longer than the code telling what a program does. I don't want to bring that situation back.
I expect AI code completion to replace static typing. Ruby Type Signature (RBS) and similar things must be inferred automatically from a database or other data sources.
; caught WARNING:
; Derived type of STATIC1::A is
; (VALUES FIXNUM &OPTIONAL),
; conflicting with its asserted type
; STRING.
; See also:
; The SBCL Manual, Node "Handling of Types"
So SBCL, a Common Lisp implementation, can check types at compile time. Still, a programmer needs to read the warnings.
Last week, I watched a video about coding and Docker configuration on a TV. I couldn't read any line of code. Then I thought about visually impaired people. How do they code? Every student in Thailand must learn to code. I presume the situation is similar in every country.
People widely use text-to-speech services these days. I cannot find any text-to-speech service for reading source code aloud. Let's assume we have a modified version of a text-to-speech service.
So I looked at source code in different programming languages on the CodeRosetta website. I perceive Python code blocks purely by their visual structure. To read Python source code aloud, I have to encode its visual structure, namely the indentation, into sounds. A nested code block read this way won't be easy to understand. For example, reading twelve leading white spaces aloud would be very strange. In Lisp, reading open and close parentheses aloud is more straightforward, but I will lose track of which parenthesis closes which. So the best form of code blocks is in QuickBasic, which has different keywords for different kinds of blocks, for example, FOR with NEXT, and IF with END IF. Later I got a comment on Lemmy.ml telling me that Ada also has different keywords for different kinds of blocks. Another idea from Lemmy.ml is that the reader could convert a Python code block into a form similar to Ada or QuickBasic before reading it aloud.
MBasic refers to code by line numbers instead of code blocks. However, after listening to five lines, I forgot the line numbers. For example, when I heard gosub 70, I had forgotten what was at line 70.
In X86 Assembly, a programmer labels only the lines that the program will jump to. So X86 Assembly code looks much better than MBasic.
Still, coding in X86 Assembly can be exhausting in many cases. For example, X86 Assembly has no high-level support for recursion; the programmer must manage the stack by hand. Writing quick sort in X86 Assembly can be too difficult for someone learning to code.
Haskell doesn’t rely on code blocks. However, reading the symbols, for example, >>= is challenging. Prolog’s symbols are easy to read. For example, we can read :- as IF. Anyway, the Prolog programming paradigm is different from the mainstream one now. So Erlang, whose syntax is similar to Prolog, is a more practical alternative.
In brief, Erlang is a practical, less visual-centric programming language. Because it mostly relies on names instead of code blocks, and reading names aloud is much easier than reading code blocks aloud, it suits being read aloud. Furthermore, the Erlang programming paradigm is closer to the mainstream now.