memory errors in libdatrie when building trie dictionary

I noticed strange crashes while trying to build libthai on Windows. These crashes were a result of attempting to build the trie dictionary. Switching back to Linux, I ran trietool through valgrind:

$ valgrind ../../libdatrie/bin/trietool thbrk add-list tdict.txt

Invalid read of size 1
   at 0x33F7E6FFB0: strcpy (in /lib64/tls/libc-2.3.4.so)
   by 0x4A0BCF0: tail_set_suffix (tail.c:189)
   by 0x4A0C68B: trie_branch_in_tail (trie.c:229)
   by 0x4A0C4BD: trie_store (trie.c:172)
   by 0x4A0D286: sb_trie_store (sb-trie.c:153)
   by 0x401248: command_add_list (trietool.c:182)
   by 0x400E27: decode_command (trietool.c:107)
   by 0x400B4C: main (trietool.c:58)
 Address 0x4B1F209 is 0 bytes after a block of size 1 alloc'd
   at 0x490641D: realloc (vg_replace_malloc.c:306)
   by 0x4A0BCC9: tail_set_suffix (tail.c:188)
   by 0x4A0C68B: trie_branch_in_tail (trie.c:229)
   by 0x4A0C4BD: trie_store (trie.c:172)
   by 0x4A0D286: sb_trie_store (sb-trie.c:153)
   by 0x401248: command_add_list (trietool.c:182)
   by 0x400E27: decode_command (trietool.c:107)
   by 0x400B4C: main (trietool.c:58)

Source and destination overlap in strcpy(0x4B1F948, 0x4B1F948)
   at 0x4906DC3: strcpy (mc_replace_strmem.c:106)
   by 0x4A0BCF0: tail_set_suffix (tail.c:189)
   by 0x4A0C68B: trie_branch_in_tail (trie.c:229)
   by 0x4A0C4BD: trie_store (trie.c:172)
   by 0x4A0D286: sb_trie_store (sb-trie.c:153)
   by 0x401248: command_add_list (trietool.c:182)
   by 0x400E27: decode_command (trietool.c:107)
   by 0x400B4C: main (trietool.c:58)

Invalid read of size 1
   at 0x4906D28: strcpy (mc_replace_strmem.c:272)
   by 0x4A0BCF0: tail_set_suffix (tail.c:189)
   by 0x4A0C68B: trie_branch_in_tail (trie.c:229)
   by 0x4A0C4BD: trie_store (trie.c:172)
   by 0x4A0D286: sb_trie_store (sb-trie.c:153)
   by 0x401248: command_add_list (trietool.c:182)
   by 0x400E27: decode_command (trietool.c:107)
   by 0x400B4C: main (trietool.c:58)
 Address 0x4B1FB72 is 0 bytes after a block of size 2 alloc'd
   at 0x490641D: realloc (vg_replace_malloc.c:306)
   by 0x4A0BCC9: tail_set_suffix (tail.c:188)
   by 0x4A0C68B: trie_branch_in_tail (trie.c:229)
   by 0x4A0C4BD: trie_store (trie.c:172)
   by 0x4A0D286: sb_trie_store (sb-trie.c:153)
   by 0x401248: command_add_list (trietool.c:182)
   by 0x400E27: decode_command (trietool.c:107)
   by 0x400B4C: main (trietool.c:58)

ERROR SUMMARY: 10254 errors from 3 contexts (suppressed: 9 from 4)

I am working on a fix, but I do not understand the code well enough to do so yet. Any help would be appreciated.

Re: memory errors in libdatrie when building trie dictionary

Below you will find a patch to fix the memory errors. I feel that this may not be as efficient of a fix as is possible, but it clears up my problems.

diff -r --unified play/libdatrie-0.1.2/datrie/tail.c libdatrie-0.1.2/datrie/tail.c
--- play/libdatrie-0.1.2/datrie/tail.c  2006-10-11 08:31:51.000000000 -0400
+++ libdatrie-0.1.2/datrie/tail.c       2008-01-08 17:49:12.593750000 -0500
@@ -185,10 +185,11 @@
 {
     index -= TAIL_START_BLOCKNO;
     if (index < t->num_tails) {
-        t->tails[index].suffix
-            = (TrieChar *) realloc (t->tails[index].suffix,
-                                    strlen ((const char *)suffix) + 1);
-        strcpy ((char *) t->tails[index].suffix, (const char *) suffix);
+        char *tmp = NULL;
+        if (suffix) tmp = strdup(suffix);
+        if (t->tails[index].suffix) free(t->tails[index].suffix);
+        t->tails[index].suffix = tmp;
+
         t->is_dirty = TRUE;
         return TRUE;
     }

Re: memory errors in libdatrie when building trie dictionary

A slightly more efficient fix is to dup the old_suffix's substring in trie_branch_in_tail() before calling tail_set_suffix(), which is the only case that causes the overlap, so the suffix is not always duplicated in all cases. But to make less assumption on the API (that the strings must not be overlapped), I prefer your fix.

Committed, with minor stlyish changes. Thanks a lot for your report and patch.

Index: ChangeLog
===================================================================
RCS file: /home/cvs/software/datrie/ChangeLog,v
retrieving revision 1.54
diff -u -p -r1.54 ChangeLog
--- ChangeLog	10 Jan 2008 03:41:15 -0000	1.54
+++ ChangeLog	10 Jan 2008 04:51:53 -0000
@@ -1,5 +1,11 @@
 2008-01-10  Theppitak Karoonboonyanan <...>
 
+	* datrie/tail.c (tail_set_suffix): Fix bug for the case in which
+	suffix argument and tail's suffix overlap. Bug report and patch by
+	shepmaster in http://linux.thai.net/node/102.
+
+2008-01-10  Theppitak Karoonboonyanan  <...>
+
 	* datrie/sb-trie.c (sb_trie_root): Return NULL pointer, rather than
 	FALSE. Bug reported by shepmaster in http://linux.thai.net/node/101.
 
Index: datrie/tail.c
===================================================================
RCS file: /home/cvs/software/datrie/datrie/tail.c,v
retrieving revision 1.7
diff -u -p -r1.7 tail.c
--- datrie/tail.c	11 Oct 2006 12:31:51 -0000	1.7
+++ datrie/tail.c	10 Jan 2008 04:51:53 -0000
@@ -185,10 +185,16 @@ tail_set_suffix (Tail *t, TrieIndex inde
 {
     index -= TAIL_START_BLOCKNO;
     if (index < t->num_tails) {
-        t->tails[index].suffix
-            = (TrieChar *) realloc (t->tails[index].suffix,
-                                    strlen ((const char *)suffix) + 1);
-        strcpy ((char *) t->tails[index].suffix, (const char *) suffix);
+        /* suffix and t->tails[index].suffix may overlap;
+         * so, dup it before it's overwritten
+         */
+        TrieChar *tmp = NULL;
+        if (suffix)
+            tmp = strdup (suffix);
+        if (t->tails[index].suffix)
+            free (t->tails[index].suffix);
+        t->tails[index].suffix = tmp;
+
         t->is_dirty = TRUE;
         return TRUE;
     }