[tex-live] Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest

Jonathan Kew jfkthame at gmail.com
Wed Apr 15 10:17:51 CEST 2015


On 15/4/15 02:01, Dohyun Kim wrote:

> This issue might be related to the weird behaviour of xetex I have found:
>
> \toks0{유
>> }\showthe\toks0
> \bye
>
> $ xetex test.tex
> This is XeTeX, Version 3.14159265-2.6-0.99992 (TeX Live 2015/dev)
> (preloaded format=xetex)
>   restricted \write18 enabled.
> entering extended mode
> (./test.tex
>>   \par .
> l.5 }\showthe\toks0
>
> ?
>   )
> No pages of output.
> Transcript written on test.log.
>
> As shown, the characters `유' and `술' have disappeared and nothing is
> printed. Note that the Unicode code points of these characters are
> U+C720 and U+C220 respectively, with 0x20 in their low bytes.
>

Note also that in the original report about Thai patterns, the error 
messages show lines from the patterns file with their last character 
U+0E20 (ภ) missing. Again, 0x20 in the lower byte (which will be the 
leading byte in the buffer on little-endian platforms).
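
To make the byte layout concrete, here is a small Python sketch (not 
taken from XeTeX; the helper name utf16_bytes is made up for this 
illustration) that prints the UTF-16 bytes of the three characters 
mentioned in this thread in both byte orders. In little-endian order 
the 0x20 byte comes first for each of them.

```python
def utf16_bytes(ch, byteorder):
    """Return the UTF-16 bytes of one BMP character in the given byte order."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return ch.encode(codec)


def hexdump(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join("%02X" % b for b in data)


# U+C720 and U+C220 are the Hangul syllables from the test file,
# U+0E20 is the Thai letter from the patterns report.
for ch in ("\uC720", "\uC220", "\u0E20"):
    le = utf16_bytes(ch, "little")
    be = utf16_bytes(ch, "big")
    print("U+%04X  little-endian: %s  big-endian: %s"
          % (ord(ch), hexdump(le), hexdump(be)))
```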

It looks to me like something could be broken fairly early in the 
input-scanning process (but after conversion from UTF-8 to UTF-16), 
such that a line-final UTF-16 code unit whose first byte in memory is 
0x20 is discarded as though it were a <space>. But this apparently 
doesn't affect the Win32 binary, as the test cases work fine for Akira. 
(Is it ONLY on OS X, or has this been observed on other platforms?)
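
As a purely hypothetical illustration of that kind of mistake (this is 
NOT XeTeX's actual code, and both function names are invented here), 
the Python sketch below trims trailing spaces from a little-endian 
UTF-16 line buffer in two ways. The first looks only at the first byte 
of each 2-byte unit, the second compares the whole 16-bit unit with 
U+0020. It assumes the buffer holds complete 2-byte code units.

```python
def trim_trailing_spaces_bytewise(buf):
    """Hypothetical buggy trim: tests only the first byte of each unit."""
    end = len(buf)
    while end >= 2 and buf[end - 2] == 0x20:
        end -= 2
    return buf[:end]


def trim_trailing_spaces_by_unit(buf):
    """Trim that compares the full 16-bit code unit with U+0020."""
    end = len(buf)
    while end >= 2 and int.from_bytes(buf[end - 2:end], "little") == 0x0020:
        end -= 2
    return buf[:end]


# First line of the test file, ending in U+C720 (bytes 20 C7).
line = "\\toks0{\uC720".encode("utf-16-le")

# ascii() keeps the output readable on terminals without UTF-8 support.
print(ascii(trim_trailing_spaces_bytewise(line).decode("utf-16-le")))  # final character is lost
print(ascii(trim_trailing_spaces_by_unit(line).decode("utf-16-le")))   # final character is kept
```

With the byte-wise version the line-final U+C720 is dropped, which 
would match the symptom in both reports. Whether the real input code 
does anything like this still needs checking in a debug build.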

Unfortunately, I don't have a current development build handy for 
debugging purposes. Anyone....?

JK


