[tex-live] Critical bugfix needed for XeTeX [was: Re: Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest]

Peter Breitenlohner peb at mppmu.mpg.de
Fri Apr 17 09:58:34 CEST 2015


On Thu, 16 Apr 2015, Jonathan Kew wrote:

> The problem reported with Thai patterns, as well as the Korean problem 
> mentioned by Dohyun Kim (incorrect \showthe output from a token list that 
> should contain a couple of Korean characters), turns out to be a symptom of 
> an input-processing bug.
>
> This is a critical issue inasmuch as it can result in silently discarding 
> certain characters from the user's input, with no indication that typesetting 
> has failed in any way.
>
> Any Unicode character whose low byte is 0x20 or 0x09 could be affected.
>
> The problem arises from the (unsigned char) typecasts added in TL revision 
> 34284:
>
> http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?r1=34283&r2=34284&
>
> Combined with the "fake" isascii() definition found at:
>
> http://tug.org/svn/texlive/trunk/Build/source/texk/kpathsea/c-ctype.h?annotate=34284#l27
>
> which will override the system-defined isascii() if it is a function rather 
> than a macro, this makes ISBLANK(c) return true for any UTF-16 codepoint with 
> 0x09 or 0x20 in the lower byte, regardless of its upper byte. This means that 
> the code to "trim trailing whitespace" at:
>
> http://tug.org/svn/texlive/trunk/Build/source/texk/web2c/xetexdir/XeTeX_ext.c?annotate=36591#l454
>
> will also "trim" various other non-whitespace characters, such as 
> Latin-script ĉ and Ġ, Cyrillic Љ and Р, Devanagari ठ, Thai ภ and many more.
>
> One workaround would be to replace the use of ISBLANK there with an explicit 
> test for the specific characters 0x09 and 0x20; but in case there are other 
> ISBLANK uses, I think it would be better to fix kpathsea/c-ctype.h.
>
> If we really need to provide the isascii(c) macro here (I don't know what 
> other platforms/programs might break if we removed it), then I propose making 
> it at least somewhat more likely to be correct:

Hi Karl, Jonathan, Khaled,

this should really be fixed.  I have added a test for whether isascii is
either defined as a macro or declared as a function (or both), and only
otherwise
   #define isascii(c) (((c) & ~0x7f) == 0)
(stolen from GNU libc).
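
For illustration only, here is a minimal sketch of why the range check
matters. It is not the committed change. HAVE_DECL_ISASCII is assumed to
come from a configure-time declaration check, and old_isblank/new_isblank
are hypothetical names. old_isblank models the earlier behaviour on the
assumption that the fallback isascii accepted every value, so only the
(unsigned char) cast decided the result.

```c
#include <ctype.h>

/* Provide a fallback only when isascii is neither a macro nor a declared
   function.  HAVE_DECL_ISASCII is an assumed configure result, not a
   name taken from the actual kpathsea sources. */
#if !defined(isascii) && !(defined(HAVE_DECL_ISASCII) && HAVE_DECL_ISASCII)
#define isascii(c) (((c) & ~0x7f) == 0)
#endif

/* Assumed old behaviour: no effective ASCII check, so the value is simply
   narrowed to its low byte before isblank() sees it. */
int old_isblank(unsigned int c)
{
    return isblank((unsigned char)c) != 0;
}

/* With a real range check, anything above 0x7f is rejected before the
   narrowing cast can discard its upper byte. */
int new_isblank(unsigned int c)
{
    return isascii(c) && isblank((unsigned char)c) != 0;
}
```

With these definitions, old_isblank(0x0E20) (Thai ภ) and
old_isblank(0x0109) (ĉ) both return 1, whereas new_isblank returns 0 for
them and still returns 1 for 0x20 and 0x09.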

Unfortunately this changes the binaries on systems where isascii is not
defined as a macro (supposedly Darwin and perhaps others).

Regards
Peter

