[tex-live] Critical bugfix needed for XeTeX [was: Re: Fwd: Duplicate Thai patterns reported by XeTeX in TL 2015 pretest]
peb at mppmu.mpg.de
Fri Apr 17 09:58:34 CEST 2015
On Thu, 16 Apr 2015, Jonathan Kew wrote:
> The problem reported with Thai patterns, as well as the Korean problem
> mentioned by Dohyun Kim (incorrect \showthe output from a token list that
> should contain a couple of Korean characters), turns out to be a symptom of
> an input-processing bug.
> This is a critical issue inasmuch as it can result in silently discarding
> certain characters from the user's input, with no indication that typesetting
> has failed in any way.
> Any Unicode character whose low byte is 0x20 or 0x09 could be affected.
> The problem arises from the (unsigned char) typecasts added in TL revision
> Combined with the "fake" isascii() definition found at:
> which will override the system-defined isascii() if it is a function rather
> than a macro, this makes ISBLANK(c) return true for any UTF-16 codepoint with
> 0x09 or 0x20 in the lower byte, regardless of its upper byte. This means that
> the code to "trim trailing whitespace" at:
> will also "trim" various other non-whitespace characters, such as
> Latin-script ĉ and Ġ, Cyrillic Љ and Р, Devanagari ठ, Thai ภ and many more.
> One workaround would be to replace the use of ISBLANK there with an explicit
> test for the specific characters 0x09 and 0x20; but in case there are other
> ISBLANK uses, I think it would be better to fix kpathsea/c-ctype.h.
> If we really need to provide the isascii(c) macro here (I don't know what
> other platforms/programs might break if we removed it), then I propose making
> it at least somewhat more likely to be correct:
Hi Karl, Jonathan, Khaled,
this should really be fixed. I have added a test for whether isascii is either
defined as a macro or declared as a function (or both), and only otherwise
#define isascii(c) (((c) & ~0x7f) == 0)
(stolen from GNU libc).
Unfortunately this changes the binaries on systems where isascii is not
defined as a macro (supposedly Darwin and perhaps others).