[tex-live] packages with characters > 127

Robin Fairbairns Robin.Fairbairns at cl.cam.ac.uk
Thu Dec 31 13:02:37 CET 2009

[nb: i've removed the ctan list from the cc: -- it's quite enough for
just one of us (me) to keep a watching brief, i think.  please remove it
if you reply to some other sub branch of the thread.]

Norbert Preining <preining at logic.at> wrote:

> On Do, 31 Dez 2009, Taco Hoekwater wrote:
> > another encoding than the author's document), so in my opinion it would be
> > best if package authors would stay away from 8-bit encodings if they
> > want to be optimally portable to luatex. Just my 2c.
> Umpf, I am not overly content with that.

me neither.

> I mean yes, in an ideal world all would be iso-2022 or UTF32 or whatever
> universal encoding you select, but it isn't. And we have hundreds of
> packages and files here, and billions out in the world (documents
> of users) with legacy encoding. Not being able to read them for
> the most advanced engine is a bit strange.
> Be forgiving with everything you get, but restrictive and strict 
> with what you give yourself.

how does one "forgive" a non-standard 8-bit character when you thought
you were reading utf-8?  ignore it? (how does the user find why their
characters didn't appear?)  produce an error? (users will complain)
produce a warning? (tex is already verbose enough, so users will

the problem is that unless the processor is told what encoding the
document uses (of the huge numbers available, standardised,
microsoft-used or largely private), it can't in general determine what
the semantics of any character are.
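
(a minimal sketch of the point, in python rather than tex: the same
byte decodes to a different character under each 8-bit encoding, and is
not valid utf-8 on its own.  the helper name decode_under is made up
for illustration; the codec names are python's standard ones.)

```python
def decode_under(raw: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under each named encoding (hypothetical helper)."""
    return {name: raw.decode(name) for name in encodings}


# a single byte above 127: its meaning depends entirely on the table used
raw = b"\xe9"
results = decode_under(raw, ["iso-8859-1", "iso-8859-6", "iso-8859-8"])

for name, char in results.items():
    print(name, f"U+{ord(char):04X}")

# read as utf-8, the same byte is simply malformed input
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid utf-8:", valid_utf8)
```

nothing in the byte itself says which of these readings was intended.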

> I cannot propose a solution, and if there is no way around that, so 
> be it. But we should try our best.

i fear there is no truly general solution.  perhaps the utf-8-gobbling
engines should provide a switch between the options above?
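
(what such a switch might look like, sketched in python under loud
assumptions: read_utf8 and its on_invalid parameter are hypothetical
names, not anything an existing engine offers.  the three settings
correspond to ignoring the byte, raising an error, and warning while
substituting U+FFFD.)

```python
import warnings


def read_utf8(raw: bytes, on_invalid: str = "error") -> str:
    """Decode raw as utf-8; on_invalid is a hypothetical policy switch."""
    if on_invalid == "ignore":
        # silently drop bytes that are not valid utf-8
        return raw.decode("utf-8", errors="ignore")
    if on_invalid == "error":
        # strict decoding: raises UnicodeDecodeError on the first bad byte
        return raw.decode("utf-8")
    if on_invalid == "warn":
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            warnings.warn(f"invalid utf-8 at byte offset {exc.start}")
            # keep going, marking each bad byte with U+FFFD
            return raw.decode("utf-8", errors="replace")
    raise ValueError(f"unknown policy: {on_invalid!r}")


# 'café au lait' as written by an iso-8859-1 editor
sample = b"caf\xe9 au lait"
print(repr(read_utf8(sample, on_invalid="ignore")))
```

none of the three settings recovers the intended character; they only
differ in how loudly the loss is reported.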

(one problem we face is that we can't guarantee that a document and its
packages are in the same encoding, so a switch saying, for example,
--language=iso-8859-6 doesn't completely solve the problem, and could be
vastly distracting if a package was in iso 8859-8.)

