[tex-live] Problems with non-7bit characters in filename

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jul 7 08:19:50 CEST 2014


On Mon, Jul 7, 2014 at 6:24 AM, Norbert Preining wrote:
> On Mon, 07 Jul 2014, Reinhard Kotucha wrote:
>> From your mail I deduce that you aren't happy with the answers you
>> got.  But UTF-8 works fine for me, Zdeněk, and maybe everybody else
>> except you.  Can you tell us *what* doesn't work for you?
>
> Reinhard, please, give him a break. There seems to be a bug
> wrt to latin1 handling when it comes to command line and file
> treatment.

I couldn't agree more.

> I agree that I am happy with UTF8, but we *should* support latin1
> (or, for that matter, latin9, or maybe also latinN for N in ...)
> as far as possible

I just wanted to say that support for 8-bit encodings probably cannot
be implemented properly. It can be implemented in such a way that
running files with weird filenames will work, but only to a very
limited extent. I was actually surprised to learn that whitespace in
filenames works at all with pdfTeX. (It doesn't work with ConTeXt MkII
for example.) Spaces and quotation marks in the path of a TeX Live
installation also cause a whole bunch of problems all over the place
(with everyone questioning the mental health of the person trying to
install TeX Live into "/usr/local/Schroedinger's TeX" for example).
And we haven't even left ASCII at this point, let alone started
talking about 8-bit.

To start with imagine how to implement what my gnuplot module does for
example. Let's say that the user starts with

% äöü.tex
\usemodule[gnuplot]
\starttext
\startGNUPLOTscript[example]
plot 'dätä.dat' with lines, cos(x) t "cös(x)"
\stopGNUPLOTscript
\useGNUPLOTgraphic[example]
\stoptext

The module writes out a file \jobname-gnuplot-1.tmp
(äöü-gnuplot-1.tmp) containing verbatim:
    plot 'dätä.dat' with lines, cos(x) t "cös(x)"
(In what encoding?)
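
To make the question concrete, here is a small sketch in Python (used
purely for illustration) showing that the very same line becomes a
different byte sequence on disk depending on the encoding. The variable
names are made up for this example.

```python
# Illustration only: the same text, two different byte sequences.
# \u00e4 is "ä" and \u00f6 is "ö", written as escapes so that the
# example does not depend on the encoding of this message.
line = "plot 'd\u00e4t\u00e4.dat' with lines, cos(x) t \"c\u00f6s(x)\""

utf8_bytes = line.encode("utf-8")      # each ä/ö takes two bytes
latin1_bytes = line.encode("latin-1")  # each ä/ö takes one byte

# Count the characters outside ASCII (two ä and one ö).
non_ascii = sum(1 for ch in line if ord(ch) > 127)

print(len(line), len(latin1_bytes), len(utf8_bytes))
print(utf8_bytes == latin1_bytes)
```

Whatever writes the .tmp file has to pick one of the two, and gnuplot
will look for a file whose name matches the bytes it reads.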

(a) LuaTeX needs to make sure that "\jobname-gnuplot-1.tmp" uses
Latin1, else it will open some garbage filename.

(b) The file "äöü.tex" was a valid UTF-8 file. The contents of
äöü-gnuplot-1.tmp are written out verbatim. How exactly is LuaTeX or
my module supposed to know that this should be converted from UTF-8
to Latin1 to get the 'dätä.dat' loadable by gnuplot later? And how to
do that conversion? And what should it do with "cös(x)"? But for the
sake of argument, let's assume that there's a magic function that does
that.
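
For what it's worth, such a "magic function" is easy to sketch once
somebody has already decided that a conversion is wanted. The hard part
is knowing when to call it. The following Python sketch uses a
hypothetical helper name, utf8_to_latin1, and assumes the input really
is UTF-8. It also shows that the conversion is impossible for
characters that Latin1 does not have, such as the "ě" in Zdeněk.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Latin1 (hypothetical helper).

    Raises UnicodeEncodeError when a character has no Latin1 equivalent.
    """
    return data.decode("utf-8").encode("latin-1")


# "dätä.dat" converts fine: every character exists in Latin1.
ok = utf8_to_latin1("d\u00e4t\u00e4.dat".encode("utf-8"))

# "Zdeněk.dat" does not: U+011B (ě) is not part of Latin1.
try:
    utf8_to_latin1("Zden\u011bk.dat".encode("utf-8"))
    convertible = True
except UnicodeEncodeError:
    convertible = False

print(ok, convertible)
```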

(c) A second file is generated, "\jobname-gnuplot-1.plt"
(äöü-gnuplot-1.plt) containing
    load 'äöü-gnuplot-1.tmp'
LuaTeX needs to make sure that this ends up as Latin1. It's not clear
to me how to achieve that reliably, but let's say that \jobname does
its magic and "knows that it should use Latin1 when printed out". (A
bonus question: what if \jobname is used inside a valid UTF-8
document, but the filename itself is in Latin1? What should \jobname
inside the document do? Raise an invalid-sequence error? Do the
conversion to UTF-8 automatically? When should it be converted to
UTF-8 and when should it stay in Latin1?)
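
One could imagine guessing the encoding of the filename bytes, but the
guess cannot be verified. The Python sketch below is only an
illustration of that problem, not a proposal for what LuaTeX should do.
The function name guess_filename is made up.

```python
from typing import Tuple


def guess_filename(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_guess) for raw filename bytes."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin1 maps every byte to a character, so this never fails,
        # which also means it cannot confirm that the guess is right.
        return raw.decode("latin-1"), "latin-1"


# A Latin1 "äöü.tex" is not valid UTF-8, so the fallback kicks in.
latin1_name = guess_filename(b"\xe4\xf6\xfc.tex")
# A UTF-8 "äöü.tex" decodes directly.
utf8_name = guess_filename("\u00e4\u00f6\u00fc.tex".encode("utf-8"))

# Ambiguous case: these two bytes are valid in both encodings.
ambiguous = b"\xc3\xa4"
as_utf8 = ambiguous.decode("utf-8")      # one character: U+00E4
as_latin1 = ambiguous.decode("latin-1")  # two characters: U+00C3 U+00A4

print(latin1_name, utf8_name, len(as_utf8), len(as_latin1))
```

The last case is the nasty one: a Latin1 filename that happens to be
valid UTF-8 would silently be read as something else.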

(d) Now LuaTeX calls "gnuplot \jobname-gnuplot-1.plt". Gnuplot
generates a TeX file with label 'dätä.dat' in the legend – in Latin1
encoding of course. LuaTeX needs to process that file and typeset the
filename, but it would complain because of invalid bytes as LuaTeX is
expecting input in UTF-8, but getting some Latin1 characters. I'm not
sure in what encoding "cös(x)" would end up, but it would break, no
matter what.
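
To illustrate why it breaks no matter what, suppose (and this is only
an assumption, since I don't know what would really happen) that the
filename label arrives in Latin1 while the title "cös(x)" passes
through unchanged as UTF-8. The following Python sketch builds such a
mixed byte string. A strict UTF-8 reader rejects it, and a Latin1
reader accepts it but garbles the title.

```python
# Assumed scenario: one file containing two labels in two encodings.
label_from_filename = "d\u00e4t\u00e4.dat".encode("latin-1")
label_from_title = "c\u00f6s(x)".encode("utf-8")
mixed = label_from_filename + b" " + label_from_title

# Reading as UTF-8 fails on the Latin1 bytes of the filename.
try:
    mixed.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Reading as Latin1 never fails, but the UTF-8 "ö" turns into two
# unrelated characters (U+00C3 followed by U+00B6).
as_latin1 = mixed.decode("latin-1")

print(valid_utf8, as_latin1)
```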

(e) Of course then one also has to interact with MetaPost and pass the
contents between TeX and MetaPost. And between TeX and Lua.

This is not to say that LuaTeX shouldn't work with weird filenames.
I'm just pointing out that this will never work properly when trying
to use the filenames for anything but the most basic stuff.

Mojca



