Non-ASCII characters in filenames/Unicode

John Collins jcc8 at psu.edu
Thu Mar 10 00:01:08 CET 2022


Hi Norbert,

Many thanks for the information.

On 3/8/22 9:24 PM, Norbert Preining wrote:
> 
> Huge pain.

That's for sure.  I'm on my 3rd approach for latexmk, which actually seems to 
work and needs very little surgery on the code!  It's nice to find some fellow 
sufferers.


> Since neither install-tl nor tlmgr creates or handles files with
> non-ascii file names, we happily ignore that ;-)

I was also interested in what's done in all the *tex programs, even though that 
is probably irrelevant to Perl scripts with current versions of Perl.  As you 
know, the *tex programs handle full Unicode for filenames on the command line 
without problems, even when invoked from cmd.exe with non-UTF-8 code pages. The 
same behavior seems to be a lost cause for Perl scripts.


> ```
> binmode (STDIN, ':encoding(console_in)');
> binmode (STDOUT, ':encoding(console_out)');
> binmode (STDERR, ':encoding(console_out)');
> ```
> 
> As far as I remember, the reason for that was the former GUI which was
> written in PerlTK and had translated strings which were also output to
> the console/terminal. With the above the encoding worked also for
> non-utf8 based consoles (like Windows).

So internally you are using decoded strings (to use Perl terminology).  To me, 
that seemed the obvious and recommended way of doing things.  But it got 
complicated.  (I'll omit the reasons. ... :-()

My current approach is to use encoded strings in the system encoding, and 
to do the necessary translation of the contents of .fls, .log, .aux files.  I 
use the Win32 module to get the Windows code pages.  Other OSs can safely use 
UTF-8 AFAIK.
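
As a rough illustration of that translation step (a sketch only, not 
latexmk's actual code): the helper names system_encoding and utf8_to_system 
are hypothetical, and the sketch assumes that the file names written into 
.fls, .log and .aux files are UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: Encode name for the system (ANSI) code page.
sub system_encoding {
    if ($^O eq 'MSWin32') {
        require Win32;                   # only needed on Windows
        my $acp = Win32::GetACP();       # numeric code page, e.g. 1252
        return $acp == 65001 ? 'UTF-8' : "cp$acp";
    }
    return 'UTF-8';                      # assumption: other OSs use UTF-8
}

# Hypothetical helper: convert UTF-8 bytes (as assumed to appear in
# .fls/.log/.aux files) into bytes in the system encoding.
sub utf8_to_system {
    my ($bytes, $enc) = @_;
    $enc //= system_encoding();
    return encode($enc, decode('UTF-8', $bytes));
}

# With cp1252, the two-byte UTF-8 sequence for e-acute becomes one byte.
my $name = utf8_to_system("caf\xC3\xA9.tex", 'cp1252');   # "caf\xE9.tex"
```

Characters that do not exist in the target code page are replaced by a 
substitution character with these default Encode settings, so a real script 
would need to decide how to handle that case.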

My other trick is to set the console CP on Windows to the system CP (and 
restore it at the end), so that filenames are displayed correctly.
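
A minimal sketch of that save/set/restore idea, again not latexmk's actual 
code.  The sub names are hypothetical; the Win32:: calls are the ones 
documented in the Win32 module.  On non-Windows systems the subs do nothing.

```perl
use strict;
use warnings;

my ($saved_in, $saved_out);   # console code pages found at startup

# Hypothetical helper: switch the console to the system (ANSI) code page.
# Returns the code page that was set, or nothing on non-Windows systems.
sub set_console_to_system_cp {
    return unless $^O eq 'MSWin32';
    require Win32;
    $saved_in  = Win32::GetConsoleCP();
    $saved_out = Win32::GetConsoleOutputCP();
    my $acp = Win32::GetACP();
    Win32::SetConsoleCP($acp);
    Win32::SetConsoleOutputCP($acp);
    return $acp;
}

# Hypothetical helper: put back whatever code pages were saved.
sub restore_console_cp {
    return unless $^O eq 'MSWin32' && defined $saved_in;
    Win32::SetConsoleCP($saved_in);
    Win32::SetConsoleOutputCP($saved_out);
    undef $saved_in;
}

# Restore on normal exit and after die; not after an uncaught signal.
END { restore_console_cp(); }
```

Calling set_console_to_system_cp() once near the start of the script is 
then enough; the END block takes care of the restore.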

I'm not at all sure my approach is the best.  Comments appreciated.

Best,
John

