Non-ASCII characters in filenames/Unicode
jcc8 at psu.edu
Thu Mar 10 00:01:08 CET 2022
Many thanks for the information.
On 3/8/22 9:24 PM, Norbert Preining wrote:
> Huge pain.
That's for sure. I'm on my 3rd approach for latexmk, which actually seems to
work and needs very little surgery on the code! It's nice to find some fellow
> Since neither install-tl nor tlmgr creates or handles files with
> non-ascii file names, we happily ignore that ;-)
I was also interested in what's done in all the *tex programs, even though that
is probably irrelevant to Perl scripts with current versions of Perl. As you
know, the *tex programs handle full Unicode for filenames on the command line
without problems, even when invoked from cmd.exe with non-UTF-8 code pages. The
same behavior seems to be a lost cause for Perl scripts.
> binmode (STDIN, ':encoding(console_in)');
> binmode (STDOUT, ':encoding(console_out)');
> binmode (STDERR, ':encoding(console_out)');
> As far as I remember, the reason for that was the former GUI which was
> written in PerlTK and had translated strings which were also output to
> the console/terminal. With the above the encoding worked also for
> non-utf8 based consoles (like Windows).
So internally you are using decoded strings (to use Perl terminology). To me,
that seemed the obvious and recommended way of doing things. But it got
complicated. (I'll omit the reasons. ... :-()
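For readers less familiar with the Perl terminology, here is a minimal sketch of what "decoded strings" means, using only the core Encode module. The sample filename and the choice of UTF-8 as the external encoding are assumptions for illustration; they are not taken from install-tl, tlmgr, or latexmk.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside the program
# (assumption: the outside world speaks UTF-8).
my $raw_bytes = "caf\xC3\xA9.tex";

# Decode once at the boundary: the string now holds characters, not bytes.
my $name = decode('UTF-8', $raw_bytes);

# Encode again only when handing the string back to the outside world.
my $out_bytes = encode('UTF-8', $name);

printf "%d bytes in, %d characters inside, %d bytes out\n",
    length($raw_bytes), length($name), length($out_bytes);
```

The binmode layers quoted above do the same encode step implicitly on every write to the console.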
My current approach is to use encoded strings in the system's encoding, and
to do the necessary translation of the contents of .fls, .log, .aux files. I
use the Win32 module to get the windows code pages. Other OSs can safely use
My other trick is to set the console CP on Windows to the system CP (and
restore it at the end), so that filenames are displayed correctly.
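A rough sketch of the two steps just described, not the actual latexmk code. The helper names encoding_for_cp and utf8_to_cp are hypothetical, and the sketch assumes the text being translated (for example a line from a .fls file) is UTF-8. The Win32 functions used are GetACP, GetConsoleCP, GetConsoleOutputCP, SetConsoleCP and SetConsoleOutputCP; that part only runs on Windows.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: map a Windows code page number to an Encode name.
# Code page 65001 is UTF-8; the others are assumed to exist as "cpNNNN".
sub encoding_for_cp {
    my ($cp) = @_;
    return $cp == 65001 ? 'UTF-8' : "cp$cp";
}

# Hypothetical helper: re-encode a UTF-8 byte string into the given code
# page, e.g. for a filename read from a .fls, .log or .aux file.
sub utf8_to_cp {
    my ($bytes, $cp) = @_;
    return encode(encoding_for_cp($cp), decode('UTF-8', $bytes));
}

# Saved console code pages; they stay undefined on non-Windows systems.
my ($saved_in_cp, $saved_out_cp);

if ($^O eq 'MSWin32') {
    require Win32;
    my $system_cp = Win32::GetACP();          # system (ANSI) code page
    $saved_in_cp  = Win32::GetConsoleCP();
    $saved_out_cp = Win32::GetConsoleOutputCP();
    # Make the console agree with the encoding of the strings we print.
    Win32::SetConsoleCP($system_cp);
    Win32::SetConsoleOutputCP($system_cp);
}

# Restore the console code pages when the script exits.
END {
    if (defined $saved_out_cp) {
        Win32::SetConsoleCP($saved_in_cp);
        Win32::SetConsoleOutputCP($saved_out_cp);
    }
}
```

On Windows one would call utf8_to_cp($line, Win32::GetACP()) on each line read. Characters that do not exist in the target code page are replaced by a substitution character by default, which is one obvious weakness of working in the system code page.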
I'm not at all sure my approach is the best. Comments appreciated.