Non-ASCII characters in filenames/Unicode

John Collins jcc8 at psu.edu
Thu Mar 10 00:24:47 CET 2022


On 3/8/22 12:30 AM, Akira Kakuto wrote:
> 
> Mixed encoding is a bug.
> In the new Windows binaries, PWD is also encoded in UTF-8 in
> luatex, pdftex, xetex, uptex, and euptex.

Great.  I'll leave my workaround in latexmk, since not everyone updates 
TeX Live to the latest version.  (The workaround tests whether or not the 
PWD line is valid UTF-8, so it will behave properly with the new binaries.)
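
A minimal sketch of that kind of validity test, written in Python for 
illustration rather than taken from latexmk's Perl code.  The function names 
and the cp1252 fallback are hypothetical; the real fallback would depend on 
the system's code page.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def decode_pwd_line(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode a PWD line, preferring UTF-8.

    The fallback encoding is an assumption made for this sketch only.
    """
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    return raw.decode(fallback, errors="replace")
```

With the new binaries the UTF-8 branch is always taken, so a test of this 
shape stays harmless after an update.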


There's one other UTF-8 anomaly I noticed (and have workaround code for in 
latexmk): line wrapping in a .log file by pdflatex and lualatex doesn't 
respect character boundaries.  They wrap at a particular number of bytes, 
which in a default installation is 79, so a multi-byte character can be 
split across two lines.  You are only guaranteed to get valid UTF-8 after 
undoing the line wrapping.  In contrast, xelatex wraps at 79 code point 
units.
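
The byte-based case can be sketched as follows, in Python for illustration; 
this is not latexmk's actual code.  The function name is hypothetical, and 
the sketch assumes that any line of exactly 79 bytes is a wrapped fragment, 
which will wrongly join a genuine line that happens to have that length.

```python
def unwrap_log(raw: bytes, width: int = 79) -> str:
    """Undo byte-based line wrapping, then decode the result as UTF-8."""
    out = bytearray()
    for line in raw.splitlines(keepends=True):
        body = line.rstrip(b"\r\n")
        if len(body) == width:
            # Assumed to be a wrapped fragment: drop its line ending so the
            # next line is joined to it.
            out += body
        else:
            out += line
    # Decode only after joining, since a wrap may fall inside a character.
    return out.decode("utf-8", errors="replace")
```

For xelatex output the same idea would presumably need the length test done 
on decoded characters rather than on bytes.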

Dealing with line wrapping is important to latexmk, so that it can extract 
dependency information properly.  Once I realized the different behavior of 
xelatex, the true source of a user-reported bug in latexmk became clear.

Probably the best solution for latexmk is to turn off line wrapping by the 
programs it invokes.  But that may make things less pleasant for apps that 
display log files to the user (TeXShop, TeXworks, etc.).
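
One way this might be done, sketched in Python: TeX Live's texmf.cnf has a 
max_print_line setting, and this sketch assumes it can be overridden through 
an environment variable of the same name for the invoked program.  The 
helper name and the value 10000 are hypothetical, and other distributions 
may need a different mechanism.

```python
import os


def env_without_wrapping(base=None, width=10000):
    """Return a copy of the environment with max_print_line set.

    Assumes the engine honours max_print_line from the environment; the
    width is just a value large enough that wrapping is unlikely.
    """
    env = dict(os.environ if base is None else base)
    env["max_print_line"] = str(width)
    return env
```

It could then be passed along when running the engine, for example 
subprocess.run(["pdflatex", "doc.tex"], env=env_without_wrapping()), which 
leaves the user's own environment, and so the logs seen by other tools, 
unchanged.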


John Collins



