[tex-live] TL2018/runscript: Problem with files with non-ASCII path on Windows; TeXworks is affected

ARATA Mizuki minorinoki at gmail.com
Tue May 8 17:25:39 CEST 2018


I'm reporting a problem in the runscript wrapper on Windows.

# Symptoms

On Windows, the TeXworks from TeX Live 2018 cannot open .tex files whose paths contain non-ASCII characters.  The problem arises when TeXworks is launched by double-clicking the .tex file or by dropping the .tex file onto bin/win32/texworks.exe, but not when the file is opened through the app's 'Open File' dialog.  (So the problem must have to do with command-line processing!)

Looking at the error message [1], the path string is suffering from mojibake.  In particular, it looks as if a multibyte string encoded in CP932 was interpreted as Latin-1.

[1]: Screenshot: https://blog.miz-ar.info/wp-content/uploads/2018/05/screenshot-texworks-error.png

Note that the TeXworks from TeX Live 2017 does not suffer from this problem.  This is strange, because TeXworks has not been updated since April 2017, and the two binaries (C:/texlive/201{7,8}/tlpkg/texworks/texworks.exe on my machine) have matching hashes.

(As described below, this problem is not limited to TeXworks and affects all non-Lua programs that are called via runscript; e.g. dviout is also affected.)

# The Cause

The problem is in the runscript wrapper.  Indeed, dropping the file directly onto the 'real' texworks.exe (tlpkg/texworks/texworks.exe) works around the problem.

Here is what happens when the 'wrapper' texworks.exe (bin/win32/texworks.exe) is launched:

1. The 'wrapper' texworks.exe runs runscript.tlu with LuaTeX (luatex.dll).
2. The script runscript.tlu launches the 'real' texworks.exe by calling LuaTeX's os.spawn, and os.spawn calls _spawnvp function in the C Runtime Library.
3. The function _spawnvp is expected to convert its multibyte arguments to wide strings and then call CreateProcessW with them.
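The conversion in step 3 can be sketched in portable C with `mbstowcs`, whose result depends on the current LC_CTYPE locale, which is exactly the setting at issue below.  This is only a minimal sketch under that assumption, not the actual UCRT source; `widen_arg` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not the real UCRT code): the narrow-to-wide
 * conversion that _spawnvp has to perform on each argument before it
 * can call CreateProcessW.  mbstowcs decodes `arg` according to the
 * current LC_CTYPE locale.
 * Returns the number of wide characters written (excluding L'\0'), or
 * (size_t)-1 if `arg` is invalid in the current locale or does not fit. */
size_t widen_arg(const char *arg, wchar_t *buf, size_t cap)
{
    assert(arg != NULL && buf != NULL && cap > 0);
    size_t needed = mbstowcs(NULL, arg, 0);   /* length query only */
    if (needed == (size_t)-1 || needed >= cap)
        return (size_t)-1;
    return mbstowcs(buf, arg, cap);           /* also writes the terminating L'\0' */
}
```

With plain ASCII arguments the result is the same in every locale; only bytes >= 0x80 (such as CP932-encoded path components) are affected by the LC_CTYPE setting.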

So what changed since TL2017?  runscript.tlu?  LuaTeX?  Of course these two have changed since 2017, but there is another component that has changed:  The C Runtime Library that LuaTeX depends on.

In TL2017, luatex.dll depended on MSVCR100.DLL (the 'classic' MSVCRT), but in TL2018 luatex.dll depends on the Universal CRT (the api-*.dll, vcruntime140.dll, and ucrtbase.dll files).

Between these two C Runtime Libraries, the behavior of system()-like functions (including _spawnvp) differs:  in the classic MSVCRT, they always use the system encoding (CP932 on Japanese Windows) to convert the multibyte argument, but in the Universal CRT, they depend on the LC_CTYPE setting, which is "C" by default.

What happens when a multibyte string is converted to a wide string with LC_CTYPE=C?  Each 8-bit byte of the multibyte string is simply widened to a 16-bit value; i.e., the multibyte string is treated as Latin-1, causing exactly the mojibake observed!
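The "simply widened" behavior can be modeled in a few lines of C.  This is a sketch of the effect described above, not the UCRT implementation; `widen_as_latin1` is a hypothetical name.  It uses the CP932 encoding of the Japanese character "あ", the byte pair 0x82 0xA0.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model of a "C"-locale multibyte-to-wide conversion as
 * described in the report: every byte is zero-extended to one wide
 * code unit, i.e. the input is read as Latin-1 and never decoded.
 * Writes at most dst_len - 1 characters plus L'\0'; returns the count. */
size_t widen_as_latin1(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL && dst_len > 0);
    while (src[n] != '\0' && n + 1 < dst_len) {
        /* cast through unsigned char so bytes >= 0x80 are not sign-extended */
        dst[n] = (wchar_t)(unsigned char)src[n];
        n++;
    }
    dst[n] = L'\0';
    return n;
}
```

Fed the CP932 bytes for "あ.tex", this yields U+0082 U+00A0 followed by ".tex": two unrelated Latin-1 code points instead of the single character U+3042, which is the kind of mojibake seen in the error message.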

# A Workaround

Setting LC_CTYPE to the system encoding (CP932 on Japanese Windows) fixes the problem.  This is done by `setlocale(LC_CTYPE, "")` in C and `os.setlocale("", "ctype")` in Lua.

In short, put the line

> os.setlocale("", "ctype")

at the beginning of runscript.tlu.
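For a C program launched the same way, the equivalent fix is a `setlocale(LC_CTYPE, "")` call at the top of `main`, before any conversion or spawn.  Lua's `os.setlocale` is a thin wrapper around this C function.  The sketch below assumes that on Windows the empty locale name selects the user's default ANSI code page (e.g. CP932 on Japanese Windows); `adopt_system_ctype` is a hypothetical helper name.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* C counterpart of os.setlocale("", "ctype"): switch only the LC_CTYPE
 * category to the locale named by the environment, leaving LC_NUMERIC
 * and the other categories at "C".  Call it before any narrow-to-wide
 * conversion or _spawnvp call.
 * Returns the name of the LC_CTYPE locale now in effect. */
const char *adopt_system_ctype(void)
{
    const char *name = setlocale(LC_CTYPE, "");
    if (name == NULL) {
        /* The environment named an unavailable locale; the category is
         * left unchanged, so report the one that is still active. */
        name = setlocale(LC_CTYPE, NULL);
    }
    return name;
}
```

Restricting the change to LC_CTYPE (rather than LC_ALL) keeps things like the decimal separator used by number formatting unaffected, which mirrors the `"ctype"` argument in the Lua line above.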

ARATA Mizuki
