[tex-live] xindy and folders with non ascii chars

Bruno Haible bruno at clisp.org
Sat Sep 22 03:46:25 CEST 2018


Ulrike Fischer writes in
<https://www.tug.org/pipermail/tex-live/2018-September/042369.html>:

> G:\Z-Test\jürgen>texindy test.idx
> *** - PARSE-NAMESTRING: syntax error in filename
>       "G:\\\\Z-Test\\\\j³rgen\\\\S1Jt2cb7Dx" at position 13
> 
> S1Jt2cb7Dx is the temporary file which (on windows) is created in
> the current directory and PARSE-NAMESTRING is imho a clisp function. 

Yes, PARSE-NAMESTRING is a clisp function.

I don't understand why the backslashes are doubled. I would have
expected a namestring
  "G:\\Z-Test\\jürgen\\S1Jt2cb7Dx"
(cf. https://clisp.sourceforge.io/impnotes/path-external-notation.html)
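
As a quick check of the position reported in the error, here is a small
Python sketch. It assumes that the position is a zero-based character
index into the string, which the message itself does not state. The
variable names are only for illustration.

```python
# Namestring as PARSE-NAMESTRING would have received it, if the four
# backslashes printed in the error message stand for two real backslash
# characters each (raw string: every backslash below is literal).
doubled = r"G:\\Z-Test\\jürgen\\S1Jt2cb7Dx"

# The same path with single backslashes as directory separators.
single = r"G:\Z-Test\jürgen\S1Jt2cb7Dx"

pos_doubled = doubled.index("ü")
pos_single = single.index("ü")
print(pos_doubled, pos_single)
```

Under that assumption, the reported position 13 matches the 'ü' only
when the backslashes really are doubled in the string, not merely
escaped in the printed form.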

The conversion from 'ü' to '³' can be explained as follows: clisp
uses the so-called "ANSI code page" (= windows-1252) for output,
but the console you are using is set to interpret arriving bytes
in the so-called "OEM code page" (= CP850). In fact 'ü' = 0xFC in
windows-1252, and 0xFC in CP850 is '³'.
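
The mismatch can be reproduced outside of clisp. The following Python
sketch only illustrates the two code pages; it uses Python's standard
"cp1252" and "cp850" codecs and says nothing about how clisp performs
its output internally.

```python
name = "jürgen"

# The name is written out in the ANSI code page (windows-1252) ...
raw = name.encode("cp1252")

# ... and the console decodes those same bytes in the OEM code page (CP850).
shown = raw.decode("cp850")

print(raw)
print(shown)

# Byte that a CP850 console would need in order to display 'ü'.
needed = "ü".encode("cp850")
print(needed)
```

The byte 0xFC written for 'ü' is displayed as '³', while a CP850
console would have needed the byte 0x81 to show 'ü'.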

As documented in https://clisp.sourceforge.io/impnotes/encoding.html#enc-dflt
filenames are supposed to be encoded in *PATHNAME-ENCODING*.

When you start clisp (in the build used by texindy), what is the
value of CUSTOM::*PATHNAME-ENCODING* ?


Richard M Kreuter writes in
<https://www.tug.org/pipermail/tex-live/2018-September/042397.html>:

> on one Ubuntu host I've got access to,
> /usr/lib/clisp-2.49/linkkit/clisp.h contains this line:
> 
> #define VALID_FILENAME_CHAR ((ch >= 1) && (ch != 47))
> 
> On this Ubuntu host, the Lisp expression
> 
>   (parse-namestring "G:\\Z-Test\\jürgen\\S1Jt2cb7Dx")
> 
> returns, i.e., it does not error.
> 
> But on one OSX host where I've built Clisp from source, src/config.h
> contains
> 
> #define VALID_FILENAME_CHAR ((ch >= 1) && (ch <= 127) && (ch != 47))
> 
> and, indeed, on this OSX host, the earlier Lisp expression errors.

The handling of filenames in Windows, Linux, and macOS is quite different.
The only things they have in common are that the variable
*PATHNAME-ENCODING* exists and that the macro VALID_FILENAME_CHAR is
defined in some way.


Akira Kakuto writes in
<https://www.tug.org/pipermail/tex-live/2018-September/042404.html>:

> I find the following in the present
> version in TeX Live 2018 (w32):
> 
> CLISP version 2.49.92 (2018-02-18)
> 
> #define VALID_FILENAME_CHAR ((ch >= 32) && (ch <= 61) && \
> (ch != 34) && (ch != 42) && (ch != 47) && (ch != 58) && \
> (ch != 60)) || ((ch >= 64) && (ch <= 132) && (ch != 92) && \
> (ch != 124) && (ch != 130)) || ((ch >= 137) && (ch <= 234) && \
> (ch != 152)) || ((ch >= 240) && (ch != 252))

It is normal that the VALID_FILENAME_CHAR expression is complicated
like this on Windows. Ulrike Fischer's problem is that the character
0xFC = 252 is considered invalid by this expression.
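
To see this without reading the macro by eye, here is a Python
transcription of the expression quoted above. The function name is
made up for this sketch, and it assumes that ch is the character code,
which for 'ü' is 252 both in windows-1252 and in Unicode.

```python
def valid_filename_char_tl2018_w32(ch: int) -> bool:
    """Python transcription of the VALID_FILENAME_CHAR macro quoted above."""
    return (
        (32 <= ch <= 61 and ch not in (34, 42, 47, 58, 60))
        or (64 <= ch <= 132 and ch not in (92, 124, 130))
        or (137 <= ch <= 234 and ch != 152)
        or (ch >= 240 and ch != 252)
    )

# Check every character of the directory name from the report.
for c in "jürgen":
    print(repr(c), ord(c), valid_filename_char_tl2018_w32(ord(c)))
```

Only the 'ü' is rejected; the ASCII letters of the directory name pass.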

In a Western (English US) Windows I get this expression:
#define VALID_FILENAME_CHAR ((ch >= 32) && (ch <= 61) && \
(ch != 34) && (ch != 42) && (ch != 47) && (ch != 58) && \
(ch != 60)) || ((ch >= 64) && (ch != 92) && (ch != 124))

Apparently this expression depends on the system encoding of the Windows
on which clisp is compiled. This creates a problem when compiling clisp
on, say, a Chinese Windows and then running it on a Western Windows, or
vice versa.
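
The difference between the two builds can be listed mechanically. The
Python sketch below transcribes both macros (the function names are
invented for this sketch) and compares them over the codes 0..255;
restricting the comparison to that range is an assumption made here,
not something the macros state.

```python
def valid_tl2018_w32(ch: int) -> bool:
    """Transcription of the macro from the TeX Live 2018 (w32) build."""
    return (
        (32 <= ch <= 61 and ch not in (34, 42, 47, 58, 60))
        or (64 <= ch <= 132 and ch not in (92, 124, 130))
        or (137 <= ch <= 234 and ch != 152)
        or (ch >= 240 and ch != 252)
    )

def valid_western(ch: int) -> bool:
    """Transcription of the macro obtained on a Western Windows."""
    return (
        (32 <= ch <= 61 and ch not in (34, 42, 47, 58, 60))
        or (ch >= 64 and ch not in (92, 124))
    )

# Character codes on which the two builds disagree.
disagree = [ch for ch in range(256)
            if valid_tl2018_w32(ch) != valid_western(ch)]
print(disagree)
```

Every code in this list is accepted by the Western expression and
rejected by the TeX Live 2018 one, and 252 is among them.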

Registered as bug https://gitlab.com/gnu-clisp/clisp/issues/10 .


Karl Berry writes in
<https://www.tug.org/pipermail/tex-live/2018-September/042413.html>:
> Why not
> #define VALID_FILENAME_CHAR (1)
> ? What is gained by all these conditions?

When the user enters an invalid file name,
1. clisp signals an error before the file name hits the file system,
   namely already when the Lisp pathname gets constructed,
2. the error message indicates the cause (remember that errors on
   a file system can be caused by invalid file names, permission
   problems, or even temporary issues like disk-full problems).
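
The idea of rejecting a name at pathname construction time can be
sketched as follows. This is not clisp's implementation: the class
names, the validity test, and the set of forbidden characters are all
hypothetical, and only the wording of the message is modelled on the
error shown at the top of this thread.

```python
class FilenameSyntaxError(ValueError):
    """Raised when a name contains a character the validity test rejects."""

# Hypothetical validity test for this sketch: no control characters, and
# none of a few characters that the Windows macros quoted earlier exclude.
FORBIDDEN = set('"*<>?|')

def valid_filename_char(c: str) -> bool:
    return ord(c) >= 32 and c not in FORBIDDEN

class Pathname:
    """Validates the name when the object is built, before any file access."""

    def __init__(self, name: str):
        for pos, c in enumerate(name):
            if not valid_filename_char(c):
                raise FilenameSyntaxError(
                    f"syntax error in filename {name!r} at position {pos}")
        self.name = name

try:
    Pathname("report?.idx")
except FilenameSyntaxError as err:
    message = str(err)
    print(message)
```

The error is raised without touching the file system, and the message
names both the offending string and the position, so it cannot be
confused with a permission or disk-full problem.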


Bruno



