[tex-live] xindy vs utf8 latex

Joachim Schrod jschrod at acm.org
Thu May 8 02:19:02 CEST 2014

On 05/07/14 23:43, Zdenek Wagner wrote:
> 2014-05-07 23:23 GMT+02:00 Karl Berry <karl at freefriends.org>:
>         Peter committed the change in TL SVN at April 28.
>         I assume the next rebuild will gather them.
>     No, everything has been up to date (in the pretest) since the
>     build of Apr 28 (it's updated nightly, barring errors).  That is,
>     in all bindirs, bin/texindy is a symlink to
>     ../texmf-dist/scripts/xindy/texindy.pl, and that was one of the
>     files Peter updated.  There is nothing to wait for.
>         > The texindy in pretest does not support -L utf8
>     What should be run to determine this?  I looked briefly at the
>     source, but it wasn't clear to me.
>     There are certainly a variety of utf8.xdy files in the runtime,
>     that's all I can see at a glance.
> What I do not understand is "-L utf8".  As the help says, -L is
> used to set the language, -C is used to set the codepage, so I
> would expect "-C utf8".

Uumpfh. I'd like to hide in the cellar. If you could see me now,
you'd see a very red face. My apologies.

Yes, I meant the addition of support for "-C utf8" in texindy (for
Latin scripts only), not "-L utf8". Sorry.

Ulrike Schäfer supplied the idea of how I could provide basic UTF-8
support for raw indexes of Latin scripts in LICR encoding. After
all, for these scripts the utf8 inputenc option doesn't matter to
xindy: the raw indexes are in LICR encoding, and we don't care
about the original source encoding. The only case where the value
of the -C option matters is the output of letter group headings.
Even the index entries in the final markup are not relevant, as
they use LICR encoding.

The current texindy solution is to accept LICR encodings of Latin
scripts (and only those) and to use the -C option to define the
encoding of the letter group headings. Most people who post about
xindy overestimate what the -C option does. For sorting raw indexes
in LICR markup, *it's completely irrelevant*. In fact, with the
wrong -C value sorting still works, but the output of letter group
headings goes awry -- and setting that output encoding is the sole
flexibility needed for LICR-encoded raw indexes.
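
To make that concrete, here is a toy sketch in Python. It is not
xindy's code: the table, the function names, and the sort order
(plain code point order, not a real collation) are made up for
illustration, and Python codec names stand in for -C values. It only
shows that the LICR-encoded entries are pure ASCII, that sorting
never looks at a codepage, and that the codepage changes nothing but
the bytes of a letter group heading.

```python
# Toy table (hypothetical, far from complete): LICR macro -> character.
LICR_TO_CHAR = {
    r'\"a': "ä", r'\"A': "Ä",
    r'\"o': "ö", r'\"O': "Ö",
    r'\v{c}': "č", r'\v{C}': "Č",
}


def decode_licr(entry):
    """Replace the LICR macros from the toy table by their characters."""
    for macro, char in LICR_TO_CHAR.items():
        entry = entry.replace(macro, char)
    return entry


def sort_entries(entries):
    """Sort LICR-encoded entries; no codepage is involved here.

    Plain code point order is used, which is NOT a real collation.
    """
    return sorted(entries, key=lambda e: decode_licr(e).casefold())


def heading_bytes(entry, codepage):
    """Encode the letter group heading of an entry in a codepage."""
    heading = decode_licr(entry)[0].upper()
    return heading.encode(codepage)


# Raw index entries as they arrive: ASCII only, whatever the source was.
raw_index = [r'\v{C}ech', "Zebra", "Apfel"]

ordered = sort_entries(raw_index)
utf8_heading = heading_bytes(r'\v{C}ech', "utf-8")
latin2_heading = heading_bytes(r'\v{C}ech', "iso8859_2")

print(ordered)
print(utf8_heading, latin2_heading)
```

The sorted list is the same whichever codepage is chosen afterwards;
only the two heading byte strings differ.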

Frankly, this is a design fault in xindy. We need to distinguish
input, sort, and output encoding, but we don't do so. That behavior
is utter nonsense and should never have survived so long. We're
discussing how to resolve that in a backward-compatible manner.


Joachim Schrod, Roedermark, Germany
Email: jschrod at acm.org
