Bug in texlive.tlpdb

Michal Vlasák lahcim8 at gmail.com
Tue May 18 22:56:41 CEST 2021


On Tue May 18, 2021 at 6:24 PM CEST, Naveen M K wrote:
> Hi,
>
>  I noticed a bug in `texlive.tlpdb`, in the section of
> `execute AddHyphen` in `hyphen-arabic` and `hyphen-farsi` package info.
>
> execute AddHyphen name=arabic lefthyphenmin= righthyphenmin=
> file=zerohyph.tex file_patterns=

> where, the `lefthyphenmin` and `righthyphenmin` parameter in it
> shouldn't be empty. Instead, it should be 2 and 3. I found the values
> from the per-existed `language.lua.dat`.

Hello,

I have dealt with this oddity before, too. As Karl already noted, the
values for these particular languages are irrelevant because hyphenation
is suppressed anyway. But some values have to be set for
"language.dat.lua", where an empty value is not an option.

I am not sure how you encountered the problem in your redistribution of
TeX Live, but I seem to remember that you roll your own scripts for some
things. I am far from an expert on TeX Live / languages, but here is
what I know:

 - TeX Live uses the subroutine "parse_AddHyphen_line" (TLUtils.pm) to
   parse these lines. This is where the default values of "2" and "3"
   come from:

      my $default_lefthyphenmin = 2;
      my $default_righthyphenmin = 3;

 - The input lines ("execute AddHyphen") originally come from the
   tex-hyphen project (e.g.
   https://github.com/hyphenation/tex-hyphen/blob/master/TL/tlpkg/tlpsrc/hyphen-arabic.tlpsrc)
   and they rely on these default values.

 - Subroutines "language_dat_lines", "language_def_lines" and
   "language_lua_lines" called from "_parse_hyphen_execute" (all in
   TLPOBJ.pm) are used to generate the files "language.dat",
   "language.def" and "language.dat.lua" respectively. The handling of
   "synonyms" is probably the trickiest part and different in all three.
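
To make the first two points concrete, here is a minimal Python sketch
of how such a line could be parsed with the defaults applied. It is not
the TeX Live code (which is Perl, in TLUtils.pm). The function name
"parse_addhyphen_line" and the exact treatment of empty values are
assumptions based only on the description above.

```python
# Defaults as quoted from TLUtils.pm in this message.
DEFAULT_LEFTHYPHENMIN = 2
DEFAULT_RIGHTHYPHENMIN = 3

PREFIX = "execute AddHyphen"


def parse_addhyphen_line(line):
    """Parse one 'execute AddHyphen key=value ...' line into a dict.

    Hypothetical helper: an empty or missing hyphenmin value falls back
    to the default (assumed behaviour, see the lead-in).
    """
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("not an 'execute AddHyphen' line")

    fields = {}
    for token in line[len(PREFIX):].split():
        key, _, value = token.partition("=")
        fields[key] = value

    # "lefthyphenmin=" (empty) and a missing key are treated the same.
    fields["lefthyphenmin"] = int(
        fields.get("lefthyphenmin") or DEFAULT_LEFTHYPHENMIN)
    fields["righthyphenmin"] = int(
        fields.get("righthyphenmin") or DEFAULT_RIGHTHYPHENMIN)

    # Synonyms are assumed to be a comma-separated list.
    fields["synonyms"] = [
        s for s in fields.get("synonyms", "").split(",") if s]
    return fields


arabic = parse_addhyphen_line(
    "execute AddHyphen name=arabic lefthyphenmin= righthyphenmin= "
    "file=zerohyph.tex file_patterns=")
print(arabic["lefthyphenmin"], arabic["righthyphenmin"])
```

With the line from the original report, the empty "lefthyphenmin=" and
"righthyphenmin=" fields end up as 2 and 3.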

I have previously needed to parse these "execute AddHyphen" lines to
construct "language.dat.lua" and "language.def" myself (I didn't need
language.dat). I don't know if it helps your case at all, but here is
the script I used:

https://github.com/vlasakm/mmtex/blob/876fe6c7fd09a833399a31dbe9be06a9ba978a1b/mmtex/files/extract-language-data.awk

It was enough for my purposes (LuaTeX only), and may not be completely
compatible with the way TeX Live does this.


For general TeX Live / TeX hyphen discussion:

While on the topic: I seem to have noted in the previous version of the
script that the handling of synonyms was weird at the very least (the
comment "synonyms in language.def ???" in TLPOBJ.pm seems to agree). As
far as I remember, the current handling of synonyms in "language.def"
means that for all eTeX engines (except LuaTeX) the same set of
hyphenation patterns is preloaded multiple times.
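
A small Python sketch of what I mean, assuming (from memory, not checked
against the current TLPOBJ.pm) that each synonym gets its own
"\addlanguage" line with the same loader file. The function name and
the language "xx" with its file are made up for illustration.

```python
def sketch_language_def_lines(name, loader, lefthyphenmin,
                              righthyphenmin, synonyms=()):
    """Return one \\addlanguage line for the name and one per synonym.

    Hypothetical illustration only: every emitted line names the same
    loader file, so an engine reading the lines one by one would load
    the same patterns once per name.
    """
    lines = []
    for lang in (name, *synonyms):
        lines.append(
            f"\\addlanguage{{{lang}}}{{{loader}}}{{}}"
            f"{{{lefthyphenmin}}}{{{righthyphenmin}}}")
    return lines


# Made-up language "xx" with one synonym.
for line in sketch_language_def_lines(
        "xx", "loadhyph-xx.tex", 2, 3, synonyms=["xx-alias"]):
    print(line)
```

Both printed lines reference "loadhyph-xx.tex", which is the repeated
preloading described above.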

Another weird thing about synonyms is that while they are handled
correctly when coming from packages (generated by the mentioned
procedures), the US English synonyms (e.g. "american") are defined only
in "language.us.lua" and not in "language.us" nor "language.us.def".
This means that "\uselanguage{american}" can't currently work in pdfTeX
and XeTeX.

In LuaTeX, these US English synonyms also don't work. While
"luatex-hyphen.lua" (loaded by "etex.src") supports synonyms, it doesn't
even have the chance to do so, because "etex.src" (special LuaTeX
version) requires the synonym to be defined in "language.def".

The situation with "language.dat.lua" is also a bit weird --
"luatex-hyphen.lua" (from hyph-utf8) was designed to parse it, but IIRC
polyglossia parses it by itself, while babel and OpTeX don't use the
file at all. Even though the file and its format are nice, it is still
sadly limited by the basic eTeX interface.

(Not that any of this really matters, because nobody seems to have
noticed.)

Regards,
Michal Vlasák


