[tex-live] language.def

Mojca Miklavec mojca.miklavec.lists at gmail.com
Mon Jun 16 14:22:49 CEST 2008

On Mon, Jun 16, 2008 at 12:51 PM, Norbert Preining wrote:
> HI all,
> On So, 15 Jun 2008, Karl Berry wrote:
>>     3) The supplied language.def loads only the US English hyphenation
>>     patterns. Would it be possible to include other hyphenation patterns
>>     by default (as is done with language.dat)?
>> Clearly this is the intent, but it takes a lot of effort to keep
>> language.dat, fmtutil.cnf, updmap.cfg in sync now.  I'm scared to try to
>> add another file of the same ilk.
> There is one question with this: Do we have *any* chance to
> auto-generate these lines from what is present in
>        Master/texmf/tex/generic/config/language.*.dat
> ??
> If yes, I could add some magic such that language.def is updated at the
> same time as language.dat (and in the same way!).


There have been some discussions on
http://www.tug.org/mailman/listinfo/tex-hyphen (maybe you should
subscribe for passive reading?) and some work is under way at
http://www.tug.org/svn/texhyphen/trunk/. The files are nearly ready to
be submitted to CTAN and included in TeX Live.

I don't know the details, but I have a feeling that language.def needs
lefthyphenmin & righthyphenmin data which is not available in

But: I do autogenerate language.foo.dat files for the languages in
the repository (I need to fix some edge cases: the Spanish dat file
also lists Catalan, the Greek dat file lists all kinds of Greek, etc.)
and I could just as well autogenerate language.foo.def if needed. See
TL/texmf/tex/generic/config/; the files are generated together with
the loaders for the languages.
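
To make the idea concrete, here is a rough Python sketch of turning
language.dat-style lines into language.def-style lines. It assumes the
\addlanguage{name}{file}{}{left}{right} form used by the stock
language.def; the function names, the hyphenmin table and the (2, 3)
fallback are placeholders of mine, not part of any existing tool, and
the real hyphenmin values would have to come from somewhere other than
language.dat.

```python
def parse_dat(text):
    """Return (name, pattern_file) pairs from language.dat-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()   # drop % comments
        if not line or line.startswith("="):  # skip blanks and =synonym lines
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1]))
    return entries


def dat_to_def(text, hyphenmin, default=(2, 3)):
    """Build \\addlanguage lines; `hyphenmin` maps name -> (left, right).

    `default` is only a placeholder for languages missing from the table.
    """
    lines = []
    for name, filename in parse_dat(text):
        left, right = hyphenmin.get(name, default)
        lines.append(
            "\\addlanguage{%s}{%s}{}{%d}{%d}" % (name, filename, left, right)
        )
    return "\n".join(lines)
```

Calling dat_to_def(open("language.foo.dat").read(), table) would then
give the body of a language.foo.def, provided the table of hyphenmin
values is maintained alongside the patterns.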

There are some open questions concerning language.def vs. language.dat:
- Germans want versioned patterns, so it would be nice to support some
- in language.dat there is no information about hyphenmin
- it would be nice if language.dat and language.def were unified
(that's what I have heard; I don't know the details and I don't know
when each of them is used)

The new scheme for loading patterns works as follows:

1.) language.dat contains proper language codes:

uppersorbian    loadhyph-hsb.tex
swedish         loadhyph-sv.tex
turkish         loadhyph-tr.tex
serbian         loadhyph-sr-latn.tex
serbianc        loadhyph-sr-cyrl.tex
greek           loadhyph-el-polyton.tex
monogreek       loadhyph-el-monoton.tex
ancientgreek    loadhyph-grc.tex
bulgarian       loadhyph-bg.tex
russian         loadhyph-ru.tex
ukrainian       loadhyph-uk.tex
norsk           loadhyph-nb.tex
nynorsk         loadhyph-nn.tex

2.) loadhyph-foo.tex takes care that:
- the Unicode patterns hyph-foo.tex are loaded for the particular language
- for 8-bit engines, the proper conversion from UTF-8 to ENCODING is done
first and then the patterns are loaded
(last year the same ugly job in the other direction was done by the
xu-hyphfoo.tex wrappers, except that they were full of hacks)
- sometimes the conversion cannot be done 1:1; examples are
Greek with combining accents, or German, where I do not dare drop
support for the OT1 encoding; in such cases, the old file is loaded the
usual way
- sometimes it sets some additional lccodes (apostrophe, dash etc.)

3.) Now the patterns are stored in one place and the knowledge about
the patterns (such as which encoding they are written in, what catcodes
they need) is stored in some other file, so that the TeX macros
needed to handle the patterns can be engine-specific.
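
The decision described in step 2 (convert the UTF-8 patterns for an
8-bit engine, or fall back to the old file when no 1:1 conversion
exists) can be sketched in Python roughly as follows. This is only an
illustration: the real loaders are TeX macros, the function names are
made up, and Python's ISO 8859 codecs stand in for the TeX font
encodings, which Python does not ship.

```python
def convert_patterns(utf8_bytes, encoding):
    """Re-encode UTF-8 pattern data for an 8-bit engine.

    Returns the converted bytes, or None when some character has no
    single-byte equivalent in `encoding` (i.e. no 1:1 conversion).
    """
    text = utf8_bytes.decode("utf-8")
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


def choose_source(utf8_bytes, encoding, legacy_name):
    """Pick converted patterns, or fall back to the old pattern file."""
    converted = convert_patterns(utf8_bytes, encoding)
    if converted is None:
        return ("legacy", legacy_name)   # load the old file the usual way
    return ("converted", converted)
```

A pattern containing a precomposed letter converts cleanly, while a
letter followed by a combining accent does not, which is the Greek
case mentioned above; the caller then loads the old file instead.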

I have also written a generator of tlpsrc files for languages, but I
need some instructions (from you and Karl) about what should go into
those files.

I have absolutely no insight into the TeX Live tools, but we can
coordinate to simplify things as much as possible. We have made
the effort to "purify" the patterns; what's left to be done is
packaging them properly (and adding proper copyright notes on top of
