[tex-live] [tex-hyphen] german hyphenation patterns

Mojca Miklavec mojca.miklavec.lists at gmail.com
Tue Jun 10 13:45:07 CEST 2008

On Tue, Jun 10, 2008 at 12:03 PM, Stephan Hennig wrote:
> Mojca Miklavec schrieb:
>>> 2. Add package german-x containing experimental patterns to TeX Live
>>> 2008 and add them to language.dat.  (Roughly the following lines:
>>>  german-x-<date> dehypht-x-<date>.tex
>>>  =german-x-latest
>>>  ngerman-x-<date> dehyphn-x-<date>.tex
>>>  =ngerman-x-latest
>>> where the exact value of <date> will be announced later.)
>> That's all fine, with one big "but", that but being: the same
>> language.dat is being read by BOTH pdftex and xetex. So if you want to
>> put dehyphn-x-<date>.tex into language.dat, the patterns REALLY need
>> to be XeTeX-compatible, so you need to write a XeTeX wrapper
>> xu-dehyphn-x-<date>.tex first if you want to do that.
> I've already wondered what the xu- files are.  Actually, we're lacking
> some genius that knows the weaving of patterns into TeX with all its
> (new) variants inside-out.  To me, all this is quite new stuff.

Jonathan wrote those files. I have no idea how to create new files of
the same kind as he did, but it's rather straightforward to write them
if patterns are known to be in UTF-8 and if the target 8-bit encoding
is known (for German that's EC/T1).
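
To illustrate the mapping involved, here is a minimal Python sketch. It is not the TeX mechanism that the xu- files use; it only shows the character-to-slot translation. It assumes the patterns contain nothing but ASCII plus the four German lowercase letters listed, at their slots in the EC/T1 (Cork) encoding. The names T1_SLOTS and utf8_pattern_to_t1 are made up for this example.

```python
# Sketch only: translate UTF-8 German pattern text into EC/T1 (Cork) bytes.
# T1_SLOTS and utf8_pattern_to_t1 are hypothetical names for illustration.
T1_SLOTS = {
    "ä": 0xE4,
    "ö": 0xF6,
    "ü": 0xFC,
    "ß": 0xFF,  # in T1 the sharp s sits at 0xFF, not at 0xDF as in Latin-1
}


def utf8_pattern_to_t1(pattern):
    """Return the pattern as bytes in the 8-bit T1 encoding."""
    out = bytearray()
    for ch in pattern:
        if ord(ch) < 0x80:
            # ASCII letters, digits and dots are the same in both encodings.
            out.append(ord(ch))
        elif ch in T1_SLOTS:
            out.append(T1_SLOTS[ch])
        else:
            # Refuse anything we have no slot for instead of guessing.
            raise ValueError("no T1 slot known for %r" % ch)
    return bytes(out)
```

Anything outside the table raises an error on purpose, so an unexpected character in the patterns is noticed rather than silently mangled.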

So - we're ready to assist with that part. But we need clean patterns
in the first place.

>> It sounds fine to me to use your patterns by default when XeTeX loads
>> "normal" german patterns.
> Does that simplify things?

It neither complicates nor simplifies them. But one needs to pick some
patterns for XeTeX, and those could be either the old ones or
yours. I guess that it's "forbidden" to change the patterns for pdfTeX,
but if the new patterns are better - why not use them in XeTeX, where
"you-need-to-be-backward-compatible-with-20-years-old-hacks" is not
really an issue? (Fonts need to change from EC to OpenType, so one
cannot be backward compatible with the old TeX line breaks anyway.)

>> If you're providing additional packages for pdf(La)TeX, it's probably
>> helpful for the user, and that doesn't concern the process of pattern
>> conversion to utf-8. If someone writes a package, go ahead. XeTeX
>> may load an additional language that it's not really going to use.
> The advantage of the hyphsubst package is that you can add updated
> patterns later and switch back and forth between those while babel only
> sees a request for language [n]german.

> With XeTeX loading (tomorrow's)
> experimental patterns by default, would this still be possible without
> going the xu- file route?

without cryptic xu- files: yes
without any wrapper: no (might be possible in future, but not right now)

Unless language.dat and its functionality is extended, you need to
load some wrapper which detects the engine and acts accordingly, but
the wrapper is really easy to create. If you have some TeX/babel guru
that knows how to do the following:

in language.dat one would have:

german-x-<date> load-new-german-patterns.tex % would be some other name

and then load-new-german-patterns.tex would have access to the
language name, so - it would know that user supplied german-x-<date>
as the language name and could then load its own macros +

then you don't need to do anything else, but only change the pattern date.
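
The engine-detecting part of such a wrapper could be generated along these lines. This is only a sketch in Python, since the loader files are autogenerated anyway. The file names passed in are placeholders, the emitted TeX relies on the usual \ifx\XeTeXversion\undefined test, and the sketch does not address the open question above of how the wrapper learns which language name was requested.

```python
# Sketch: generate a tiny engine-detecting wrapper around a pattern file.
# All file names are placeholders supplied by the caller.
WRAPPER_TEMPLATE = r"""\begingroup
\ifx\XeTeXversion\undefined
  % 8-bit engine such as pdfTeX: map UTF-8 input to the 8-bit encoding first
  \input {converter}
\fi
% both engines then read the same UTF-8 pattern file
\input {patterns}
\endgroup
"""


def make_wrapper(converter, patterns):
    """Return the text of a wrapper that loads `patterns` on either engine."""
    return WRAPPER_TEMPLATE.format(converter=converter, patterns=patterns)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(make_wrapper("utf8-to-ec.tex", "patterns-utf8.tex"))
```

With a generator like this, switching to a newer pattern date would only mean regenerating the wrapper with a different pattern file name.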

Hmmm ... I hope Karl is not listening to this ...
(I wash my hands concerning support to the new german patterns in that
way - I'm not guilty for anything :)

If you really want to hear my opinion: I'm pretty sure that:
- you will rather soon change your mind about the best way to
implement these patterns; if nothing else, there's a need to
improve language.dat anyway - converting the patterns to unicode was
only the first step towards this
- if users start using dated patterns now, you will soon need to
provide dozens of megabytes to keep backward compatibility (and I bet
that the new dated version will only differ in maybe three or four
patterns), and if you change the way or strategy - will you keep those
dozens of megabytes for the sake of not even a dozen users, or are
you going to let them down by not providing that file in the
distribution any more, and thus breaking the documents?
- we're ready to put some hacks into the new pattern loading scheme; after
all - german patterns are already an exception, and if you can tell us
how to figure out if user requested new patterns, we can load new ones
instead of the old ones
- I would vote for putting only a single version of patterns on TeX
Live, but if users want to experiment with dated versions outside of
the release, they should be free to do so (if they had to fiddle
themselves, they will at least know what to do if that breaks in the
next release of TeX Live)
- TeX Live is going to provide packages with updates, so you can
implement upgrades and ship new patterns also after the official
- keep in mind that other languages are upgrading patterns as well; if
you're providing a way to change patterns based on version, it would
be nice if you would release the scheme once it would be clear how
other languages can benefit from it as well, or how other languages
can do the same - but it's the wrong timing to get that done before TL

I would vote for:
- some basic support for new patterns (with a big warning sign - use
at your own risk), to be able to get user feedback
- developing extensive support for any versioned-or-whatever-stuff
after TeX Live is out
But to be honest, I do not know the details of your work. I only
suspect that anything you will do now will very soon be a bit obsolete
(not the patterns themselves, but the way of loading them in some
hackish way).

> (BTW, does the same apply to LuaTeX?)

Just forget about LuaTeX for the moment.
- LuaLaTeX is still useless even if it's going to be shipped on TeX Live
- ConTeXt (the only one where LuaTeX is useful) has its own patterns
and pattern loading mechanisms, not even considering the fact that
LuaTeX support on TeX Live will probably be obsolete by the time when
DVDs are sent to members :P :)
- LuaTeX can load patterns at runtime, and it's quite possible that
the hyph-foo.tex files that you see in svn will be preprocessed to
generate even cleaner pattern files (one pattern per line, no
\patterns{} macro, no comments, etc.)
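
Such a preprocessing step might look roughly like the sketch below. It assumes the source file keeps its patterns in \patterns{...} groups without nested braces and uses % only for comments; the function name is invented for this example.

```python
import re


def extract_patterns(tex_source):
    """Return the patterns from a TeX pattern file, one list entry each."""
    # Drop TeX comments: everything from an unescaped % to the end of the line.
    no_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    # Collect the body of every \patterns{...} group (no nested braces assumed).
    bodies = re.findall(r"\\patterns\s*\{([^}]*)\}", no_comments)
    # Patterns are separated by whitespace inside the group.
    return [p for body in bodies for p in body.split()]


if __name__ == "__main__":
    sample = "% header comment\n\\patterns{ % start\n.ab1c\n1de 2f.\n}\n"
    # Write one pattern per line, with no macro and no comments left.
    print("\n".join(extract_patterns(sample)))
```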

>> A summary: please, please, please ... do take a look at
>> 1.) svn://tug.org/texhyphen/trunk/tex/loadhyph/loadhyph-xx.tex (german
>> is not the best example, take a look at "sl" instead; also the file
>> will probably change - it's autogenerated anyway)
>> and
>> 2.) svn://tug.org/texhyphen/trunk/tex/patterns/utf8/lang-de-1996.tex
>> for an example of what we would really like to see: plain patterns,
>> utf-8 encoded, no catcodes, no lccodes, no TeX macros
>> With two languages that goal was unavoidable, and with languages such
>> as german & french, I do not dare changing anything as it would break
>> compatibility with OT1-encoded fonts.
>> The overall scheme, locations, comments ... all that might change, but
>> the idea is to split content (patterns themselves) from interpretation
>> (adapting them for xetex or pdftex that uses T1 or T2A encoding,
>> depending on needs).
> I think, I got the message.  More off-list.


>> And for the sake of that, it would be really helpful to submit "clean"
>> files. Definitely I can write a script to convert your patterns into
>> the proper format, but I guess that you would want updates to happen
>> automatically whenever you update your patterns,
> The poor man that uses XeTeX with OpenType fonts for writing a thesis
> doesn't really want a change of ngerman's hyphenation patterns to happen
> automatically for it might change line breaking of the document.

But the poor man that wrote a thesis, wrote it a year ago at most (to
those who used it in 2004 the adjective does not apply). And those who
go for XeTeX, go for the better, and do not care too much if a line
will be broken slightly better. If they convert old documents, those
won't break lines in the same way as the old documents did.

I might be wrong, but I guess that most XeTeX users would vote for the
new patterns.

> Version control of evolving hyphenation patterns is what package
> hyphsubst was written for.  How does this relate to XeTeX?

Do hyphsubst and polyglossia know about each other? (Babel doesn't work
in XeTeX; polyglossia is the babel replacement.) I have no idea what
hyphsubst does.


PS: You probably do not want to break backward compatibility (with a
not-yet-very-well-tested package hyphsubst, which might still change in
the future) for the users to whom you have promised it (saying that they
may load a specific hyphenation file that will not change).
