Patgen

Arthur Reutenauer arthur.reutenauer at normalesup.org
Wed May 15 19:53:56 CEST 2019


	Hi Keno,

On Tue, May 14, 2019 at 10:55:32PM +0200, Keno Wehr wrote:
> Is it possible to adapt patgen for such huge lists?

  If you’re able to compile patgen yourself, it should be enough to
change trie_size and triec_size in patgen.ch, currently set to
10,000,000 and 5,000,000 respectively.  It is possible that the
percentages will still look silly because they’re computed as

	100 * good_count / ((double) good_count + miss_count)

so the numerator, 100 * good_count, is an integer product that can
overflow at the orders of magnitude we’re talking about: with 11 million
entries, good_count could easily exceed 22 million, and a hundred times
that (2.2 billion) is more than fits in a signed 32-bit integer (maximum
2,147,483,647).  I am, however, not able to test this myself because the
public repository for Classical Latin hyphenation currently produces a
list of only a little over 2 million entries (I suppose you’re running
patgen from the script in
https://github.com/wehro/hyphen-la/tree/master/patterns/generation).

	Best,

		Arthur

