Compressing formats

Hironori KITAGAWA h_kitagawa2001 at yahoo.co.jp
Wed Dec 4 02:46:33 CET 2019


Hello all,

As latex-dev now preloads expl3 (https://www.latex-project.org/news/2019/11/28/latex-dev-2020-2/),
the sizes of format files have increased greatly:

# format  (engine)  latex   -> latex-dev   [bytes]
latex     (pdftex)  4291213 ->  8067042
pdflatex  (pdftex)  4291275 ->  8067104
platex    (eptex)   4557309 -> 10411263
uplatex   (euptex)  4553312 -> 10407071
xelatex   (xetex)   3765560 ->  4507996 (compressed by zlib)
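
To put the table in proportion, here is a small illustrative Python sketch (the sizes are copied from the table; the names SIZES and growth are mine, not from any tool). It computes how much each format grew. The pdfTeX-based formats grow about 1.9x, the pTeX-based ones about 2.3x, and the zlib-compressed xelatex format only about 1.2x.

```python
# Format sizes in bytes, copied from the table: (latex, latex-dev).
SIZES = {
    "latex": (4291213, 8067042),
    "pdflatex": (4291275, 8067104),
    "platex": (4557309, 10411263),
    "uplatex": (4553312, 10407071),
    "xelatex": (3765560, 4507996),  # already zlib-compressed by XeTeX
}


def growth(old: int, new: int) -> float:
    """Return the size ratio new/old."""
    return new / old


if __name__ == "__main__":
    for fmt, (old, new) in SIZES.items():
        print(f"{fmt:9s} {growth(old, new):.2f}x  (+{new - old} bytes)")
```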

So I am experimenting with compressing formats
(starting with those of pTeX and friends) using lz4(hc):
https://github.com/h-kitagawa/texlive-source/tree/lz4hc-fmt

In this experiment:
 * platex-dev.fmt shrinks to 2861929 bytes (about 6.5MB smaller).
 * processing "\documentclass{minimal}\begin{document}\end{document}" with platex
   has almost no overhead (128 ms vs 134.8 ms).
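
A quick illustrative calculation puts the two results in numbers. It assumes that 128 ms is the run with the uncompressed format and 134.8 ms the run with the compressed one. The post does not say which is which, so that pairing is my assumption. Under it, the compressed format is about 3.6x smaller and the startup cost is about 5% higher.

```python
# platex-dev.fmt sizes in bytes, taken from this post.
UNCOMPRESSED = 10411263  # latex-dev column of the size table
COMPRESSED = 2861929     # lz4hc-compressed, as reported in the experiment

# Assumption: 128 ms is the run with the uncompressed format and
# 134.8 ms the run with the compressed one.
BASE_MS, LZ4_MS = 128.0, 134.8


def compression_ratio(raw: int, packed: int) -> float:
    """How many times smaller the compressed file is."""
    return raw / packed


def relative_overhead(base_ms: float, new_ms: float) -> float:
    """Extra run time as a fraction of the base run time."""
    return (new_ms - base_ms) / base_ms


if __name__ == "__main__":
    print(f"ratio:    {compression_ratio(UNCOMPRESSED, COMPRESSED):.2f}x")
    print(f"overhead: {relative_overhead(BASE_MS, LZ4_MS):.1%}")
```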

Test results are available at
https://github.com/h-kitagawa/texlive-source/tree/lz4hc-fmt/texk/web2c/eptexdir/tests/comp-tests

----
I know that XeTeX and LuaTeX compress formats with zlib, and that
(e)(u)pTeX and pdfTeX are already linked with zlib (because of SyncTeX).
However, I chose lz4(hc) for its decompression speed.
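
The general idea behind a compressed format is that every dump write goes through one compressing stream and every undump read inflates on demand. Below is a minimal Python sketch of that shape. It uses Python's standard zlib module only as a stand-in for lz4hc, because lz4 is not in the standard library. The class and method names are hypothetical, and this is not the actual web2c dump/undump code.

```python
import io
import struct
import zlib


class CompressedDumpWriter:
    """Hypothetical dump side: every write goes through one zlib stream."""

    def __init__(self, out, level: int = 1):
        self._out = out
        self._comp = zlib.compressobj(level)

    def dump_ints(self, values):
        # Pack as little-endian 32-bit ints, then feed the compressor.
        data = struct.pack(f"<{len(values)}i", *values)
        self._out.write(self._comp.compress(data))

    def close(self):
        # Emit whatever the compressor is still buffering.
        self._out.write(self._comp.flush())


class CompressedDumpReader:
    """Hypothetical undump side: inflates input chunks only as needed."""

    def __init__(self, inp, chunk_size: int = 65536):
        self._inp = inp
        self._chunk_size = chunk_size
        self._decomp = zlib.decompressobj()
        self._buf = bytearray()

    def undump_ints(self, count: int):
        need = 4 * count
        while len(self._buf) < need:
            chunk = self._inp.read(self._chunk_size)
            if not chunk:
                raise EOFError("format file ended early")
            self._buf += self._decomp.decompress(chunk)
        data = bytes(self._buf[:need])
        del self._buf[:need]
        return list(struct.unpack(f"<{count}i", data))


if __name__ == "__main__":
    stream = io.BytesIO()
    writer = CompressedDumpWriter(stream)
    writer.dump_ints(list(range(100000)))
    writer.close()
    print("compressed size:", len(stream.getvalue()), "bytes")
    stream.seek(0)
    reader = CompressedDumpReader(stream)
    print(reader.undump_ints(5))
```

Because the reader pulls fixed-size chunks and decompresses lazily, the cost of loading is dominated by the decompressor's speed. That is why a fast decompressor matters more here than the best compression ratio.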

* I also tested zlib (-1) and lzo (1x_1); see the table at
  https://github.com/texjporg/tex-jp-build/issues/96 (in Japanese) for details.
* Linking the lz4 library increases the binary size by about 100 KB,
  but compression makes platex-dev.fmt about 6.5MB smaller (see above),
  so the total size is reduced.
* We can afford to spend some time (but not much!) on compression,
  because formats are dumped much less often than they are loaded.
  I chose lz4(hc) compression level 5 as the default, since I think
  5 (or 6?) is a good tradeoff between compression ratio and compression time.
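
To see this kind of tradeoff on your own data, here is a small illustrative Python sketch. It times several compression levels and records the output size at each one. It uses zlib's levels from the standard library as a stand-in, so its level numbers are zlib's, not lz4hc's. The sample data is synthetic. Replace `sample` with the bytes of a real .fmt file to get meaningful numbers.

```python
import time
import zlib


def level_tradeoff(data: bytes, levels=(1, 6, 9)):
    """Map each zlib level to (compressed size in bytes, seconds taken)."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: compression must be lossless.
        if zlib.decompress(packed) != data:
            raise ValueError(f"round trip failed at level {level}")
        results[level] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic stand-in data; use the bytes of a real .fmt file instead.
    sample = b"\\def\\foo{bar}" * 50000
    for level, (size, secs) in level_tradeoff(sample).items():
        print(f"level {level}: {size} bytes, {secs * 1000:.1f} ms")
```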

-- 
Hironori KITAGAWA <h_kitagawa2001 at yahoo.co.jp>