[tex4ht] problem with slow compilation of large latex file with large math content

Nasser M. Abbasi nma at 12000.org
Fri Mar 25 22:53:40 CET 2016


I have many large LaTeX files, each with many pages and a
large number of equations generated by computer algebra
systems. These files also contain many \includegraphics
commands for SVG images.

I noticed that tex4ht becomes very slow as the number of pages
increases. It became so bad that I ended up buying a new PC
and installing Linux on it, just in the hope that it would
speed things up (I was using VBox on Windows, then I tried
Cygwin on Windows).

For example, for one file, using VBox, it took 14 hrs
for make4ht to compile the file to HTML. On Cygwin, it took
a little less than that, about 10 hrs. This is on Windows 7,
64-bit, 16 GB RAM, fast Intel i7-3930K CPU.

On the new PC (24 GB RAM, Intel i7-6700K, 64-bit), it took
5 hrs. OK, much better. So TL seems to run faster on native
Linux than on Cygwin. VBox is expected to be slower since the
PC is emulated in software.

Note also that the disk is solid state in all cases, so the disk is fast.

This is all using TL 2015, on a PC with nothing else running
on it.

But the issue is that pdflatex and lualatex take about 5 minutes
to compile the same file to PDF!

I can understand that converting to HTML will take more time,
since each equation is converted to an SVG image, etc., but
why does it take so much longer? Is this really to be expected?

What happens is this: tex4ht starts fast initially. I see

(./report.4ct) [3] [4] [5] [6] [7] [8] [9]...

printed on the terminal very fast, then it starts to slow
down as the numbers get higher (I assume these correspond
to the page numbers that tex4ht is processing). When it gets
to [3596] [3597] [3598] [3599] [3600] [3601].... it starts
to take a few seconds per update. The larger the numbers, the
slower it gets.
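
To put numbers on the slowdown described above, something like the
following Python sketch could timestamp the [N] markers and average
the time per marker in blocks of 1000 pages. This is only a sketch
under assumptions: the function names are made up for illustration,
it reads the output line by line (so timing is only as fine as the
line breaks and any pipe buffering allow), and it expects the output
of a single pass, since page numbers repeat across passes.

```python
import re
import subprocess
import time

PAGE_MARK = re.compile(r"\[(\d+)\]")  # matches markers such as [3596]


def extract_marks(timed_lines):
    """Turn (timestamp, text) pairs into (page_number, timestamp) pairs."""
    marks = []
    for stamp, text in timed_lines:
        for match in PAGE_MARK.finditer(text):
            marks.append((int(match.group(1)), stamp))
    return marks


def seconds_per_page(marks, bucket=1000):
    """Average seconds per marker inside each block of `bucket` page numbers."""
    spans = {}
    for page, stamp in marks:
        key = page // bucket * bucket
        first, last, count = spans.get(key, (stamp, stamp, 0))
        spans[key] = (min(first, stamp), max(last, stamp), count + 1)
    return {
        key: (last - first) / max(count - 1, 1)
        for key, (first, last, count) in sorted(spans.items())
    }


def watch(cmd):
    """Run `cmd` (hypothetical helper), timestamp each output line, summarize."""
    timed = []
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    ) as proc:
        for line in proc.stdout:
            timed.append((time.monotonic(), line))
    return seconds_per_page(extract_marks(timed))
```

It could be called as watch(["make4ht", "report.tex"]), where
report.tex is only a guess at the file name based on report.4ct
above, and the real command line would also carry the .cfg and
build file options. If the averages grow steadily from block to
block, that would back up the impression that the cost per page
rises with the page count.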

It also seems tex4ht makes more than one pass, as I see it
generating this sequence of numbers more than once.

I can make a zip file with a typical large LaTeX file,
all the images it uses, my .cfg and main.mk4,
and the command I used to compile the file, if
anyone wants to confirm this problem. Would this be OK?

Or should I file a "performance bug" first on this at
tex4ht and put a link to the zip file?

Or is it better to discuss this first? I think the
slowdown is in the I/O to the .dvi file, but this is
just a guess. I also chatted with Michal about this in the
TeX Stack Exchange chat room.

I can provide more information if needed. I have a great many
LaTeX files this large, and at this speed it takes 20 days
to compile one set of them to HTML. This is far
too long, given that lualatex takes an hour or so for the same set.

Finally, is there a document that describes, at a high level, the
passes/process that tex4ht uses to compile to HTML? Something like a
block diagram. I am not able to find such a design document.

--Nasser

