[tex4ht] what is the fastest way to convert large document to HTML?

Nasser M. Abbasi nma at 12000.org
Thu Aug 23 03:01:06 CEST 2018


On 8/22/2018 7:14 PM, Reinhard Kotucha wrote:
> On 2018-08-22 at 01:39:45 -0500, Nasser M. Abbasi wrote:
> 
>   > I think all the speed is coming from the make -jN method.
> 
> If you are talking about speed, yes.  But in practice you save even
> more time if you follow Karl Berry's advice.
> 
> | Another thought: can you break the document into parts? Such that
> | you only need to compile something smaller while you're working on
> | it, and only compile the whole thing together at the end.
> 
> This is what "make" is written for in the first place.  The rules and
> dependencies have to be set up properly, of course.
> 

Sorry, I did not explain well how this specific document is
generated. It is produced by a program I am writing, which
generates the LaTeX file.

The program reads thousands of problems from a text file, solves
them inside a computer algebra system, generates each solution in
LaTeX (it uses the CAS to translate the math to LaTeX as it runs),
and sends the output to one LaTeX file:

         data ---> program ---> latex file

Each time I fix a bug, change the program, or add a new problem to
the input, I rerun it, and it regenerates the whole LaTeX file from
scratch.
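
In effect the cycle is just the following (a rough sketch; the
program and file names are hypothetical, not the real ones):

    $ ./generate_solutions problems.txt   # CAS run, rewrites report.tex in full
    $ make4ht report.tex                  # convert the regenerated file to HTML

Since the .tex file is rewritten in full on every run, make has
nothing incremental to work with at the LaTeX level.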

I may do this a few times a day. It now takes about 30 minutes to
run the program and, thanks to the speed improvement from Michal's
new setup, about one hour to compile to HTML. Before, that would
have taken the better part of a day.

If I were writing the LaTeX document by hand, of course I would
break it into parts, and a makefile would recompile only the files
that changed (a sketch of such a setup is below, for completeness).
But that is not the case here.
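
The setup Karl and Reinhard have in mind would look roughly like
this (a minimal Makefile sketch; the file names are hypothetical,
and it assumes each chapter compiles standalone, e.g. via the
subfiles package):

    # hypothetical layout: main.tex \include's chap1.tex ... chapN.tex
    CHAPTERS := $(wildcard chap*.tex)
    HTML     := $(CHAPTERS:.tex=.html)

    all: $(HTML)

    # rebuild only the chapters whose .tex changed since the last run
    # (the recipe line must be indented with a TAB in a real Makefile)
    %.html: %.tex
            make4ht $<

With rules like that, "make -j8" recompiles only the changed
chapters, up to eight at a time, which is where the make -jN
speedup comes from.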

FYI, here is the latest output of the program; I just built a new
one:

https://www.12000.org/my_notes/solving_ODE/current_version/index.htm

The PDF is now 4,200 pages.

For another, similar project I did this summer, it took about one
month to compile all the LaTeX to HTML. Yes, one whole month,
running 24 hours a day.

That one had 200 such PDF files, and many were much larger than
4,000 pages. All of them were generated by a program that
regenerates the LaTeX each time it runs.
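
For a batch like that, the make -jN approach amounts to running
many independent conversions at once; something like this gives the
same effect from the shell (a hedged sketch; the file pattern and
core count are made up):

    $ ls *.tex | xargs -P 8 -n 1 make4ht   # up to 8 conversions in parallel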

>   > I do not think RAM is that important here. I see little RAM used
>   > when make4ht is running relatively speaking. make4ht is now running
>   > and using only 150 MB memory, and I have 64 GB ram to use.
> 
> Memory not currently used by processes is used by the filesystem in
> order to cache files.
> 
> Transfer rates on my machine:
> 
>    Hard Drive:  150 MB/s
>    SSD:         500 MB/s
>    FS cache:    8.5 GB/s
> 
> Example:
> 
>    $ echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop the FS cache first
> 
>    $ time for f in tlnet/archive/*; do md5sum $f; done >/dev/null   # cold cache
>    real  1m22.653s
>    user  0m28.341s
>    sys   0m36.449s
> 
>    $ time for f in tlnet/archive/*; do md5sum $f; done >/dev/null   # warm cache
>    real  0m14.793s
>    user  0m9.173s
>    sys   0m6.209s
> 
> If the cache isn't large enough, starting a memory-hungry web
> browser can evict your project's files from the cache.
> Of course, 64 GB is much more than needed.  But you should check the
> VBox configuration.  Maybe VBox uses much less RAM by default.
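> 
> For example (a sketch; the VM name and size are made up), the
> allocation can be checked and raised from the host with:
> 
>    $ VBoxManage showvminfo "texlive-vm" | grep -i memory
>    $ VBoxManage modifyvm "texlive-vm" --memory 16384   # in MB, VM powered off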
> 
>   > So now I am shopping for a new CPU with most cores I can find
>   > as it turns out this is the magic behind speeding up tex4ht
>   > more than anything else.
>   >
>   > There are CPUs now with 18 cores and 32 cores.  So I need to
>   > start saving money to buy one of these as they get expensive,
>   > but considering the amount of time saved, they are well worth it,
>   > after all, time is money also.
> 
> I'm not convinced.  Sure, time is money.  When I bought a new computer
> last year I considered power consumption as well.  It matters because
> my machine is running 24/7 and because nothing has such a big impact
> on the lifetime of electronic components as heat.
> 
> If you have a system monitor like xosview installed, you'll see that
> only very few programs can utilize more than one CPU.
> 
> IMO it's much better to improve conditional compilation as suggested by
> Karl.
> 
> Suppose that compiling all the TeX Live binaries takes half an hour.
> If you change a single source file, only this file is compiled again
> and the whole build process takes only a few seconds.  I'm convinced
> that improving the tex4ht Makefile accordingly is the best thing one
> can do.
> 
> If you put your whole project, including all scripts which are not
> part of TeX Live 2018, on your server, I can play with it next
> weekend.  Don't worry about the size of the ZIP file.
> 
> Regards,
>    Reinhard
> 

Thanks for the info. Very useful to know.

--Nasser

