[tex4ht] what is the fastest way to convert large document to HTML?

Reinhard Kotucha reinhard.kotucha at web.de
Thu Aug 23 02:14:00 CEST 2018


On 2018-08-22 at 01:39:45 -0500, Nasser M. Abbasi wrote:

 > I think all the speed is coming from the make -jN method.

If you are talking about raw speed, yes.  But in practice you can
save even more time by following Karl Berry's advice.

| Another thought: can you break the document into parts? Such that
| you only need to compile something smaller while you're working on
| it, and only compile the whole thing together at the end.

This is what "make" is written for in the first place.  The rules and
dependencies have to be set up properly, of course.
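
For illustration, here is a minimal Makefile sketch for a document
split into chapters.  The file names, the \includeonly trick and the
choice of pdflatex are assumptions on my part, not taken from your
project, and recipe lines must be indented with a tab:

  CHAPTERS = ch1.tex ch2.tex ch3.tex

  # Full HTML conversion; redone only if the master file or one of
  # the chapters has changed since the last run.
  main.html: main.tex $(CHAPTERS)
  	make4ht main.tex

  # Quick PDF proof of a single chapter while you are writing it,
  # e.g. "make part PART=ch2".  \includeonly restricts the run to
  # that chapter.
  .PHONY: part
  part:
  	pdflatex "\includeonly{$(PART)}\input{main.tex}"

Whether a quick per-chapter run makes sense for your document depends
on how self-contained the chapters are, but the point stands: with
proper dependencies, "make" only redoes what has actually changed.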

 > I do not think RAM is that important here. I see little RAM used
 > when make4ht is running relatively speaking. make4ht is now running
 > and using only 150 MB memory, and I have 64 GB ram to use.

Memory not currently used by processes is used by the filesystem in
order to cache files.
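
As an aside, free(1) shows how much memory the kernel is currently
using for this cache; look at the "buff/cache" column.  That memory
is handed back to applications automatically whenever they need it.

  $ free -h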

Transfer rates on my machine:

  Hard Drive:  150 MB/s
  SSD:         500 MB/s
  FS cache:    8.5 GB/s

Example:

  $ echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop the filesystem cache

  $ time for f in tlnet/archive/*; do md5sum $f; done >/dev/null 
  real  1m22.653s
  user  0m28.341s
  sys   0m36.449s

  $ time for f in tlnet/archive/*; do md5sum $f; done >/dev/null 
  real  0m14.793s
  user  0m9.173s
  sys   0m6.209s

If the cache isn't large enough, starting a memory-hungry web browser
can evict your project's files from the cache.  Of course, 64 GB is
far more than needed.  But you should check the VBox configuration:
by default, the VM may be assigned much less RAM than the host
provides.
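
If so, VBoxManage can show and raise the limit (the VM name below is
just a placeholder, and the VM must be powered off before changing
it):

  $ VBoxManage showvminfo "your-vm" | grep -i memory
  $ VBoxManage modifyvm "your-vm" --memory 16384    # size in MB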

 > So now I am shopping for a new CPU with most cores I can find
 > as it turns out this is the magic behind speeding up tex4ht
 > more than anything else.
 > 
 > There are CPU's now with 18 cores and 32 cores.  So I need to
 > start saving money to buy one of these as they get expensive,
 > but considering the amount of time saved, they are well worth it,
 > after all, time is money also.

I'm not convinced.  Sure, time is money.  When I bought a new computer
last year I considered power consumption as well.  It matters because
my machine is running 24/7 and because nothing has as big an impact
on the lifetime of electronic components as heat.

If you have a system monitor like xosview installed, you'll see that
only very few programs can utilize more than one CPU.  
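
If xosview isn't available, mpstat from the sysstat package shows the
same thing on the command line; run it while a conversion is going on
and watch the per-core load:

  $ mpstat -P ALL 2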

IMO it's much better to improve conditional compilation as suggested
by Karl.

Suppose that compiling all the TeX Live binaries takes half an hour.
If you change a single source file, only that file is recompiled and
the whole build finishes in a few seconds.  I'm convinced that
improving the tex4ht Makefile accordingly is the best thing one can
do.
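
The mechanism behind this is nothing more than one target per
generated file with accurate prerequisites, as in the classic C
layout (file names here are generic, not from the TeX Live tree):

  OBJS = foo.o bar.o baz.o

  prog: $(OBJS)
  	cc -o prog $(OBJS)

  # Only the .o files whose .c file (or a listed header) is newer
  # than the .o get rebuilt; everything else is left untouched.
  %.o: %.c common.h
  	cc -c -o $@ $<

Applied to tex4ht this means: let every generated HTML file depend
only on the sources it is really built from, so that editing one
chapter doesn't push the whole document through the converter again.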

If you put your whole project, including all scripts which are not
part of TeX Live 2018, on your server, I can play with it next
weekend.  Don't worry about the size of the ZIP file.

Regards,
  Reinhard

-- 
------------------------------------------------------------------
Reinhard Kotucha                            Phone: +49-511-3373112
Marschnerstr. 25
D-30167 Hannover                    mailto:reinhard.kotucha at web.de
------------------------------------------------------------------
