[tex4ht] [bug #618] Incomplete XML Document, domfilter error, truncated build on large file.

Karl Berry karl at freefriends.org
Mon Dec 18 23:03:22 CET 2023


    How does dvitype know the actual number of pages?

dvitype reads the entire dvi file page by page, merely storing away the
total_pages value from the postamble so it can report any discrepancy,
not using it to actually parse the dvi file.
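
For the archives, a rough Python sketch of why the two numbers can disagree. It is not dvitype's code: dvitype reads the pages forward, whereas this sketch takes a shortcut and follows the back pointer that every bop carries to the previous bop. The names make_dvi and page_counts are made up for illustration, and make_dvi only builds a synthetic DVI file with empty pages so the example is self-contained.

```python
import struct

# DVI opcodes used here.
BOP, EOP, PRE, POST, POST_POST = 139, 140, 247, 248, 249


def make_dvi(n_pages):
    """Build a minimal synthetic DVI file (as bytes) with n_pages empty pages."""
    out = bytearray()
    # Preamble: opcode, id byte 2, num, den, mag, zero-length comment.
    out += struct.pack(">BBIIIB", PRE, 2, 25400000, 473628672, 1000, 0)
    prev = -1  # back pointer of the first bop is -1
    for page in range(1, n_pages + 1):
        here = len(out)
        # bop: opcode, ten \count values, pointer to the previous bop.
        out += struct.pack(">B10ii", BOP, page, *([0] * 9), prev)
        out.append(EOP)
        prev = here
    post_at = len(out)
    # Postamble: final-bop pointer, num, den, mag, l, u, s, then the
    # two-byte total_pages field, which can only hold n_pages mod 65536.
    out += struct.pack(">BiIIIIIHH", POST, prev, 25400000, 473628672,
                       1000, 0, 0, 0, n_pages % 65536)
    out += struct.pack(">BIB", POST_POST, post_at, 2)
    out += bytes([223]) * 4                 # at least four padding bytes
    out += bytes([223]) * (-len(out) % 4)   # pad length to a multiple of 4
    return bytes(out)


def page_counts(dvi):
    """Return (claimed, actual): postamble total_pages vs. bops actually found."""
    end = len(dvi) - 1
    while dvi[end] == 223:                  # skip trailing padding
        end -= 1
    if dvi[end - 5] != POST_POST:
        raise ValueError("post_post not found")
    (post_at,) = struct.unpack_from(">I", dvi, end - 4)
    if dvi[post_at] != POST:
        raise ValueError("postamble not found")
    (p,) = struct.unpack_from(">i", dvi, post_at + 1)         # final bop
    (claimed,) = struct.unpack_from(">H", dvi, post_at + 27)  # total_pages
    actual = 0
    while p != -1:                          # walk the bop chain backwards
        if dvi[p] != BOP:
            raise ValueError("bad bop pointer")
        actual += 1
        (p,) = struct.unpack_from(">i", dvi, p + 41)
    return claimed, actual


print(page_counts(make_dvi(65600)))  # (64, 65600)
```

A program that trusts the first number sees 64 pages; one that walks the file itself sees all 65600.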

In theory it might be possible to change the tex4ht and t4ht programs to
do the same, but this is not something I'm going to spend time on. Aside
from the work of rearranging the basic logic of the programs, I think
it's likely that some other capacity problem will arise. Seems much more
reliable for Nasser to cut his documents down to something within range,
and use some other method to combine them, as desired.

I note that both dvips and dvipdfmx (didn't try others) behave like
tex4ht, i.e., they only process the number of pages recorded in the
postamble (the actual page count mod 65536), not the whole document.

    I've never thought much about TeX, the program.  
    I assume that some variant

It was Knuth who specified this behavior in the first place. In the
original tex.web, he wrote this comment:
  If |total_pages>=65536|, the \.{DVI} file will lie.

Original TeX effectively writes total_pages mod 65536. All other
TeX variants inherited that behavior.
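
As a worked example of that arithmetic, assuming nothing beyond the mod-65536 behavior described above (the function name is made up for illustration):

```python
def recorded_total_pages(actual_pages):
    """Value that ends up in the postamble, per the mod-65536 behavior above."""
    return actual_pages % 65536


# The 65600-page test document from the P.S. below:
print(recorded_total_pages(65600))  # 64
```

So for that test file a postamble-driven program would see 64 pages, and a document of exactly 65536 pages would be recorded as having none.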

    What would groff, which also can write DVI, have done with a
    document having more than 2^16 pages?

Since the postamble value in DVI format for total_pages is only two
bytes, groff cannot represent a value >=2^16 either.
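
To make the field width concrete, here is a small Python sketch. encode_total_pages is a hypothetical helper, not anything taken from groff or TeX; it only shows what fits in two unsigned big-endian bytes.

```python
def encode_total_pages(n):
    """Encode n as a two-byte big-endian unsigned field.

    Raises OverflowError when n does not fit in two bytes.
    """
    return n.to_bytes(2, "big")


print(encode_total_pages(65535))  # b'\xff\xff', the largest value that fits

try:
    encode_total_pages(65536)
except OverflowError:
    print("65536 does not fit in two bytes")
```

What a given DVI writer does when the count is too large (wrap around, as TeX effectively does, or something else) is up to that program; the format itself simply has no room for the larger value.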

Best,
Karl

P.S. Just for the archives, here is the tiny plain TeX file I wrote to
generate a document with 65600 DVI pages:

% Ship out 65600 pages, each containing a single "x".
\count255=0
\loop\ifnum\count255 < 65600
  \advance\count255 by 1
  x\vfil\eject
\repeat
\end

