Generating small documents quickly. Was: [pdftex] redirecting tex dvi output

Wed Jan 30 20:26:06 CET 2002

Posted to both pdftex at tug.org and texd at tug.org

Summary:

We discuss use of pipes with TeX and pdfTeX.  An example
shows that avoiding startup costs can give a 20-fold
reduction in time (if the job is small).  We describe 
the program dvichop, which allows TeX users to realise
this benefit.

=====

Lloyd Dalton (in a message to the pdftex list) wrote:

> Hello,
> 
>   I'm working on generating dynamic pdf output via
> tex/latex and dvipdfm.  This process works fine when
> the tex source is written to a file (foo.tex) and the
> latex output (foo.dvi) is run through dvipdfm.

Lots of people are using TeX or pdfTeX to generate
typeset pages on the fly, either for serving as a 
web page (in your case) or for instant preview (in
my case).

>   But my goal is to eliminate the file I/O entirely. 

Why?  You seem to think that file I/O is a bottleneck
that restricts performance.  But is it?  The following
shows that (on my antiquated machine) starting up TeX
takes about 0.26 second, while typesetting a small page
takes an additional 0.013 second.  Without the startup,
a small job is about twenty times quicker!

This example is the single most important part of this
posting.  Understand this, and everything will be clear.
(Well, I exaggerate a little.)

=====
jfine at active-tex:~ > time tex \\end
This is TeX, Version 3.14159 (Web2C 7.3.1)
No pages of output.
Transcript written on texput.log.

real    0m0.259s
user    0m0.200s
sys     0m0.050s
jfine at active-tex:~ > time tex story \\end
This is TeX, Version 3.14159 (Web2C 7.3.1)
(/usr/share/texmf/tex/plain/base/story.tex [1])
Output written on story.dvi (1 page, 668 bytes).
Transcript written on story.log.

real    0m0.272s
user    0m0.220s
sys     0m0.030s
jfine at active-tex:~ > 
=====

> dvipdfm's file output can be redirected to
> /dev/stdout, but I've been unable to make tex store
> its output anywhere other than foo.dvi or texput.dvi.

TeX was written (to very high standards) about 20 years ago,
to run on pretty well all computers.  I doubt that file
redirection was universally available then.  So that might
explain the failure.

>   It's possible that the Texd project 
> (http://tug.org/mailman/listinfo/texd) has the answer,

I hope so.  I've done some work in the area.

> but the list seems pretty inactive and the reference
> website is down (http://www.activetex.org).  

Well, the list could use some more traffic.  I don't know
why the reference website is down.  It was up today.

> I have a
> hard time believing that standard TeX offers no way to
> redirect dvi output.  Can it be done?

Yes, under UNIX, and maybe WinNT.  You need named pipes.
Create a named pipe foo.dvi, and start a separate process
that copies foo.dvi to standard out (or wherever you want).
Run this process in the background.  Now start up
  tex foo
and the pages will go to foo.dvi, which is copied to stdout.

Mission accomplished.
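
The named-pipe recipe can be sketched in Python.  This is only
an illustration, under stated assumptions: it is UNIX only, and
fake_tex is a hypothetical stand-in that writes a few bytes
where a real run of "tex foo" would write its dvi pages.  The
function names are invented for the example.

```python
import os
import tempfile
import threading

def copy_fifo(fifo_path, sink):
    """Read everything written to the named pipe and append it to sink."""
    # Opening a FIFO for reading blocks until a writer opens it.
    with open(fifo_path, "rb") as fifo:
        while True:
            chunk = fifo.read(4096)
            if not chunk:          # the writer has closed its end
                break
            sink.append(chunk)

def fake_tex(dvi_path, payload):
    """Hypothetical stand-in for `tex foo`: writes bytes to foo.dvi."""
    with open(dvi_path, "wb") as dvi:
        dvi.write(payload)

def run_demo(payload=b"pretend dvi bytes"):
    """Create the pipe foo.dvi, start the copier, then run the writer."""
    workdir = tempfile.mkdtemp()
    dvi_path = os.path.join(workdir, "foo.dvi")
    os.mkfifo(dvi_path)            # the named pipe; UNIX only
    received = []
    reader = threading.Thread(target=copy_fifo, args=(dvi_path, received))
    reader.start()                 # the "background process"
    fake_tex(dvi_path, payload)    # a real setup would run tex here
    reader.join()
    os.remove(dvi_path)
    os.rmdir(workdir)
    return b"".join(received)
```

Calling run_demo() returns exactly the bytes the writer sent.
In a real setup the copier would write to standard output (or
to the next program in the chain) instead of collecting the
chunks in a list.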

>   Apologies for asking this basic TeX question on a
> list that's for pdf-specific things (although tex->pdf
> is my goal).  Thanks in advance for any help.

Well, pdftex is one route from TeX to pdf.  And dvipdfm is
another.

But reducing the time seems to be your unstated goal.

For this, as the example above shows, removing the 
initialisation of TeX from the loop can bring great
benefits.

However, almost all dviware assumes that the input is an
already existing dvi file, rather than a dynamically created
stream.  So it seeks to the end of the dvi file, where TeX
helpfully writes a list of all fonts used in the job.  (Each
font is also defined in the body of the file, just before its
first use.)  The final list of fonts (and some other dvi
features) allows quick random access to the pages of a dvi
file.  The dvi previewer on the NeXT is an exception: it does
not make this assumption.
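
To see why this fails on a stream, here is a rough Python
sketch of the "seek to the end" step.  It relies on the dvi
format as TeX writes it: the file ends with post_post (opcode
249), a 4-byte pointer to the postamble (opcode 248), an
identification byte, and at least four padding bytes of value
223.  The names find_postamble and tiny_dvi are invented for
this example, and tiny_dvi builds a one-page file with no
fonts purely for demonstration.

```python
import struct

def find_postamble(dvi):
    """Return the offset of the postamble in a complete dvi file."""
    end = len(dvi)
    while end > 0 and dvi[end - 1] == 223:   # skip trailing padding
        end -= 1
    # Just before the padding: post_post (249), 4-byte pointer, id byte.
    if end < 6 or dvi[end - 6] != 249:
        raise ValueError("no post_post found: not a complete dvi file")
    (post_offset,) = struct.unpack(">i", dvi[end - 5:end - 1])
    if not 0 <= post_offset < len(dvi) or dvi[post_offset] != 248:
        raise ValueError("pointer does not lead to a postamble")
    return post_offset

def tiny_dvi():
    """Build a minimal one-page dvi file (no fonts), for illustration."""
    num, den, mag = 25400000, 473628672, 1000
    pre = bytes([247, 2]) + struct.pack(">iii", num, den, mag) + bytes([0])
    page = (bytes([139])                          # bop
            + struct.pack(">10i", 1, *([0] * 9))  # the ten \count values
            + struct.pack(">i", -1)               # no previous page
            + bytes([140]))                       # eop
    post_at = len(pre) + len(page)
    post = (bytes([248]) + struct.pack(">i", len(pre))
            + struct.pack(">iii", num, den, mag)
            + struct.pack(">ii", 0, 0)            # tallest height, widest width
            + struct.pack(">HH", 1, 1))           # max stack depth, page count
    post_post = bytes([249]) + struct.pack(">i", post_at) + bytes([2, 223, 223, 223, 223])
    return pre + page + post + post_post
```

On the complete file the postamble is found.  On the same
bytes cut off after the first page, which is all that a reader
of a live stream has seen, find_postamble raises ValueError.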

But what we want is all the pages of one short document, and
that document is just one part of a continuing stream of dvi
pages from a TeX process that stays running.  Working this
way is what gives us the performance benefit.

One way to get this benefit is to rewrite dvipdfm, so that
it assumes only that the input is a stream, and to give it
the facility to generate multiple output PDF files.  (BTW,
many XSLT processors have an extension that allows multiple
XML files to be created in one run.  For exactly the same
reason.)

But there is another way.  Enter dvichop.  This is an output
filter, that can be attached to a dvi stream, which chops
the input dvi into smaller pieces.

Now use a three-step process:

tex foo  ==> reading foo.tex, writing to foo.dvi

dvichop  ==> reading foo.dvi, writing 001.dvi, 002.dvi, ...

dvipdfm  ==> reading nnn.dvi, writing nnn.pdf

By the way, you might want to make foo.tex a named pipe, and
then your server can write to that pipe as and when it needs
a dvi page.

Once the process is going, you can generate dvi files at a
rate of 50 per second, say (allowing for overheads), instead
of 4 per second.
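
These rates come from the timings in the transcript above.  A
small Python calculation makes the arithmetic explicit.  It
uses only the two "real" times shown there; the function name
is invented for the example, and the results are rough
estimates for that one machine, not measurements of a running
server.

```python
def rates(startup=0.259, one_page_job=0.272):
    """Estimate jobs per second with and without TeX's startup cost.

    startup      -- real time of `tex \\end` (no pages)
    one_page_job -- real time of `tex story \\end` (one page)
    """
    per_page = one_page_job - startup       # cost of the page itself
    restart_rate = 1.0 / one_page_job       # TeX restarted for every job
    stream_rate = 1.0 / per_page            # TeX kept running
    speedup = one_page_job / per_page
    return per_page, restart_rate, stream_rate, speedup

per_page, restart_rate, stream_rate, speedup = rates()
print("per page: %.3f s" % per_page)
print("restarting TeX: %.1f jobs per second" % restart_rate)
print("TeX kept running: %.1f jobs per second" % stream_rate)
print("speedup: about %.0f times" % speedup)
```

So restarting TeX each time gives a little under 4 jobs per
second, while the page cost alone would allow over 70.  The
figure of 50 leaves room for the overhead of dvichop and of
whatever processes its output.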

This, basically, is how Instant Preview works.  Except, of
course, we use xdvi to process the output of dvichop.

My focus is on the interactive use of TeX, rather than on
making a web server run quicker.  But the same technology
applies in both cases.  And brings similar benefits.

pdfTeX, like TeX, takes a while to start up.  The same time
benefit can presumably be obtained, this time by writing a
program pdfchop.

Well, I hope this helps.  I'm excited by the possibilities.

best regards

Jonathan