This manual is for TeX4ht.
Copyright 2009, 2010 TeX Users Group.
This work may be distributed and/or modified under the conditions of the LaTeX Project Public License, either version 1.3c of this license or (at your option) any later version. The latest version of this license is in http://www.latex-project.org/lppl.txt and version 1.3c or later is part of all distributions of LaTeX version 2005/12/01 or later.
This work has the LPPL maintenance status “maintained”.
The Current Maintainer of this work is the TeX4ht Project (http://tug.org/tex4ht).
TeX4ht is a TeX package created and developed by Eitan M. Gurari, who was Associate Professor of Computer Science at Ohio State University until his premature death on June 22, 2009. Our continuing work on his software is dedicated to his memory.
TeX4ht translates documents written in TeX or any of its common variants (LaTeX, ConTeXt, etc.) into other markup formats, such as HTML, XML, SGML, etc., optionally using MathML or other formats, with nearly endless possibilities for customization. The home page of the project is http://tug.org/tex4ht. The software is released under the LaTeX Project Public License, version 1.3 or later.
The present document is currently focused on maintenance of TeX4ht itself, which includes hundreds of TeX packages, hypertext fonts, C and Java programs, DTDs, usually all wrapped in a (homegrown) literate programming style. For user documentation, please see the resources on the home page. Perhaps this manual will be more extensive one day.
TeX4ht is currently maintained by CV Radhakrishnan and Karl Berry (the “TeX4ht Project”); we would be very grateful for additional volunteers. The development site, mailing lists, etc., are also linked from http://tug.org/tex4ht.
TeX4ht has a three-step approach to the translation process:
ht* to DVI
foo.tex is processed with the appropriate script (htex,
htlatex, htcontext, ...), which loads
tex4ht.sty and other relevant packages and creates foo.dvi
by calling the TeX compiler with the appropriate format. TeX4ht
adopts an unusual pattern of package loading. It loads
tex4ht.sty at the beginning of the document, stops partway
through, then lets the author load all the packages that are wanted
with \usepackage. Once it reaches the
\begin{document} hook, which means that all extra package
loading has been completed, tex4ht loads itself a second
time. This time, since it has the information about all additional
packages loaded, it calls the relevant .4ht macro packages
to assist the main tex4ht.sty.
For instance, if the author has used biblatex.sty,
tex4ht will call biblatex.4ht or if amsmath.sty
was used, amsmath.4ht will be input, and so on. Eitan wrote a
*.4ht for nearly all of the most often used LaTeX
packages.
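The selection rule described above (one .4ht file per loaded package, when such a file exists) can be sketched in Python. This only illustrates the idea; it is not how tex4ht.sty is implemented, which does this in TeX macros. The function name select_4ht_files and the package mypackage.sty are hypothetical.

```python
def select_4ht_files(loaded_packages, available_4ht):
    """Return the .4ht files to input, in package loading order.

    loaded_packages: package file names recorded before \\begin{document}.
    available_4ht:   set of .4ht file names present in the installation.
    """
    selected = []
    for package in loaded_packages:
        # biblatex.sty -> biblatex.4ht
        stem = package[:-4] if package.endswith(".sty") else package
        candidate = stem + ".4ht"
        if candidate in available_4ht:
            selected.append(candidate)
    return selected


# Hypothetical document preamble and installation contents.
loaded = ["biblatex.sty", "amsmath.sty", "mypackage.sty"]
available = {"biblatex.4ht", "amsmath.4ht", "graphicx.4ht"}

hooks = select_4ht_files(loaded, available)
print(hooks)
```

A package without a matching .4ht file, such as the made-up mypackage.sty here, is simply skipped.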
Then the source foo.tex is processed in the usual manner to
create foo.dvi. With TeX4ht, we always need .dvi
output since .pdf output is not useful for conversion. This is
the first stage in the translation process.
tex4ht
The second stage is to call the tex4ht binary to post-process
foo.dvi. This is the real meat of the process, where the ASCII
characters of element and attribute names, attribute values, etc.,
which are output in \specials in the .dvi, are
extracted. It also substitutes characters in the textual
strings of the typeset version.
As you may be aware, the .dvi file has font and position
information for every character of every string in the document. Suppose
the .dvi has a character \gamma. When rendered to a
particular medium, the character is taken from the 13th position of the
font named cmmi. When extracting text from the .dvi,
instead of taking the glyph from cmmi.pfb, tex4ht takes
the character from the 13th position in the corresponding hypertext
font, cmmi.htf (htf denoting hypertext font, multitudes
of which were again created by Eitan).
A hypertext font is an ASCII file, created by hand in a text
editor, with each line defining a character of the font. The first
line corresponds to character code 0, the second to character code 1,
etc. In cmmi.htf for example, the first 13 lines look
something like this:
cmmi 0 127
'&#x0393;' '' Gamma 0     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
'&#x0394;' '' Delta 1     % cmmi.htf (unicode) 2003-03-27 %
'&#x0398;' '' Theta 2     % Copyright (C) 2000--2003 Michel Goossens %
'&#x039B;' '' Lambda 3    % Eitan M. Gurari %
'&#x039E;' '' Xi 4        % %
'&#x03A0;' '' Pi 5        % This file can redistributed and/or %
'&#x03A3;' '' Sigma 6     % modified under the terms of the LaTeX %
'&#x03A5;' '' Upsilon 7   % Project Public License Distributed from %
'&#x03A6;' '' Phi 8       % CTAN archives in directory %
'&#x03A8;' '' Psi 9       % macros/latex/base/lppl.txt; either %
'&#x03A9;' '' Omega 10    % version 1 of the License, or (at your %
'&#x03B1;' '' alpha 11    % option) any later version. %
'&#x03B2;' '' beta 12     % However, you are allowed to modify %
'&#x03B3;' '' gamma 13    % this file without changing its name, if %
...
The character code given in the 13th position of cmmi.htf is
&#x03B3;, which is the Unicode entity for lower case gamma
(\gamma). tex4ht substitutes this code for
the typeset gamma character while
post-processing the .dvi. Hence, the converted document will
have appropriate entities (or whatever we want) in place of the
TeX-font-specific .dvi references. You can add prefixes or
suffixes to the entities or character codes, e.g.,
<mi>&#x03B3;</mi> (MathML code for \gamma).
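The lookup can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual tex4ht code (the real program is written in C and handles more of the .htf format): the parser only understands a header line followed by lines whose first single-quoted field is the replacement string, the trailing comments of the excerpt are omitted, and the names HTF_EXCERPT and parse_htf are made up for this example.

```python
import re

# Shortened excerpt in the style of cmmi.htf, written with numeric
# character references and without the trailing comment column.
HTF_EXCERPT = """\
cmmi 0 127
'&#x0393;' '' Gamma 0
'&#x0394;' '' Delta 1
'&#x0398;' '' Theta 2
'&#x039B;' '' Lambda 3
'&#x039E;' '' Xi 4
'&#x03A0;' '' Pi 5
'&#x03A3;' '' Sigma 6
'&#x03A5;' '' Upsilon 7
'&#x03A6;' '' Phi 8
'&#x03A8;' '' Psi 9
'&#x03A9;' '' Omega 10
'&#x03B1;' '' alpha 11
'&#x03B2;' '' beta 12
'&#x03B3;' '' gamma 13
"""


def parse_htf(text):
    """Map character codes to replacement strings.

    Assumes the simplified layout above: a header "<font> <first> <last>",
    then one line per character code, starting at <first>.
    """
    lines = text.splitlines()
    name, first, _last = lines[0].split()
    table = {}
    for offset, line in enumerate(lines[1:]):
        match = re.match(r"'([^']*)'", line)
        if match is None:
            break  # stop at anything that is not a character line
        table[int(first) + offset] = match.group(1)
    return name, table


font, table = parse_htf(HTF_EXCERPT)
gamma = table[13]                 # the character that cmmi holds at code 13
mathml = "<mi>" + gamma + "</mi>"  # prefix and suffix added around the entity
print(font, gamma, mathml)
```

The dictionary stands in for the hypertext font: the code found in the .dvi is the key, and the string written to the output is the value.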
The third and final stage is to post-process the translated document further, which may involve:
Creating .png or other images of math formulae
and equations, if requested (for the sake of browsers which do not
support MathML).
Creating .css files for proper rendering in a
browser.
Transferring files by ftp to different destinations, etc.
Following are the literate source files which comprise TeX4ht. Some modifications to specific files are described below. We have globally updated the license information.
Specific processing instructions are provided as remarks at the top of
each source file. All packages, C and Java sources, fonts, DTDs,
etc., are generated from the literate sources by running TeX,
LaTeX or any of the many TeX4ht scripts such as ht,
htlatex, ...
tex4ht-4ht.tex
tex4ht-auto-script.tex
tex4ht-bibtex2.tex
tex4ht-c.tex
tex4ht-cond4ht.tex
tex4ht-cpright.tex
tex4ht-dir.tex
tex4ht-docbook-xtpipes.tex
tex4ht-docbook.tex
tex4ht-env.tex
tex4ht-fonts-4hf.tex
tex4ht-fonts-cjk-utf8.tex
tex4ht-fonts-cjk.tex
tex4ht-fonts-modern.tex
tex4ht-fonts-noncjk.tex
tex4ht-htcmd.tex
tex4ht-html-speech-xtpipes.tex
tex4ht-html-speech.tex
tex4ht-html0.tex
tex4ht-html32.tex
tex4ht-html4.tex
tex4ht-info-html4.tex
tex4ht-info-javahelp.tex
tex4ht-info-mml.tex
tex4ht-info-ooffice.tex
tex4ht-info-svg.tex
tex4ht-info.tex
tex4ht-javahelp-xtpipes.tex
tex4ht-javahelp.tex
tex4ht-jsmath.tex
tex4ht-jsml-xtpipes.tex
tex4ht-jsml.tex
tex4ht-mathltx.tex
tex4ht-mathml.tex
tex4ht-mathplayer.tex
tex4ht-mkht.tex
tex4ht-moz.tex
tex4ht-oo-xtpipes.tex
tex4ht-ooffice.tex
tex4ht-ooimpress.tex
tex4ht-options.tex
tex4ht-sty.tex
tex4ht-svg.tex
tex4ht-t4ht.tex
tex4ht-tei.tex
tex4ht-unicode.tex
tex4ht-word.tex
tex4ht-xhtml-xtpipes.tex
tex4ht-xhtmml-xtpipes.tex
xtpipes.tex
tex4ht-4ht.tex
This is the (extremely large) literate source for all the .4ht
files in the TeX4ht bundle. Run the following command to generate
all .4ht files:
ht tex tex4ht-4ht
Nicholas Cole posted a bug report on the texhax mailing list
regarding an undefined control sequence error of
\blx@resetpuncthook and \blx@csq@ifkernmark. The
reason was that these macros were not initialized. So, we added the
following lines at the beginning of \<config biblatex\>:
\let\blx@resetpuncthook\@empty
\let\blx@csq@ifkernmark\@empty
Christoph Haug reported that \bib@field@entrykey creates an
undefined control sequence error if \printbibliography is
invoked. This was another case of uninitialized macros, solved by adding:
\let\bib@field@keyentry\@empty
Also, Christoph reported a few spurious spaces after the opening parenthesis of the year in an author-year citation and in a few other places. All were fixed.
tex4ht-cpright.tex
The standard copyright statement was changed to the following:
\<TeX4ht copyright\><<<
%
% This work may be distributed and/or modified under the
% conditions of the LaTeX Project Public License, either
% version 1.3c of this license or (at your option) any
% later version. The latest version of this license is in
% http://www.latex-project.org/lppl.txt
% and version 1.3c or later is part of all distributions
% of LaTeX version 2005/12/01 or later.
%
% This work has the LPPL maintenance status "maintained".
%
% The Current Maintainer of this work
% is the TeX4ht Project <http://tug.org/tex4ht>.
%
% If you modify this program, changing the
% version identification would be appreciated.
>>>
Filename, author name and date are inserted at the top of this statement.
tex4ht-dir.tex
Defines the path of your tex4ht package files. The default
provided by Eitan was:
\def\HOME{/home/4/gurari/tex4ht.dir/}
\def\DTDS{/home/4/gurari/dtd.dir/}
We switched these to use . instead of his hardcoded path.
tex4ht-fonts-4hf.tex
This file generates all the *.4hf (hypertext font) files of
the TeX4ht bundle. The file has 101806 lines! We had to increase
TeX's memory and make a new format for LaTeX to run this file. Here
are the new values:
strings=494909
pool_size=1180334 (string characters)
main_memory=7999999 (words of memory)
multiletter control sequences=15000+50000
Also, these needed values are the default in TeX Live 2009:
font_mem_size=3000000 (words of font info)
hyph_size=8191 (hyphenation exceptions)
tex4ht-mkht.tex
CVR made significant changes on September 13, 2009:
\version has been redefined.
\ScriptFileName and \AddExtn have
been defined to add the file name of the script at the top of each script
or batch file. These were not provided in the versions written by
Eitan, but are now needed for best license practices.
\AddExtn will add .bat if and only if the script
is a batch file.
\<Mycopyrightnotice\> has been defined to
add the usual copyright information (see tex4ht-cpright.tex) to
each script when written out.
The \Rem macro used in \<Mycopyrightnotice\>
expands to the # character in Unix scripts and Rem in
Windows batch files.
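The behaviour of these macros can be mimicked in Python to show the intended output. This is a sketch of the effect only, not a translation of the TeX definitions in tex4ht-mkht.tex: the function names, the NOTICE lines chosen, and the exact layout of the header are assumptions made for illustration.

```python
def add_extn(name, is_batch):
    """Mimic \\AddExtn: append .bat if and only if this is a batch file."""
    return name + ".bat" if is_batch else name


def rem(is_batch):
    """Mimic \\Rem: the comment leader for the target script type."""
    return "Rem" if is_batch else "#"


def script_header(name, is_batch, notice_lines):
    """Build the comment block written at the top of a generated script."""
    prefix = rem(is_batch)
    header = [prefix + " " + add_extn(name, is_batch)]
    for line in notice_lines:
        header.append((prefix + " " + line).rstrip())
    return header


# Two lines taken from the copyright statement, as sample notice text.
NOTICE = [
    'This work has the LPPL maintenance status "maintained".',
    "",
    "The Current Maintainer of this work",
    "is the TeX4ht Project <http://tug.org/tex4ht>.",
]

unix_header = script_header("htlatex", False, NOTICE)
batch_header = script_header("htlatex", True, NOTICE)
print("\n".join(unix_header))
print("\n".join(batch_header))
```

The same notice text thus yields a #-commented header for the Unix script and a Rem-commented header, with the .bat file name, for the Windows batch file.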