Table of Contents
*****************

Extrapolating TeX4ht
1 Introduction
2 Implementation: How TeX4ht works
  2.1 Preprocessing with `ht*' to DVI
  2.2 Processing with `tex4ht'
  2.3 Post-processing
3 Literate sources
  3.1 `tex4ht-4ht.tex'
  3.2 `tex4ht-cpright.tex'
  3.3 `tex4ht-dir.tex'
  3.4 `tex4ht-fonts-4ht.tex'
  3.5 `tex4ht-mkht.tex'


Extrapolating TeX4ht
********************

This manual is for TeX4ht.

Copyright 2009, 2010 TeX Users Group.

This work may be distributed and/or modified under the conditions of
the LaTeX Project Public License, either version 1.3c of this license
or (at your option) any later version.  The latest version of this
license is in `http://www.latex-project.org/lppl.txt' and version 1.3c
or later is part of all distributions of LaTeX version 2005/12/01 or
later.

This work has the LPPL maintenance status "maintained".

The Current Maintainer of this work is the TeX4ht Project
(`http://tug.org/tex4ht').

1 Introduction
**************

TeX4ht is a TeX package created and developed by Eitan M. Gurari, who
was Associate Professor of Computer Science at Ohio State University
until his premature death on June 22, 2009.  Our continuing work on his
software is dedicated to his memory.

TeX4ht translates documents written in TeX or any of its common
variants (LaTeX, ConTeXt, etc.) into other markup formats, such as
HTML, XML, SGML, etc., optionally using MathML or other formats, with
nearly endless possibilities for customization.

The home page of the project is `http://tug.org/tex4ht'.  The software
is released under the LaTeX Project Public License, version 1.3 or
later.

The present document is currently focused on maintenance of TeX4ht
itself, which includes hundreds of TeX packages, hypertext fonts, C and
Java programs, DTDs, usually all wrapped in a (homegrown) literate
programming style.  For user documentation, please see the resources on
the home page.  Perhaps this manual will be more extensive one day.

TeX4ht is currently maintained by CV Radhakrishnan and Karl Berry (the
"TeX4ht Project"); we would be very grateful for additional volunteers.
The development site, mailing lists, etc., are also linked from
`http://tug.org/tex4ht'.

2 Implementation: How TeX4ht works
**********************************

TeX4ht has a three-step approach to the translation process:

2.1 Preprocessing with `ht*' to DVI
===================================

`foo.tex' is processed with the appropriate script (`htex', `htlatex',
`htcontext', `...'), which loads `tex4ht.sty' and other relevant
packages and creates `foo.dvi' by calling the `tex' compiler with the
appropriate format.

TeX4ht adopts a different pattern of package loading than usual.  It
loads `tex4ht.sty' at the beginning of the document, stops partway
through, and then lets the author load any other packages with
`\usepackage'.  Once it reaches the `\begin{document}' hook, which
means that all extra package loading has been completed, `tex4ht' loads
itself a second time.  This time, since it knows which additional
packages have been loaded, it calls the relevant `.4ht' macro packages
to assist the main `tex4ht.sty'.  For instance, if the author has used
`biblatex.sty', `tex4ht' will call `biblatex.4ht'; if `amsmath.sty' was
used, `amsmath.4ht' will be input; and so on.  Eitan wrote a `*.4ht'
for nearly all of the most often used LaTeX packages.

Then the source `foo.tex' is processed in the usual manner to create
`foo.dvi'.  With TeX4ht, we always need `.dvi' output since `.pdf'
output is not useful for conversion.  This is the first stage in the
translation process.

2.2 Processing with `tex4ht'
============================

The second stage is to call the `tex4ht' binary to post-process
`foo.dvi'.  This is the real meat of the process: the ASCII text of
element and attribute names, attribute values, etc., which was written
into `\special's in the `.dvi', is extracted.
Also, it substitutes characters in the textual strings of the typeset
version.  As you may be aware, the `.dvi' file has font and position
information for every character of every string in the document.

Suppose the `.dvi' has a character \gamma.  When rendered to a
particular medium, the character is taken from position 13 of the font
named `cmmi'.  When extracting text from the `.dvi', instead of taking
the glyph from `cmmi.pfb', `tex4ht' takes the character from position
13 of the corresponding hypertext font, `cmmi.htf' (`htf' denoting
hypertext font, multitudes of which were again created by Eitan).

A "hypertext font" is an ASCII file, created by hand in a text editor,
with each line defining a character of the font.  After the header
line, the first line corresponds to character code 0, the second to
character code 1, etc.  In `cmmi.htf' for example, the header and the
entries for character codes 0 through 13 look something like this:

     cmmi 0 127
     '&#x0393;' ''  Gamma    0  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
     '&#x0394;' ''  Delta    1  % cmmi.htf (unicode)            2003-03-27 %
     '&#x0398;' ''  Theta    2  % Copyright (C) 2000--2003 Michel Goossens %
     '&#x039B;' ''  Lambda   3  %                          Eitan M. Gurari %
     '&#x039E;' ''  Xi       4  %                                          %
     '&#x03A0;' ''  Pi       5  % This file can redistributed and/or       %
     '&#x03A3;' ''  Sigma    6  % modified under the terms of the LaTeX    %
     '&#x03A5;' ''  Upsilon  7  % Project Public License Distributed from  %
     '&#x03A6;' ''  Phi      8  % CTAN archives in directory               %
     '&#x03A8;' ''  Psi      9  % macros/latex/base/lppl.txt; either       %
     '&#x03A9;' ''  Omega   10  % version 1 of the License, or (at your    %
     '&#x03B1;' ''  alpha   11  % option) any later version.               %
     '&#x03B2;' ''  beta    12  % However, you are allowed to modify       %
     '&#x03B3;' ''  gamma   13  % this file without changing its name, if  %
     ...

The character code given at position 13 of `cmmi.htf' is `&#x03B3;',
which is the Unicode entity for lower case gamma (\gamma).  `tex4ht'
will happily substitute this code in place of the typeset gamma
character while post-processing the `.dvi'.
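
The lookup described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the actual
`tex4ht' code (which is a compiled program working on a richer file
format): the excerpt is shortened to three entries with a made-up
header range, the entries are assumed to be written as hexadecimal
character references, and the names `parse_htf' and `substitute' are
hypothetical.

```python
import html
import re

# Hypothetical, shortened excerpt in the style of cmmi.htf: a header line
# (font name, first code, last code), then one entry per character code.
# The real file covers codes 0..127; only 11..13 are kept here.
HTF_EXCERPT = """\
cmmi 11 13
'&#x03B1;' '' alpha 11
'&#x03B2;' '' beta 12
'&#x03B3;' '' gamma 13
"""

# An entry starts with two single-quoted fields; the first is the
# replacement string. Everything after them is treated as a comment.
ENTRY = re.compile(r"^'([^']*)'\s+'([^']*)'")


def parse_htf(text):
    """Return (font_name, {char_code: replacement}) for a simplified .htf."""
    lines = text.splitlines()
    name, first, last = lines[0].split()[:3]
    table = {}
    # Entry lines are positional: the n-th entry belongs to code first + n.
    for code, line in zip(range(int(first), int(last) + 1), lines[1:]):
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"unparsable entry for code {code}: {line!r}")
        table[code] = match.group(1)
    return name, table


def substitute(fonts, font_name, char_code):
    """Look up the replacement for a (font, character code) pair from the DVI."""
    return fonts[font_name][char_code]


name, table = parse_htf(HTF_EXCERPT)
fonts = {name: table}
entity = substitute(fonts, "cmmi", 13)
print(entity)                 # &#x03B3;
print(html.unescape(entity))  # the lower case gamma character itself
```

The point of the design is that the `.dvi' only ever supplies a font
name and a character code; everything about the output encoding lives
in the hand-edited table, so changing the target markup means editing
the hypertext font rather than the program.
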
As a result, the converted document has the appropriate entities (or
whatever we want) in place of the TeX-font-specific `.dvi' references.
You can add prefixes or suffixes to the entities or character codes,
e.g., `&gamma;' (MathML code for \gamma).

2.3 Post-processing
===================

The third and final stage is to post-process the translated document
further, which may involve:

   * parsing the document with an appropriate parser;
   * creating `.png' or other images of math formulae and equations, if
     requested (for the sake of browsers which do not support MathML);
   * writing out `.css' files for proper rendering in a browser;
   * performing system-dependent tasks such as copying to target
     directories or `ftp' to different destinations;
   * writing out the translated document as several chunks, such as one
     file for each section, instead of a single long document.  We use
     this feature to write out many files to overcome various I/O
     limitations of TeX.

3 Literate sources
******************

Following are the literate source files which comprise TeX4ht.  Some
modifications to specific files are described below.  We have globally
updated the license information.  Specific processing instructions are
provided as remarks at the top of each source file.  All packages, C
and Java sources, fonts, DTDs, etc., are generated from the literate
sources by running TeX, LaTeX or any of the many TeX4ht scripts such as
`ht', `htlatex', ...

  1. `tex4ht-4ht.tex'
  2. `tex4ht-auto-script.tex'
  3. `tex4ht-bibtex2.tex'
  4. `tex4ht-c.tex'
  5. `tex4ht-cond4ht.tex'
  6. `tex4ht-cpright.tex'
  7. `tex4ht-dir.tex'
  8. `tex4ht-docbook-xtpipes.tex'
  9. `tex4ht-docbook.tex'
 10. `tex4ht-env.tex'
 11. `tex4ht-fonts-4hf.tex'
 12. `tex4ht-fonts-cjk-utf8.tex'
 13. `tex4ht-fonts-cjk.tex'
 14. `tex4ht-fonts-modern.tex'
 15. `tex4ht-fonts-noncjk.tex'
 16. `tex4ht-htcmd.tex'
 17. `tex4ht-html-speech-xtpipes.tex'
 18. `tex4ht-html-speech.tex'
 19. `tex4ht-html0.tex'
 20. `tex4ht-html32.tex'
 21. `tex4ht-html4.tex'
 22. `tex4ht-info-html4.tex'
 23. `tex4ht-info-javahelp.tex'
 24. `tex4ht-info-mml.tex'
 25. `tex4ht-info-ooffice.tex'
 26. `tex4ht-info-svg.tex'
 27. `tex4ht-info.tex'
 28. `tex4ht-javahelp-xtpipes.tex'
 29. `tex4ht-javahelp.tex'
 30. `tex4ht-jsmath.tex'
 31. `tex4ht-jsml-xtpipes.tex'
 32. `tex4ht-jsml.tex'
 33. `tex4ht-mathltx.tex'
 34. `tex4ht-mathml.tex'
 35. `tex4ht-mathplayer.tex'
 36. `tex4ht-mkht.tex'
 37. `tex4ht-moz.tex'
 38. `tex4ht-oo-xtpipes.tex'
 39. `tex4ht-ooffice.tex'
 40. `tex4ht-ooimpress.tex'
 41. `tex4ht-options.tex'
 42. `tex4ht-sty.tex'
 43. `tex4ht-svg.tex'
 44. `tex4ht-t4ht.tex'
 45. `tex4ht-tei.tex'
 46. `tex4ht-unicode.tex'
 47. `tex4ht-word.tex'
 48. `tex4ht-xhtml-xtpipes.tex'
 49. `tex4ht-xhtmml-xtpipes.tex'
 50. `xtpipes.tex'

3.1 `tex4ht-4ht.tex'
====================

This is the (extremely large) literate source for all the `.4ht' files
in the TeX4ht bundle.  Run the following command to generate all `.4ht'
files:

     ht tex tex4ht-4ht

Nicholas Cole posted a bug report on the `texhax' mailing list
regarding undefined control sequence errors for \blx@resetpuncthook and
\blx@csq@ifkernmark.  The reason was that these macros were not
initialized.  So, we added the following lines at the beginning of `\':

     \let\blx@resetpuncthook\@empty
     \let\blx@csq@ifkernmark\@empty

Christoph Haug reported that \bib@field@entrykey creates an undefined
control sequence error if `\printbibliography' is invoked.  This was
another case of uninitialized macros, solved by adding:

     \let\bib@field@keyentry\@empty

Also, Christoph said that there were a few spurious spaces after the
opening parenthesis of the year in an author-year citation and in a few
other places.  All were fixed.

3.2 `tex4ht-cpright.tex'
========================

The standard copyright statement was changed to the following:

     \<<<
     %
     % This work may be distributed and/or modified under the
     % conditions of the LaTeX Project Public License, either
     % version 1.3c of this license or (at your option) any
     % later version. The latest version of this license is in
     %   http://www.latex-project.org/lppl.txt
     % and version 1.3c or later is part of all distributions
     % of LaTeX version 2005/12/01 or later.
     %
     % This work has the LPPL maintenance status "maintained".
     %
     % The Current Maintainer of this work
     % is the TeX4ht Project .
     %
     % If you modify this program, changing the
     % version identification would be appreciated.
     >>>

Filename, author name and date are inserted at the top of this
statement.

3.3 `tex4ht-dir.tex'
====================

Defines the path of your `tex4ht' package files.  The default provided
by Eitan was:

     \def\HOME{/home/4/gurari/tex4ht.dir/}
     \def\DTDS{/home/4/gurari/dtd.dir/}

We switched these to use `.' instead of his hardcoded path.

3.4 `tex4ht-fonts-4ht.tex'
==========================

This file generates all the `*.4hf'--hypertext font files--of the
TeX4ht bundle.  The file has 101806 lines!  We had to increase TeX's
memory and make a new format for LaTeX to run this file.  Here are the
new values:

     `strings=494909'
     `pool_size=1180334 (string characters)'
     `main_memory=7999999 (words of memory)'
     `multiletter control sequences=15000+50000'

Also, these needed values are the default in TeX Live 2009:

     `font_mem_size=3000000 (words of font info)'
     `hyph_size=8191 (hyphenation exceptions)'

3.5 `tex4ht-mkht.tex'
=====================

CVR made significant changes on September 13, 2009:

   * All the backslash characters in the path names (conventional
     directory separators under Windows) have been changed to forward
     slashes.  This is per the suggestion of Akira Kakuto, primary
     Windows developer for TeX Live.
   * `\version' has been redefined.
   * New functions, `\ScriptFileName' and `\AddExtn', have been defined
     to add the file name of the script at the top of each script or
     batch file.  These were not provided in the versions written by
     Eitan, but are now needed for best license practices.
   * `\AddExtn' will add `.bat' if and only if the script is a batch
     file.
   * A new function `\' has been defined to add the usual copyright
     information (*note tex4ht-cpright.tex::) to each script when
     written out.
   * The `\Rem' macro used in `\' expands to the `#' character in Unix
     scripts and `Rem' in Windows batch files.
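
The rules for `\AddExtn' and `\Rem' listed above can be restated as a
small Python sketch.  It only mirrors the behaviour described in the
list; the function names, the exact layout of the header, and the
abbreviated two-line notice are assumptions made for illustration, not
the output that `tex4ht-mkht.tex' actually produces.

```python
# Abbreviated stand-in for the copyright notice; the full text is in
# the section on tex4ht-cpright.tex.
NOTICE = [
    "This work may be distributed and/or modified under the",
    "conditions of the LaTeX Project Public License, either",
]


def add_extn(script_name, is_batch):
    """Append .bat if and only if the script is a Windows batch file."""
    return script_name + ".bat" if is_batch else script_name


def rem(is_batch):
    """Comment marker: 'Rem' in batch files, '#' in Unix scripts."""
    return "Rem" if is_batch else "#"


def script_header(script_name, is_batch, notice_lines):
    """Build the comment lines written at the top of a generated script."""
    marker = rem(is_batch)
    header = [f"{marker} {add_extn(script_name, is_batch)}"]
    header += [f"{marker} {line}" for line in notice_lines]
    return header


for is_batch in (False, True):
    print("\n".join(script_header("htlatex", is_batch, NOTICE)))
    print()
```

Keeping the comment marker and the extension behind two small functions
means the same notice text can be reused unchanged for both the Unix
and the Windows variant of every script.
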