The PracTeX Journal

Opinion: Enduring LaTeX documents

Lance Carnes

From PCTeXWiki

LaTeX documents that endure

The title of this opinion piece may seem a little strange. After all, if I keep my document source files safely stored away and have a LaTeX system to format them, they should always work, right? Well, sometimes. A set of LaTeX files more than a few years old will often not format today the way it did in the original edition.

Some examples

A common occurrence is a book author who is having difficulty making a new edition. A few years ago the author used LaTeX to format his or her book, and spent a lot of time getting the correct page breaks and figure placements. Now, when it's time to revise the book for a second edition, the author uses the same source files but finds that in many places the book does not break pages or float figures the same way as the original edition. There may even be LaTeX errors that definitely weren't there before.

Another confusing situation arises when authors collaborate on a book. Sometimes they cannot get a consistent rendering: the two authors use the same source files, but Author A formats the book and gets different page breaks than Author B. In some cases one of the authors may get an "Undefined control sequence" or other error while trying to format the files sent by the other.

In the first example the difference is elapsed time: the author's LaTeX system was updated between editions, and something changed that caused different page breaks and figure floats. In the second example two LaTeX systems are being used in different locations, most likely on different computer platforms. The collaborating authors' systems are probably of the same vintage, but something is causing them to produce different results.

LaTeX documents that endure

Anyone who has used LaTeX for a few years has a collection of document source files. Many of these older documents will format successfully with an up-to-date LaTeX system, but there will be some documents that will not look the same as the original edition, or they will cause LaTeX error messages. It sometimes takes a lot of work to get these older documents to format successfully again.

If the older documents are not critical, such as letters, homework assignments, or other material that is not published, the fact that these may not format correctly is not a problem. But for books and articles and other published material that may be revised and used again, it would save a lot of time if they could be formatted identically to the original edition.

LaTeX systems that work the same

When two or more authors are collaborating, it would be best if their various LaTeX systems could produce exactly the same document from the same source files. Another form of collaboration is a LaTeX consultant working with a client; it would save a lot of time and expense if each could reproduce exactly the same document on his or her own system. But what often happens is that the LaTeX systems are different. For example, one person is using a Linux distribution and the other is using a PC distribution. The TeX programs that underlie the LaTeX systems on both platforms process files identically. However, formatting the same LaTeX source files on the different platforms will often give different results.
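When two systems disagree, a reasonable first step is to find out which files each one actually loaded. Here is a minimal sketch, assuming both collaborators can add one line to the shared source. The standard \listfiles command asks LaTeX to append a file list, with the version information each file declares, to the end of the log. The class, the geometry package, and the text are only placeholders.

```latex
% Sketch only: the class, package, and text are placeholders.
\listfiles                 % request a "*File List*" section at the end of the .log file
\documentclass{article}
\usepackage{geometry}      % any package the real document loads would be listed too
\begin{document}
Some text to format.
\end{document}
```

Each collaborator runs LaTeX on the same source and compares the *File List* sections of the two logs. Differing dates or version numbers point to the files most likely to explain different page breaks. The list reports only what each file declares about itself, so two files with the same declared version could still differ.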

Your ever-shifting LaTeX system

There are approximately 3,000 LaTeX system files. Some of these files are uniform throughout most LaTeX systems. The LaTeX "kernel" files and the common class files, such as article.cls, are identical on most LaTeX distributions. The files that cause the variability are the contributed class and style files. These files come into play when a document contains a \documentclass or \usepackage command. There are so many of these files, and so many versions of each, that it is unlikely that any two LaTeX systems in the world contain the same set of LaTeX system files at any point in time.
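LaTeX offers a partial guard against one kind of version drift. Both \documentclass and \usepackage accept a trailing optional argument giving a release date; if the installed file declares an earlier date, LaTeX issues a warning. The sketch below uses made-up dates for illustration, not the dates of any real release.

```latex
% Sketch only: both dates are hypothetical, chosen for illustration.
\documentclass{article}[2005/01/01]   % warn if article.cls declares an earlier date
\usepackage{geometry}[2005/01/01]     % warn if geometry.sty declares an earlier date
\begin{document}
Some text to format.
\end{document}
```

This check only catches a system that is too old. A newer version of a file passes silently, so the check cannot by itself guarantee identical output.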

Given all this possible variability in LaTeX system files, is it possible that documents can be formatted the same over time, and that LaTeX systems on different platforms can generate identical documents?

A current solution for enduring documents

One LaTeX system that can consistently format document editions is at Mathematical Sciences Publishers (http://mathscipub.com) at the University of California, Berkeley. Their journal issues and articles can be formatted identically to the original editions at any point in time. They accomplish this by using a version control system to guarantee that the original set of LaTeX system files used with a particular document is always used when formatting that document.

For further information on this system and its goals, see the abstract "Long-time preservation strategies for TeX-sourced content" by Paulo Ney de Souza [1] and the video presentation [2].

Some other possibilities for enduring documents

One of the original goals of the TeX system is that given a valid TeX formatting program and a set of source files, the identical document can be formatted at any point in time and on any platform.

The problem is that the LaTeX system itself is a set of source files. When a LaTeX document is formatted, the TeX program first reads all the LaTeX system files and then inputs the author's source files. Since the TeX program is completely consistent, and the author controls his or her source files, the variable is the LaTeX system files.

One way to create an enduring document is to keep the current set of LaTeX system files with the author's source files. This guarantees that the set of input files is consistent over time. This is essentially how Mathematical Sciences Publishers is able to format documents consistently. Their version control system pulls the identical set of files from its archive each time a document is formatted.
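On a very small scale, this idea can be tried within a single source file. Here is a sketch, assuming a hypothetical package named frozenmacros. The standard filecontents environment, placed before \documentclass, writes the enclosed lines to a file in the working directory when no file of that name can be found, and the \usepackage that follows then loads that copy.

```latex
% Sketch only: frozenmacros and \booktitle are hypothetical names.
\begin{filecontents}{frozenmacros.sty}
\NeedsTeXFormat{LaTeX2e}
\ProvidesPackage{frozenmacros}[2010/06/09 copy kept with the document]
\newcommand{\booktitle}[1]{\textit{#1}}
\end{filecontents}
\documentclass{article}
\usepackage{frozenmacros}   % loads the copy written above
\begin{document}
\booktitle{LaTeX documents that endure}
\end{document}
```

A real package can be far larger and may load other files in turn, so this does not scale to a whole LaTeX system. It only shows that a document can carry a dependency along with it.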

Suppose there were a way to bundle all LaTeX system files together at a point in time and keep them along with the document source files. This would be one way to solve the problem of enduring documents. There might be some technical issues with organizing file system directories so that a LaTeX system could process the document. But if these issues could be resolved, it would provide a way to have enduring LaTeX documents.

One problem with this approach is that there would be a large cluster of files that could be used for one document only. This would work well for a single author working on a single computer system. However, if the author wanted to share this document with a collaborator, or send it to a publisher, there could be problems. The size of the file bundle would make this cumbersome, and it could be difficult for the person receiving the file bundle to get it all to work.

One way to reduce the size of the file cluster is to distribute the set of LaTeX system files to an online archive. Many current LaTeX systems fetch needed files from online archives. If the set of needed LaTeX system file versions could be kept in an index, a LaTeX system could use the index to fetch the needed files from an online archive. This would reduce the size of the file cluster since authors could exchange just their source document files along with an index of LaTeX system files.
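Something close to such an index can already be produced. Here is a sketch, assuming the contributed snapshot package is installed. Loading it before \documentclass makes LaTeX record the files the document uses, with their declared versions, in a file named after the job with the extension .dep. This example does not fetch those versions from an archive; that part remains an open problem.

```latex
% Sketch only: assumes the snapshot package is available on the system.
\RequirePackage{snapshot}  % load first so that files read afterwards are recorded
\documentclass{article}
\usepackage{geometry}      % placeholder for the packages a real document uses
\begin{document}
Some text to format.
\end{document}
```

Running LaTeX on a file named book.tex would then leave a book.dep file alongside the usual output, which authors could exchange together with the source.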

In any case, it seems that the problem is one of ensuring that a unique set of files can be maintained so that a LaTeX document will format identically over time and across computer systems. Once this problem is solved, it should be much easier and less time-consuming to maintain a set of documents that will format consistently.

What are your thoughts?

A few colleagues who read this piece agreed that consistently reproducible LaTeX documents are something that should be available to every author. There are various ways this could be done. Give it some thought and then let us know your ideas.

Some suggested that the LaTeX and TeX community, even with its slightly inconsistent set of system files, is probably better off than users of other document systems. For example, how are authors with 15-year-old MS Word, InDesign, and other files able to format the same document again?


Page generated June 9, 2010
