

The LaTeX Tagging Project


About Tagging

As a retired professor from the University of Oregon, I find the tasks required of current faculty almost impossible to imagine, from teaching remotely to handling large classes to mentoring graduate students to finding time for research to worrying about tagging documents for the web. The information in this section is particularly scary because it suggests that all documents given to students and made available on the internet must, by law, satisfy a "tagging standard" that is murky and often incomprehensible. On the other hand, all of us know colleagues with disabilities who manage to thrive, and we would be happy to provide suitable web documents for them if the process were easy. So while this first section is disturbing, the news for mathematicians in later sections is much happier.

According to Google AI, PDF tagging creates a hidden, structured, HTML-like tree within a PDF file to define the content hierarchy (headings, paragraphs, tables, images) for accessibility and screen readers. Tags are essential for Section 508 compliance: they ensure proper reading order and allow assistive technology to navigate documents.

To put it more directly, a tagged document can be read aloud to blind users by appropriate software, and used by others with various disabilities. Universities and governments are starting to require that all PDF documents on the web be tagged.

Google AI recommends using the "Autotag Document" tool in Adobe Acrobat Pro to produce a quick, foundational, though often imperfect structure, and adds that "Manual Tagging", using the Tags pane and Reading Order tool in Adobe Acrobat Pro, can help define heading levels and add alternative text for images. Ultimately, it states, manual review is crucial for 100 percent accuracy.

However, we do not intend to use Adobe Acrobat Pro. Instead we'll tag with LaTeX, which works exactly the same on Macs, Windows, Linux, and other machines.

History and the LaTeX Team

The tagging project in mathematics is under the direction of the LaTeX Team, and it pays to know how that team came to be.

The input language for Donald Knuth's original TeX program was extremely primitive, but Knuth added a macro facility which allowed users to create more powerful commands by stringing together these primitive inputs. In particular, Knuth wrote a set of macros for his books on The Art of Computer Programming called "Plain TeX". When a TeX user claims to typeset using ordinary TeX, they are using the Plain TeX macros.

Later Leslie Lamport wrote a more comprehensive set of macros called "LaTeX". So LaTeX is just ordinary TeX with Lamport's macros rather than Knuth's. Version 2.09 of LaTeX was released in 1985. Still later, Michael Spivak wrote a set of macros for mathematicians called "AMSTeX". There arose a desire to combine LaTeX and AMSTeX into a single set of macros, but that proved to be impossible due to certain limitations in LaTeX.

By that time a young Frank Mittelbach had come from Germany to Stanford to work on TeX, and at a TUG meeting in 1989 at Stanford, Lamport turned over development of LaTeX to him. Mittelbach then formed a very small team to develop it further. This team released LaTeX2e in 1994, and that version made it possible to combine LaTeX and AMSTeX into a single system.

So LaTeX has been developed by Mittelbach and a small but changing team for the last 35 years.

References

The status of the LaTeX tagging project is approximately as follows. The key additions to the kernel are done. Now it is necessary to look at all important packages and modify them so they support tagging. Many important packages have already been analyzed, but a long list of additional packages must still be modified.

Below are three recent places to learn about the LaTeX tagging project. The first is the most recent issue of TUGboat, the journal of TUG: volume 46 (2025), No. 3, where a long article titled "LaTeX News" fills pages 347-352. The table of contents for this issue is at https://tug.org/TUGboat/Contents/contents46-3.html, and the article is well down that page, in the section under the black heading "LaTeX".

Another useful source is a video of a talk Mittelbach gave at the TUG conference last summer. Videos of the talks are at https://tug.org/tug2025/. Unfortunately, each video covers a long stretch of the conference rather than a single talk, so a little searching is needed to find the one you should watch.

Find the video for Day 1, Friday, Part 1, and go to 1:39:00, which is the start of Frank Mittelbach's talk. In this talk, Mittelbach shows us a piece of PDF containing standard mathematical elements. Then he asks us to close our eyes and listen to this material read aloud from an untagged PDF, then from a PDF tagged by Adobe Acrobat, and finally from a PDF tagged by the LaTeX project. The first two examples are awful; for instance, matrix entries are often read in random order. The LaTeX project's version is much better. Mittelbach explains briefly how the system works and why its results are better.

The previous video is highly recommended. Also highly recommended is a more structured talk Mittelbach gave at PDF Days Europe 2025; see https://pdfa.org/presentation/tagged-and-accessible-pdf-with-latex-revisited/. At the bottom of that page, on the right-hand side, is a video covered by a large red arrow. Click the arrow to play it.

Adding Tagging to a LaTeX Document

Finally, we come to the heart of the matter. Tagging produced by the LaTeX project has the following goals:

A document to be tagged can be processed by pdflatex or by lualatex, but not by xelatex. That is because xelatex does not produce output in a form required by the tagging project.

To add tagging to a LaTeX document, you need only add one line to the source. This line must come at the very beginning of the source, even before \documentclass. The required line is

     \DocumentMetadata{tagging=on}
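
For concreteness, a complete minimal source file with tagging turned on might look like the following sketch. The section text and equation are only placeholders, and the first line is an assumption about your editor: it is a TeXShop "magic comment" asking TeXShop to typeset with LuaLaTeX, and other editors may ignore it. Change lualatex to pdflatex there if you prefer; xelatex will not produce a tagged PDF.

   % !TEX TS-program = lualatex
   % The line above is a TeXShop editor convention, not part of LaTeX.
   % Comment lines may precede \DocumentMetadata; real content may not.
   \DocumentMetadata{tagging=on}   % must come before \documentclass
   \documentclass{article}
   \usepackage{amsmath}

   \begin{document}

   \section{Introduction}
   % Placeholder content: headings, paragraphs, and displayed math
   % are tagged without any extra markup from the author.
   A short paragraph, followed by a displayed equation.
   \begin{equation}
      e^{i\pi} + 1 = 0
   \end{equation}

   \end{document}

Nothing else in the source changes; the one \DocumentMetadata line is the only addition needed to request tagging.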

Typesetting a Tagged Document

Now typeset. In the console, extra warning messages from illustrations may appear, but typesetting will proceed as usual.

Shortly before typesetting ends, LaTeX constructs the extra hidden, HTML-like tree that will be added to the PDF file. During this step no new messages appear in the console, and typesetting may seem to be stuck for a while. Just wait.

After that, final messages appear in the console and the revised preview file is shown.

Illustrations

Earlier we stated that tagging can be done with no human intervention. That is not quite true, although many users will probably skip the additional step that is technically required. The problem comes from illustrations.

Below is typical code for an illustration from a LaTeX document. The illustration can be in several formats: pdf, png, jpg, eps, etc. The surrounding code often looks like this:

   \begin{figure}[htbp] %  figure placement: here, top, bottom, or page
      \centering
      \includegraphics[width=3in]{diagram1.pdf} 
      \caption{Space Time Diagram}
      \label{fig:example}
   \end{figure}

The key line for us is \caption, which adds a caption to the illustration. Many authors omit it entirely, so their illustrations stand alone with no caption. If an illustration in a tagged document has a caption, then that caption is spoken when the document is read aloud, but as we will see, that situation is not ideal. If an illustration has no caption, then the tagging project speaks it by just giving the file name of the illustration, perhaps "diagram1 pdf". This name is not printed in the visual document; it is only spoken, as a stopgap for a bad situation. In this case, LaTeX also flags the illustration with a warning during typesetting.

The ideal solution is to add an alt key to the options of \includegraphics, of the form

     alt={This is alternative text},

This key can be added to the previous code fragment together with \caption, or on its own when there is no \caption. The alt text is not shown in the visual document; instead it is spoken when the document is read to a blind user. Harvard's "Digital Accessibility" page says

All of the sources warn that long alt phrases can annoy a user listening to the document. So I add a personal piece of advice, to be followed temporarily while we get used to tagging. If you do not know a disabled user of your web pages, ignore alt in illustrations and just accept the default. If you go to the trouble of adding alt text anyway, you are probably doing it wrong, and in any case no user will complain if you skip it. But if you run into a student who is actively using tagged documents, use that opportunity to add alt phrases and find out from that student what works and what doesn't.
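
If you do decide to add alt text, the figure code from earlier might look like the following sketch, written as a complete file so it can be typeset directly. It assumes the example-image graphic that ships with the mwe package in TeX Live stands in for diagram1.pdf, and the alt string is only a placeholder, not a recommended description.

   \DocumentMetadata{tagging=on}
   \documentclass{article}
   \usepackage{graphicx}   % provides \includegraphics and its alt key

   \begin{document}

   \begin{figure}[htbp]
      \centering
      % alt is never printed; a screen reader speaks it for the image.
      % The text here is a placeholder; keep real alt text short.
      \includegraphics[width=3in, alt={Placeholder description of the diagram}]{example-image}
      % The caption is printed and is also spoken.
      \caption{Space Time Diagram}
      \label{fig:example}
   \end{figure}

   \end{document}

Deleting the alt key brings back the default behavior described above, where only the caption is spoken and LaTeX may warn about the missing alternative text.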



Happy TeXing on macOS!
