[Fontinst] Re-encoded ligatures and searching

Lars Hellström Lars.Hellstrom at math.umu.se
Thu Oct 21 01:13:19 CEST 2004

At 20.25 +0200 04-10-19, Jonathan Sprinkle wrote:
>I have been playing with fonts for some weeks now, and with tex/latex for
>about a year.
>I have a font created by Goudy for the University of California identity in
>the 1930s, and encoded as a TTF by Beatty designs in 1995. The trouble is,
>Beatty designs is a bunch of hacks when it comes to encoding the font.
>Many of the great ligatures (ffl, ffi, ff, ft, tt) are encoded as
>'integral', 'greaterthan', or other such ludicrous spots. My problem is
>this: that when I created my goudy.mtx file to reset the glyphs to be the
>ones I wanted, now if I search for "different" in a generated PDF file, the
>word is not found, because it is actually "di\integral\erent"

The glyph names used within fontinst cannot (for good or bad) have any
effect on searching in PDF files, but it is not unlikely that glyph names
in the font (as embedded in the PDF) are used to map glyphs to characters.
They are certainly far more reliable than the character codes.

>I should add that I do not execute the goudy.mtx when installing my raw
>fonts (the 8r encoding that I derive from the TTF-generated AFM file), but
>only when creating the t1 encoding.
>Should I perform my remapping of the ligatures in another place,

That probably won't make a difference.

>or will I
>be forced to modify the TTF in order to make this work?

In theory: no; but in practice: most likely yes. The correct information
can be incorporated into the PDF, but e.g. pdfTeX will most likely not do
so, and as a result the glyph name fallback is used.
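
To illustrate why the glyph name fallback breaks the search, here is a
minimal Python sketch of how a viewer might turn glyph names into text.
The lookup table is a tiny stand-in for a real glyph list, the function
name is hypothetical, and the underscore rule is a simplified reading of
Adobe's glyph naming convention; actual viewers may behave differently.

```python
# Tiny stand-in for a real glyph list; only the one name needed here.
NAME_TO_TEXT = {"integral": "\u222b"}

def text_from_glyph_names(names):
    """Guess the text for a run of glyphs from their names alone."""
    out = []
    for name in names:
        base = name.split(".", 1)[0]      # drop suffixes such as ".alt"
        for part in base.split("_"):      # "f_f" -> "f", "f"
            if part in NAME_TO_TEXT:
                out.append(NAME_TO_TEXT[part])
            elif len(part) == 1:
                out.append(part)          # single letters name themselves
            else:
                out.append("\ufffd")      # unknown name: replacement char
    return "".join(out)

# The ff ligature sitting under the name "integral", as in the font at hand:
misnamed = text_from_glyph_names(["d", "i", "integral", "e", "r", "e", "n", "t"])
# The same glyph after being renamed to a ligature-style name:
renamed = text_from_glyph_names(["d", "i", "f_f", "e", "r", "e", "n", "t"])
```

With the misleading name, the recovered text contains an integral sign
where the "ff" should be, so a search for "different" cannot match it.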

Since I did some research on how it all works, I might as well report what
I found:

1. The nice mechanism for mapping glyphs to characters (for searching) in a
PDF file is via something called a ToUnicode CMap. This can be part of any
Font dictionary, but at least pdfTeX doesn't ever seem to generate any (it
certainly hasn't got any source for the information). Similarly I cannot
see how this information could be encoded in a PS file, so I wouldn't
expect any PS->PDF converter to include it either.
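
As a rough idea of what such a ToUnicode CMap would have to say, the
following Python sketch generates only the bfchar section, which maps
character codes to UTF-16BE text. The function name and the two
character codes are hypothetical, and a complete CMap also needs a
header and a codespace range that are not shown here.

```python
def tounicode_bfchar(mapping):
    """Return a bfchar section mapping one-byte codes to Unicode text.

    mapping: dict from character code (0-255) to the string it stands for.
    Only a fragment of a CMap; long tables are normally split into
    several bfchar blocks, which this sketch does not do.
    """
    lines = [f"{len(mapping)} beginbfchar"]
    for code in sorted(mapping):
        # Destination strings are written as UTF-16BE in hexadecimal.
        dest = mapping[code].encode("utf-16-be").hex().upper()
        lines.append(f"<{code:02X}> <{dest}>")
    lines.append("endbfchar")
    return "\n".join(lines)

# Hypothetical slots for two ligatures; a real font's codes will differ.
example = tounicode_bfchar({0x1B: "ff", 0xBA: "ffi"})
```

Each ligature code maps to several characters at once, which is what
would let a search for "different" match a word set with an ff glyph.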

2. A similar mechanism is defined for TrueType fonts: the 'Zapf' table.
Again, I suspect most programs don't look for this and most fonts don't
contain one.

3. The _only_ place in a TrueType font where one finds glyph names (like
Adobe's more recent font formats, TrueType uses glyph indices internally)
is the 'post' table:

(In partial support of Beatty designs, I should remark that what they did
probably wasn't as hacky as it might seem when one looks at the AFM. They
probably simply didn't specify any names for their glyphs at all, and then
the font generator defaulted to using a version 1.0 'post' table -- this is
the only one which doesn't require one to specify a lot of glyph names, but
it also means that the glyph-to-name correspondence is that of the
MacRoman encoding vector. It is only when transported to the PS/PDF domain
that the names are promoted to something important.)
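
To check which 'post' table version a font actually carries, a short
Python sketch along these lines could be used. It assumes a plain
single-font TrueType file (not a collection) already read into memory;
the function name is hypothetical and no error handling is attempted.

```python
import struct

# Known 'post' table versions, stored as 16.16 fixed-point numbers.
POST_VERSIONS = {
    0x00010000: "1.0",
    0x00020000: "2.0",
    0x00025000: "2.5",
    0x00030000: "3.0",
}

def post_table_version(data):
    """Return the 'post' table version of a TrueType font, or None."""
    # Offset table: 4-byte sfnt version, then the number of tables.
    num_tables = struct.unpack(">H", data[4:6])[0]
    for i in range(num_tables):
        rec = 12 + 16 * i  # table records are 16 bytes each
        tag, _checksum, offset, _length = struct.unpack(
            ">4sLLL", data[rec:rec + 16])
        if tag == b"post":
            raw = struct.unpack(">L", data[offset:offset + 4])[0]
            return POST_VERSIONS.get(raw, hex(raw))
    return None  # no 'post' table found
```

A result of "1.0" would be consistent with the explanation above: the
font stores no names of its own and relies on the standard Macintosh
glyph ordering.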

So what you need to do is correct the 'post' table in the fonts. Apple does
provide tools that let you edit this table (see link above), but these are
probably only available for Mac OS. I don't know what there may be for
other platforms.

Lars Hellström
