[Fontinst] Re-encoded ligatures and searching

Lars Hellström Lars.Hellstrom at math.umu.se
Fri Oct 22 20:01:14 CEST 2004

At 17.36 +0200 04-10-22, Vladimir Volovich wrote:
>"LH" == Lars Hellström writes:
>
> LH> 1. The nice mechanism for mapping glyphs to characters (for
> LH> searching) in a PDF file is via something called a ToUnicode
> LH> CMap. This can be part of any Font dictionary, but at least
> LH> pdfTeX doesn't ever seem to generate any (it certainly hasn't got
> LH> any source for the information).
>
>But LaTeX does have a source for the information (in the form of font
>encoding), and can issue the commands to pdftex to include the
>corresponding CMap file for each font.

That correspondence is not perfect, but indeed good enough to be useful.

>The cmap package on CTAN:macros/latex/contrib/cmap should be able to
>do this. If your document uses the T1 font encoding, then simply
>including the \usepackage{cmap} in the preamble before
>\usepackage[T1]{fontenc} should be able to solve this problem,
>i.e. the font should be associated with the CMap encoding and the PDF
>file should become searchable even if the font uses inconsistent glyph
>names.
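
For reference, the loading order described in the quote above would look
roughly like this. It is only a minimal sketch: the document class and
the sample text are my own placeholders, and the one point taken from the
quote is that cmap is loaded before fontenc.

```latex
% Minimal sketch, to be compiled with pdflatex.
% Assumption: the cmap package from CTAN is installed.
\documentclass{article}
\usepackage{cmap}        % must come before fontenc, as described above
\usepackage[T1]{fontenc} % fonts in this encoding get a CMap attached
\begin{document}
% Placeholder text with ligatures (fi, fl, ffi) to try searching for.
A first test: office, affliction, fluffier.
\end{document}
```

Searching the resulting PDF for a word such as "office" is a simple way
to check whether the ligature glyphs are mapped back to their letters.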

Looking at cmap.sty, it seems this is achieved via TeX commands that add
explicit code to the PDF file in a suitable place. Apparently a trick with
many applications, although not (to my knowledge) very well documented!
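
As far as I can tell, the trick amounts to something like the following.
This is a hedged sketch using the pdfTeX primitives \pdfobj, \pdflastobj
and \pdffontattr, not a copy of what cmap.sty actually does. The file
name t1.cmap is an assumption (it must be a ToUnicode CMap file that
pdfTeX can find), and attaching the attribute to the current font via
\font is my own simplification.

```latex
% Sketch only, to be compiled with pdflatex in PDF mode.
% Assumption: a ToUnicode CMap file named t1.cmap can be found by pdfTeX.
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Write the CMap file into the PDF as a stream object right away.
\immediate\pdfobj stream file {t1.cmap}
% Add a /ToUnicode entry, pointing at that object, to the Font
% dictionary of the current font (\font denotes the current font).
\pdffontattr\font{/ToUnicode \the\pdflastobj\space 0 R}
A first test: office, affliction, fluffier.
\end{document}
```

Note that this only touches the one font that is current at that point;
every other font used in the document would need the same treatment,
which is presumably why cmap.sty hooks into the font loading instead.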

The cmap.sty internals (hooking into fontenc loading) strike me as rather
hideous -- this should rather be handled by code in \DeclareFontEncoding or
\DeclareFontFamily calls (like setting \hyphenchar is) -- but I suppose
that is practically impossible due to cmap not being part of standard
LaTeX. Another sign that NFSS2 is insufficient, since it is not open to
providing new types of information about fonts, I guess.

A more immediate problem, however, is described in the following quote
from the cmap README:

>The main limitation currently is inability to work with virtual fonts,
>and this is because of limitation of pdftex, and may be resolved in a
>future versions of pdftex.

I presume the problem is that one can fiddle with the font when PDF font =
TeX font, but not when they (as in the case with virtual fonts) are
distinct.

Still, this is certainly interesting enough that I will probably have to
think about how one would implement an \etxtocmap command for fontinst.

Lars Hellström
