[Fontinst] A question about reglyphfont
Lars.Hellstrom at residenset.net
Thu Oct 8 12:59:37 CEST 2009
Pierre MacKay wrote:
> I am following this discussion with great interest, but I wonder whether
> the problems of using a font with the Adobe Expert Character set names
> have been looked at.
> Adobe seems (it is difficult to be sure of the causes) to have set up
> Acrobat reader 8 and 9 so that they trap names like Asmall . . .
> Zsmall, the old-style figures and the ff ligatures. Unless I use the
> on-line distiller at Acrobat.com, I get PDFs in which all characters
> from the Expert character set are replaced by blank space.
*What do you mean by* "are replaced by blank space", exactly? Don't
they show up when you view the document, are they missing when you
print the document, or are they missing when you copy text from the
document?
> [...] not all, because the accented glyphs in the range E0--FF come through.
> It is, of course, possible to bypass the problem by using something
> other than Reader 8 or 9.
/me typically uses Reader 5 (unless the document has compressed object
streams), because the GUI quality seems (IMHO) to be a decreasing
function of version number. ;-)
> Reader 6 and 7 did not have the problem, so
> it is something introduced by Adobe in the later versions of Reader. I
> submitted a bug report over the problem when Reader 8 came out. It was
> acknowledged, and I was told that it would be corrected "in the next
> major release." It clearly has not been corrected. One of the worst
> aspects of this bug is that it destroys the archival value of all PDFs
> distilled before the arrival of Reader 8. (I don't know exactly when
> the change was made in Acrobat Distiller, but I suspect that it was
> contemporaneous with Reader 8).
> A comparison of output from the online distiller at Adobe.com and output
> from Ghostscript 8.63 shows that in the Adobe distiller, any font with
> the names Asmall . . . Zsmall is treated to two consecutive
> operations, the first of which is associated with "/Tounicode." I have
> been unable to find out what /Tounicode does. Does it recode the entire
> Adobe Expert Character set into a page in the Private use sector?
If the difference involves /ToUnicode, then it should only be Copy text
and Search operations that misbehave, right? (IMO, that wouldn't
destroy the archival value of PDFs, but nor would bugs specific to one [...])
FYI, the /ToUnicode entry in a PDF font dictionary sets up a mapping
from slots in the font to Unicode code points; the PDF 1.5 specification
describes this in Section 5.9 "Extraction of Text Content". Providing
such a map explicitly is really the only general way to assign an
interpretation to the text in a PDF, but originally Acrobat Reader also
had heuristics for guessing an interpretation from the glyph names. It
is possible that the change in AR8 you observed was merely a retirement
of some of these heuristics, so that "Asmall" is no longer on the list
of known names, even though "a" might still be.
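To illustrate what such name-based guessing might look like, here is a minimal Python sketch loosely modelled on the Adobe Glyph List naming conventions: a suffix after a period is dropped, underscores separate ligature components, and uniXXXX or uXXXX names give code points directly. It is not Acrobat Reader's actual heuristic. `KNOWN_NAMES` is a tiny hypothetical table, and any name missing from it (such as "Asmall" here) simply gets no interpretation.

```python
from typing import Optional

# Hypothetical stand-in for a real glyph list (a real one has thousands
# of entries). "Asmall" is deliberately absent.
KNOWN_NAMES = {
    "A": "A",
    "a": "a",
    "f": "f",
    "i": "i",
    "ff": "\ufb00",
    "Euro": "\u20ac",
}


def component_to_text(name: str) -> Optional[str]:
    """Map one glyph-name component to text, or None if it is unknown."""
    if name in KNOWN_NAMES:
        return KNOWN_NAMES[name]
    try:
        # "uni" followed by one or more groups of four hex digits
        if name.startswith("uni") and len(name) > 3 and (len(name) - 3) % 4 == 0:
            hexdigits = name[3:]
            return "".join(chr(int(hexdigits[i:i + 4], 16))
                           for i in range(0, len(hexdigits), 4))
        # "u" followed by four to six hex digits: a single code point
        if name.startswith("u") and 4 <= len(name) - 1 <= 6:
            return chr(int(name[1:], 16))
    except ValueError:  # not hex digits, or outside the Unicode range
        return None
    return None


def glyph_name_to_text(glyph: str) -> Optional[str]:
    """Guess the text a glyph stands for from its name, or None."""
    base = glyph.split(".", 1)[0]  # drop suffixes such as ".sc" or ".oldstyle"
    texts = [component_to_text(part) for part in base.split("_")]  # "f_i" ligature
    if any(t is None for t in texts):
        return None
    return "".join(texts)


if __name__ == "__main__":
    for name in ["a.sc", "f_i", "uni20AC", "Asmall"]:
        print(name, "->", repr(glyph_name_to_text(name)))
```

Under a scheme like this, removing one entry from the table would be enough to make a glyph drop out of Copy text and Search, with no change to the PDF itself.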
Fontinst has had the ability to generate /ToUnicode CMaps since v1.928
(or thereabouts), through the \etxtocmap command. Getting PDF generators
to put it in at the right place is however not so straightforward;
pdfTeX only gives such access to font dictionaries from the TeX side
(whereas the mapfile would be more useful) and it only works for fonts
that have been \font'defed (hence not for base fonts of virtual fonts).
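For concreteness, the following is a minimal Python sketch of the kind of ToUnicode CMap a simple 8-bit font needs. It is not fontinst's \etxtocmap code. The CMap has a single codespace range <00>..<FF> and bfchar entries that map each slot to UTF-16BE text. The function names and the example slot numbers are hypothetical.

```python
from typing import Dict


def utf16be_hex(text: str) -> str:
    """Uppercase UTF-16BE hex, as used for bfchar destination strings."""
    return text.encode("utf-16-be").hex().upper()


def make_tounicode_cmap(mapping: Dict[int, str]) -> str:
    """Build a ToUnicode CMap for a simple font with one-byte codes.

    mapping sends a font slot (0-255) to the text it should extract as.
    """
    for slot in mapping:
        if not 0 <= slot <= 0xFF:
            raise ValueError(f"slot {slot} is not a one-byte code")
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # Each beginbfchar ... endbfchar block may hold at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for slot, text in chunk:
            lines.append(f"<{slot:02X}> <{utf16be_hex(text)}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical slots: a small-cap A in 0x61, an ff ligature in 0x1B.
    print(make_tounicode_cmap({0x61: "a", 0x1B: "ff"}))
```

Mapping the small-cap A to plain "a" lets Search find it, and mapping the ligature to the two letters "ff" gives sensible Copy text. The resulting CMap would still have to be embedded as a stream and referenced from the font dictionary's /ToUnicode entry, and that is exactly the step that is awkward with current PDF generators.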
OTOH, recent pdfTeXes seem to have some built-in heuristics of their
own for generating ToUnicode data; I haven't studied those in detail.
Nor do I know what gs or dvipdfmx can currently do in this respect.
There is also the possibility of putting /ActualText data directly into
the page content stream by using pdf: \specials. I've recently
considered adding support for this to fontinst (the specials would be
embedded into the VF; I have figured out how to do it elegantly), but
that's probably only appropriate for faked glyphs (e.g. Euro from C and
two rules). See also the accsupp LaTeX package.
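The PDF side of the /ActualText approach is small. A /Span marked-content sequence carries the replacement text in its property list, opens before the glyph's drawing operators, and closes with EMC afterwards. The minimal Python sketch below produces those two operator strings. The helper names are hypothetical, and the sketch does not show how the strings would be wrapped in driver-specific \specials or embedded in a VF.

```python
from typing import Tuple


def pdf_text_string_hex(text: str) -> str:
    """Encode text as a PDF hex text string: FEFF byte order mark + UTF-16BE."""
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def actualtext_operators(replacement: str) -> Tuple[str, str]:
    """Return the content-stream operators that open and close an
    /ActualText span; the faked glyph's drawing goes in between."""
    begin = f"/Span << /ActualText {pdf_text_string_hex(replacement)} >> BDC"
    end = "EMC"
    return begin, end


if __name__ == "__main__":
    # Hypothetical use: a Euro sign faked from C and two rules.
    begin, end = actualtext_operators("\u20ac")
    print(begin)  # /Span << /ActualText <FEFF20AC> >> BDC
    print(end)    # EMC
```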