[Fontinst] OpenType, otftotfm, and fontinst

Lars Hellström Lars.Hellstrom at math.umu.se
Fri Dec 26 17:06:09 CET 2003


At 19.46 +0100 2003-12-22, Ulrich Dirr wrote:
>Hi.
>
>In the last weeks I was experimenting with OpenType integration into
>TeX. A great tool in this respect is otftotfm but sometimes I would
>like to control the whole process more transparently, and thus it
>would be great if I could use fontinst.
>
>One point is that OpenType fonts often differ in the amount of glyphs.
>On the other side I don't want to restrict the fonts for further
>processing by fontinst by using, e.g., 8a encoding as a starting
>point.

No, using 8a would be a shame.

>My idea is to have a kind of "generic", "big" encoding file which I
>can use for pre-processing by otftotfm (which ideally would recognize
>all glyphs) and feed the resulting PL file into fontinst for the next
>steps. Such a big PL file I could use for all 8r/8x/9e/9c, or
>ornament/oldstyle/swash/smallcaps variants, or whatever I want to use
>with TeX.

I suspect that most of the restrictions ahead of you come from what
happens when you intend to print a DVI using the OTF font. If you're using
dvips then the DVI gets turned into Postscript, and PS is AFAIK rather
stuck in the 8-bit age (`character' equals `byte').

The normal way to handle fonts with more than 256 glyphs is to split them up
into subfonts, each with an encoding covering a different subset of the
glyphs available. What happens on the PS side is that the code from the PFB
(or whatever) file defines one font to the PS interpreter, under the
original name. Then for each entry in the psfonts.map file that involves
this font, dvips contributes code that defines additional fonts with terse
names such as /F0, /F1, and so on. These additional fonts share the data
structures defining the glyphs with the original font, but have quite
different /Encoding vectors.
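The splitting described above is mostly bookkeeping; a minimal sketch in Python (glyph names and counts are made up for illustration):

```python
# Sketch: chunk a large glyph list into 256-slot subfont encodings,
# mirroring the subfont scheme described above. Nothing here is a
# real tool; it only illustrates the bookkeeping.

def split_into_subfonts(glyphs, slots_per_font=256):
    """Chunk a glyph list into per-subfont encoding vectors."""
    return [glyphs[i:i + slots_per_font]
            for i in range(0, len(glyphs), slots_per_font)]

all_glyphs = ["glyph%04d" % i for i in range(700)]  # pretend 700 glyphs
subfonts = split_into_subfonts(all_glyphs)
print(len(subfonts))      # 3 subfonts
print(len(subfonts[0]))   # 256 slots in the first
print(len(subfonts[-1]))  # 188 glyphs left over in the last
```

Each chunk would then become one ETX/ENC pair and one entry in psfonts.map.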

As usual, fontinst has no preferences towards any particular encoding and
does not mind combining glyphs from different fonts. If one has an AFM file
whatever8a.afm that _lists_ all glyphs in a font (even if most of them have
code -1, i.e., are unencoded), then the normal way of proceeding is to produce
extra ETX files extras0.etx, extras1.etx, etc., each of which lists another
256 glyphs in some arbitrary order, and then go

  \transformfont{whatever8r}{\reencodefont{8r}{\fromafm{whatever8a}}}
  \transformfont{whatever5z}{\reencodefont{extras0}{\frommtx{whatever8a}}}
  \transformfont{whatever6z}{\reencodefont{extras1}{\frommtx{whatever8a}}}
  \transformfont{whatever7z}{\reencodefont{extras2}{\frommtx{whatever8a}}}
  \transformfont{whatever9z}{\reencodefont{extras3}{\frommtx{whatever8a}}}
  % And so on, until you've covered everything

The 5z, 6z, 7z, and 9z encoding variants are listed in the fontname scheme
variant.map file as ``user''; as I understand it, this means that they can
be defined for odd "user-defined" encodings such as these. Starting with 8r
as a first base encoding is not necessary, but it is probably useful to use
standard encodings to cover as many glyphs as possible (if nothing else, to
leave more "user" encodings for covering up the rest later).

Of course, the above is of little help unless one actually includes all
these glyphs in the glyph base. Thus one would typically do

  \installfont{whatever8t}{whatever8r,whatever5z,whatever6z,whatever7z,%...

for all the particular fonts you intend to make. Note that the map file
fragment writer will automatically generate ENC files for any
non-registered ETX used to reencode fonts. You can also generate them
manually using the \etxtoenc command. See ficonv.dtx for details.
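For those unfamiliar with the ENC format itself: an encoding vector is just a named PS array of 256 glyph names, so generating one outside fontinst is easy too. A rough sketch (this is not fontinst's \etxtoenc; the encoding name and glyphs are illustrative):

```python
# Sketch: write the body of a dvips-style .enc file for a 256-slot
# encoding, padding unused slots with /.notdef. Illustrative only.

def write_enc(name, glyphs):
    """Return the text of an encoding vector: /name [ /g0 ... ] def."""
    slots = (glyphs + [".notdef"] * 256)[:256]  # pad/truncate to 256
    lines = ["/%s [" % name]
    lines += ["  /%s" % g for g in slots]
    lines.append("] def")
    return "\n".join(lines)

enc = write_enc("Extras0Encoding", ["Aogonek", "aogonek", "Eogonek"])
print(enc.splitlines()[0])    # /Extras0Encoding [
print(enc.count("/.notdef"))  # 253 padded slots
```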

The thing here that I am not familiar with is how one normally uses OTF
fonts in Postscript files. The above sort of assumes that they will be made
available as a (probably type 2) font with more than 256 glyphs, but that
is not the only possibility. I just did an experiment with a non-TeX PS
driver, and that instead made two heavily subsetted type 1 fonts for a
total of three glyphs from an OTF font. If the OTF is explicitly split up
into subfonts (by otftotfm or some companion tool) then these will probably
by default carry some encoding that makes the glyphs available and you
probably won't have to tell fontinst to reencode things further. Some kind
of encoding vector is still needed, but that is mostly a list of glyph
names and rather easy to piece together once one knows their names.

>What comes to mind instantly would be some kind of unicode encoding
>file. Does this exist?

There exist some Unicode ETX files for fontinst, yes, but those are
probably not useful for base fonts. They're rather meant for making Omega fonts.

>I know that an inputenc file 'utf8' by
>Dominique Unruh exists but I'm not aware of any dvips enc file (which
>otftotfm needs) for utf8 or similar.

The TFM format is 8-bit only; it absolutely cannot handle more than 256
glyphs per font. The Omega OFM format does not have this restriction, but
this is probably not of any help in your case. There is also the
distinction between Unicode and UTF-8 to keep in mind. Unicode is a
standard assigning code points (numbers) to characters. UTF-8 is a standard
for encoding a string of Unicode characters as a sequence of bytes in such
a way that dumb software interpreting each byte as a separate character
won't make a total mess of things. UTF-8 is excellent for text files (such
as LaTeX input), but most formats for typeset text (such as DVI, which
interestingly enough supports character codes up to 32 bits wide!) rather
tend to adopt the "Unicode" view that every character is one number.
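The two views are easy to see side by side; in Python, for the odieresis used as the running example below:

```python
# The Unicode-vs-UTF-8 distinction in concrete terms: a code point
# is one number, UTF-8 is its representation as a byte sequence.
ch = "\u00f6"                    # odieresis, code point U+00F6
print(ord(ch))                   # 246  (the Unicode view: one number)
print(list(ch.encode("utf-8")))  # [195, 182]  (the UTF-8 view: two bytes)
```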

Having said that though, it should be observed that Postscript and PDF do
have a concept of Type 0 fonts, which can be used to support multibyte and
mixed-length encodings. The strings one passes to the PS show operator may
still only consist of bytes, but one can have that string interpreted as
e.g. UTF-8 data and thus access a larger glyph set in that way. With a Type
0 font whose CMap (which is what one uses instead of encoding) implements
UTF-8, one could go

  (\303\266) show

to print an odieresis glyph. The question is then whether we can make dvips
generate something equivalent to the above in its output.

At least naively, any sensible DVI-to-PS driver should generate something
equivalent to the above when confronted with the DVI/VF bytes

  128 195 128 182

or in VPL syntax

  (SETCHAR D 195)
  (SETCHAR D 182)

Can fontinst be told to generate such code? Indeed it can. The necessary
definitions could be something like

  \setrawglyph{195}{mbdummy}{10pt}{195}{0}{0}{0}{0}
  \setrawglyph{182}{mbdummy}{10pt}{182}{0}{0}{0}{0}
  \setglyph{odieresis}
     \glyph{195}{1000}
     \glyph{182}{1000}
     \resetwidth{486}
     \resetheight{577}
     \resetdepth{14}
  \endsetglyph

but this is more than a little silly. The `195' and `182' glyphs aren't
really glyphs, so they probably shouldn't be declared as such. For
experimentation the above may be acceptable, but for production one would
rather want it to be something like

  \setmultibyteglyph{odieresis}{somefont}{10pt}{\do{195}\do{182}}%
    {486}{577}{14}{0}

which would be different from \setrawglyph mainly in that it has a sequence
of slot bytes instead of a single slot number. It would require a couple of
new internal macros in fontinst (perhaps require a new companion of
\saved_raw), but that is no big deal. Fontinst as such requires no
reconstruction for this.

This leaves only two problems. One is that of getting the metrics: how does
one produce the MTX file containing those \setmultibyteglyph commands? From
an AFM or TFM file? Not very likely. I would rather suggest devising some
kind of an otftomtx tool that does it all in one step. Whereas MTX files
may be tricky to interpret for anything not based on TeX, they are
certainly no more difficult to generate than AFM or TFM files (perhaps even
easier, since you don't have to specify at the start how many entries will
follow). There is already at least one program for making Type 1 fonts that
produces metrics in MTX format.
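To make the otftomtx idea concrete, here is a sketch of the output side, emitting the hypothetical \setmultibyteglyph entries proposed above (the command does not exist in fontinst as distributed; font name, size, and metrics are the illustrative values from the earlier example):

```python
# Sketch: given metrics for a glyph keyed by Unicode code point,
# emit one hypothetical \setmultibyteglyph MTX entry. The command
# and all names here are assumptions from the surrounding text.

def multibyte_entry(name, font, size, codepoint, width, height, depth):
    """Format a \\setmultibyteglyph line with UTF-8 slot bytes."""
    slots = "".join("\\do{%d}" % b
                    for b in chr(codepoint).encode("utf-8"))
    return ("\\setmultibyteglyph{%s}{%s}{%s}{%s}%%\n"
            "  {%d}{%d}{%d}{0}"
            % (name, font, size, slots, width, height, depth))

entry = multibyte_entry("odieresis", "somefont", "10pt",
                        0x00F6, 486, 577, 14)
print(entry)
# \setmultibyteglyph{odieresis}{somefont}{10pt}{\do{195}\do{182}}%
#   {486}{577}{14}{0}
```

The real work in such a tool is of course reading the OTF tables for the metrics; the MTX side, as noted, is the easy half.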

The other problem is that of whether dvips is dumb enough to do what we
want. It may try to be smart and assume that the advance widths of the
component bytes of a glyph actually correspond to real movements of the
current point in the Postscript interpreter, and therefore try to adjust to
these movements. The result would probably be hideous "drifts" of multibyte
glyphs. Maybe that can be worked around by surrounding all multibyte
sequences by a push--pop pair, but one cannot tell for sure until one has
tried it. (It would certainly be much safer if the DVI driver actually
understood what we were doing.)

Anyway this seems like an interesting project. I would certainly try to
help out with the fontinst side of it.

>Another benefit would be that one could use the fontinst 'standard'
>\xscalefont command for pdftex's font expansion feature (it's more
>economic to have only the base font in the mapping file and let pdftex
>choose the other). Because when using otftotfm directly for this I got
>a mess of font map entries like 'WarnockPOscBoldItalic-20--base
>WarnockPro-BoldIt "0.98 ExtendFont AutoEnc_elkn5hsf5csydavpi4ctzhxt3h
>ReEncodeFont" <[a_elkn5h.enc <WarnockPro-BoldIt.pfb', etc. pp.

Well, at least that shouldn't be a problem, although I don't think you will
be able to get away with fewer map file entries, since \xscalefont also
creates new base fonts.

Lars Hellström
