problem with Thai font

Werner LEMBERG werner@m17n.org
Sat, 29 Jul 2000 18:47:40 -0400



Dear font experts,


I'm trying to create an intelligent Thai encoding for TeX which uses
ligatures.  This is not a trivial task -- I've already written a file
`thai.enc' intended for use with afm2tfm which is 43kByte large and
contains 486 LIGKERN commands (either `|=:>' or `=:|>').

Unfortunately, afm2tfm fails.  It makes assumptions about the encoding
vectors which I hadn't foreseen, and which are ultimately fatal.

Older PS and TTF Thai fonts often use Adobe's Standard Encoding glyph
names for Thai glyphs so that the US version of Windows 3.1 (together
with the US Adobe Type Manager) accepts them.  Now we have the Adobe
Glyph List, and I have prepared the encoding file accordingly to use
the correct glyph names.

  Example: /AE => /uni0E26 % THAI CHARACTER LU

It is written in the dvips info file:

    The Afm2tfm program creates the TFM and VF files for the virtual
    font corresponding to a PostScript font by "reencoding" the
    PostScript font.  Afm2tfm generates these files from two
    encodings: one for TeX and one for PostScript.  The TeX encoding
    is used to map character numbers to character names while the
    PostScript encoding is used to map each character name to a
    possibly different number.  In combination, you can get access to
    any character of a PostScript font at any position for TeX
    typesetting.

What I didn't know before is that the set of glyph names must be
identical for the TeX encoding and the PostScript encoding (maybe the
word `reencoding' implies that, but it is not stated explicitly).  This
assumption is probably useful for error checking, but in my case it is
very inconvenient since I have to adapt the large set of LIGKERN
commands to the incorrect glyph names in the old font.  Since TeX only
uses glyph indices, and the PostScript encoding vector is completely
independent of the TeX encoding vector (which is only used for
writing the VF file), using two different glyph name sets does no
harm.

While replacing glyph names is doable (e.g. with a small script), it
has a fatal consequence.
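
A minimal sketch of such a renaming script, in Python.  It assumes the
usual afm2tfm convention that LIGKERN rules sit in `% LIGKERN' comment
lines and name glyphs without a leading slash, while the encoding
vector itself uses `/name' tokens.  The function name `rename_glyphs'
and the one-entry mapping are illustrative only.

```python
import re

def rename_glyphs(enc_text, mapping):
    """Rename glyph names in an afm2tfm-style encoding file (sketch)."""
    out = []
    for line in enc_text.splitlines():
        if line.lstrip().startswith("% LIGKERN"):
            # LIGKERN rules name glyphs without a leading slash.
            tokens = line.split(" ")
            line = " ".join(mapping.get(t, t) for t in tokens)
        else:
            # Split off a trailing comment so that it is left untouched.
            code, sep, comment = line.partition("%")
            code = re.sub(
                r"/([A-Za-z0-9_.]+)",
                lambda m: "/" + mapping.get(m.group(1), m.group(1)),
                code,
            )
            line = code + sep + comment
        out.append(line)
    return "\n".join(out)

# Map the correct glyph name back to the name used in the old font.
mapping = {"uni0E26": "AE"}
sample = "/uni0E26 % THAI CHARACTER LU\n% LIGKERN uni0E26 uni0E32 =: foo ;"
renamed = rename_glyphs(sample, mapping)
```

The sketch only renames; it does nothing about the alias problem
described further down.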

Thai ligatures can be easily handled (in about 40 lines or so) if you
can group glyphs into classes with similar properties, and if you have
context patterns of length 3 -- I should use Omega :-).  As you know,
TeX only has patterns of length 2, and afm2tfm provides no direct
interface to TeX's grouping mechanism in ligature programs (which thus
increases the number of rules by a factor of approximately 10).
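
To illustrate where the blow-up comes from, here is a rough Python
sketch that expands one class-level rule into the pairwise LIGKERN
lines afm2tfm needs.  The two classes are small samples, and the
`.left' suffix for a shifted vowel variant is a made-up naming scheme,
not taken from my actual `thai.enc'.

```python
from itertools import product

def expand_class_rule(left_class, right_class, op, result_for):
    """Expand one class-level rule into pairwise LIGKERN rules (sketch)."""
    return [
        f"% LIGKERN {left} {right} {op} {result_for(left, right)} ;"
        for left, right in product(left_class, right_class)
    ]

# Sample classes: consonants with ascenders (PO PLA, FO FA, FO FAN)
# and upper vowels (SARA I, SARA II).
tall_consonants = ["uni0E1B", "uni0E1D", "uni0E1F"]
upper_vowels = ["uni0E34", "uni0E35"]

# Hypothetical result glyph: the vowel shifted to the left.
rules = expand_class_rule(
    tall_consonants, upper_vowels, "|=:>",
    lambda left, right: right + ".left",
)
```

One class-level rule thus becomes len(left) * len(right) LIGKERN
lines, six in this small sample.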

In spite of TeX's short patterns the problem can be solved by
introducing an additional `alias group', i.e., a group of glyphs which
are identical to the glyphs in the original group but have different
glyph indices.  In other words, I need the PostScript encoding vector
to put glyph `/foo' twice, at positions x and y, so that my TeX
encoding vector can then access glyph index x as `/bar' and glyph
index y as `/bar1'.  But for the reasons explained above this
fails.
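
The following Python fragment is a toy model of the difference, not
afm2tfm's actual code.  It assumes the two vectors are linked purely by
glyph name, as described above, and uses slots 0 and 1 as stand-ins
for x and y.

```python
def link_by_name(tex_enc, ps_enc):
    """Toy model: link each TeX slot to a PostScript slot via glyph name."""
    first_slot = {}
    for slot, name in enumerate(ps_enc):
        # Remember only the first position at which a name occurs.
        first_slot.setdefault(name, slot)
    # A TeX name that is missing from the PostScript vector yields None.
    return {slot: first_slot.get(name) for slot, name in enumerate(tex_enc)}

ps_enc = ["foo", "foo"]    # `/foo' placed at positions x = 0 and y = 1
tex_enc = ["bar", "bar1"]  # alias names, used only by the ligature rules

by_name = link_by_name(tex_enc, ps_enc)                  # no slot resolves
by_index = {slot: slot for slot in range(len(tex_enc))}  # what I would need
```

With name matching, neither `/bar' nor `/bar1' is found; with a purely
index-based link, both slots would reach `/foo'.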

Is there any clean solution to this?  What about fontinst?  Does it
impose a similar restriction?  Or is the only solution either to hack
afm2tfm or to write the VPL file manually?


    Werner