[texhax] TFMs for Unicode (WAS Re: Font help needed.)
pierre.mackay at comcast.net
Fri Apr 20 01:59:16 CEST 2007
Since some of the answers on this thread seem to connect with the
problem of providing TFMs for Unicoded fonts, I would like to float the
following proposed addition to the font naming scheme.
It is going to be increasingly necessary to provide sets of TFMs to
match large Unicoded sets of glyphs. For this purpose we ought to
borrow from Unicode organization and nomenclature. When a TFM is
created for one of the pages above page U+00xx, let us simply say so,
using the U (uppercase) and the hex page number. (It might be even
better if we could use U+01, etc., but including an arithmetic operator
in a system-independent file name is probably a bad idea.) Unlike a lot
of the gargantuan (and usually ephemeral)
featuritis that presently afflicts font technology, Unicode font pages
are not going to go away and, through the really brilliant coding of
UTF8, they will be accessible to those of us who refuse to be dragooned
into Vista for a long time to come.
If bchr designates the U+00 page of Bitstream Charter (I think it does;
I don't happen to have used Charter recently) then bchr*U01 will always
designate the Latin Extended-A glyphs and the first part of Latin
Extended-B (Latin-1 itself sits on page U+00), and bchr*U03 will
designate either COMBINING diacriticals (which will never be of much
direct interest to TeX users owing to suboptimal spacing) or monotonic
Greek or both.
Pages like U+03 leave an irreducible chance of ambiguity but, in
real-world usage, it is unlikely to cause much trouble. At the worst,
some Unicode pages might have to be subdivided into slices (U030 vs U037
for instance, or U03a vs U03b).
In an earlier message I showed how UTF8 could easily be read with a
simple package of plain TeX macros. The intermediate output of this
package is one count register containing the page number, and one
containing the glyph number on that page.
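As a rough illustration of that intermediate output, here is a minimal
sketch in Python rather than plain TeX (it is not the macro package
itself). The function name utf8_to_page_glyph is hypothetical; it
decodes UTF-8 bytes by hand and splits each code point into a page
number and a glyph number, standing in for the two count registers. It
does not reject overlong encodings or surrogates.

```python
def utf8_to_page_glyph(data):
    """Return a list of (page, glyph) pairs, one per character in UTF-8 bytes."""
    pairs = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells us how many continuation bytes follow.
        if lead < 0x80:                 # 0xxxxxxx: plain ASCII
            cp, extra = lead, 0
        elif lead >> 5 == 0b110:        # 110xxxxx: one continuation byte
            cp, extra = lead & 0x1F, 1
        elif lead >> 4 == 0b1110:       # 1110xxxx: two continuation bytes
            cp, extra = lead & 0x0F, 2
        elif lead >> 3 == 0b11110:      # 11110xxx: three continuation bytes
            cp, extra = lead & 0x07, 3
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        tail = data[i + 1:i + 1 + extra]
        if len(tail) != extra:
            raise ValueError("truncated sequence at offset %d" % i)
        for b in tail:
            if b >> 6 != 0b10:          # continuation bytes are 10xxxxxx
                raise ValueError("invalid continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        # High part is the page, low 8 bits are the glyph slot (0..255).
        pairs.append(divmod(cp, 256))
        i += 1 + extra
    return pairs
```

For example, utf8_to_page_glyph("aé".encode("utf-8")) gives
[(0, 97), (0, 233)], and a Greek alpha (U+03B1) gives (3, 177).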
The page number register can easily be put to use to generate the
appropriate TFM name by catenating it onto the old fontname. No special
memorization of fontname categories will be needed, and the glyph number
will always be in the range 0--255.
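The catenation step might look like the following sketch, again in
Python for illustration only. The function name tfm_name is
hypothetical, and two details are assumptions rather than part of the
proposal: the page is written as at least two lowercase hex digits, and
page U+00 keeps the old fontname unchanged. The asterisk in bchr*U01 is
left out here.

```python
def tfm_name(base, page):
    """Build a TFM name from an existing fontname and a Unicode page number."""
    if page < 0:
        raise ValueError("page number must be non-negative")
    if page == 0:
        # Assumption: the old fontname already covers page U+00.
        return base
    # Assumption: uppercase U followed by the page in lowercase hex,
    # padded to two digits (pages beyond the BMP simply get more digits).
    return "%sU%02x" % (base, page)
```

With this, tfm_name("bchr", 1) is "bchrU01" and tfm_name("bchr", 3) is
"bchrU03".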
Even CJK can be handled this way. The apparent size of a CJK repertory
is daunting, but in the few cases when I have had to consider CJK
setting, I have not been surprised to see that any real-world document
other than a dictionary is likely to use a manageable subset of the
whole range. The plain TeX macros mentioned above do not require a
complete set, or even a continuous set of "subunits" to extract the
required Unicode pages. Only the ones that correspond with the group of
documents you happen to be setting need to be provided.
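To see which pages a given document would actually call for, something
like the following Python sketch could be run over its text. The
function name pages_used is hypothetical; it simply records, for each
Unicode page, which glyph slots occur.

```python
from collections import defaultdict


def pages_used(text):
    """Map each Unicode page number in text to the sorted glyph slots used on it."""
    used = defaultdict(set)
    for ch in text:
        # Page is the code point divided by 256; glyph is the remainder.
        page, glyph = divmod(ord(ch), 256)
        used[page].add(glyph)
    return {page: sorted(glyphs) for page, glyphs in sorted(used.items())}
```

Only the pages that appear as keys would need TFMs for that document.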
All of the above can be done with trip-tested TeX 3.14159n, and it means
that we can retain the full power of DEK's TFM and VF programming
unchanged. I have the old-fashioned sense that that is really rather a
Karl Berry wrote:
> The problem is with fonts where the tfm name is not the
> same as the pfb name.
> This is, as you note, very common. That is how we handle the TeX 256
> chars per encoding limitation -- OT1, T1, texnansi, etc., each have to
> have their own tfm, even if they all map to a single pfb.
> Indeed, the Cyrillic font tfm's are generated by mktextfm. There are
> scripts in the lh distribution to do it all in batch if you're so
> inclined. Probably not very interesting for purposes of a font
> catalogue, in any case. (It could be interesting if each font had the
> supported encodings listed, though.)