
shorter names for TeX fonts (Re: Future of the `Karl Berry Font Naming Scheme'...)



Melissa O'Neill wrote:

> Let's get some points straight. More than 21% of the files on CTAN do
> not use 8+3 names. Some of the standard TeX/LaTeX files (e.g. lcircle10.tfm)
> are not 8+3 names. In the realm of fonts, 237 of the 560 EC fonts on CTAN
> do not fit into the 8+3 naming scheme.

Speaking as someone who sweated blood getting file names in the Malvern package
to fit into the 8.3 straitjacket, I think this is a shame, especially for
"standard" files like the LaTeX and EC ones.  The problem is that people using
modern operating systems forget the unfortunates trapped on older ones and
can't be bothered to make their packages usable by them.

Font-mapping, if implemented sensibly, would make standardized 8-letter font
names a non-issue, if it weren't for TDS.  Perhaps TDS (the TeX Directory
Structure standard) has been forgotten now, but when I was last involved in serious
TeX-hackery, the goal was to make a system which could be written on an
ISO-9660 CD-ROM and used as-is on almost any platform.  Hence 8-letter names
for all files in all packages were required.

The only excuse for the cryptic names used by the KB scheme was that it was
intended to fit into 8 characters.  It fails for various reasons, one of
which is that it reserves whole letters for things that could usefully have
been combined with others.

For example, the foundry letter wastes 1/8 of the name on unimportant
information.  While it is true that Adobe Garamond and Stempel Garamond are
different, this is a feature only of Garamond and Caslon and a few others; for
most fonts (including all ITC fonts) versions from different companies should
be interchangeable.  So we could merge the foundry letter and the two family
letters into a single 3-letter family code, increasing the number of possible
families by almost a factor of 36.  In
cases like Garamond, different variations would have different codes.  For
fonts like ITC Avant Garde Gothic and its clones, there would be one code.

Similarly, the weight and width codes could have been combined in one letter;
this saves almost a whole letter on average (since occasionally the width is
omitted).

The encoding, while being obligatory, was omitted from the original definition
of Fontname, hence has a set of obscure two-letter suffixes, and occasionally
requires more than one of these.  Let's replace that wholesale with a
two-letter code at the end of every name.  That gives 1296 (36**2) possible
encoding codes, enough to include all the Chinese and Japanese fonts (in which
each row has its own font, ergo its own encoding).  Then we enumerate the
various options.  The result is that all the 9x and 8x suffixes are compressed
into one 2-letter code.  (I did actually make a start at such a registry, by
the way, including codes for all the fonts I could find on CTAN, even the
Klingon ones.)

Some variants, like small-caps, old-style-figures, etc., really define
variations on the encoding, not different styles of font.  Therefore we could
usefully eliminate these suffixes in favour of slightly increasing the possible
number of encodings.  All the variations on encodings like old-style figures,
small-caps etc., are given separate codes (there will be separate codes for
OT1, OT1+small caps, etc.).
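
To illustrate the enumeration, here is a minimal sketch in Python.  It is not
the registry I started; the list of base encodings and variants, and the
names two_letter_code and build_registry, are assumptions made up for the
example.  Codes are simply handed out in base 36 (digits first, then
lowercase letters) in the order the combinations are listed.

```python
from itertools import product

# Base-36 alphabet: digits first, then lowercase letters.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def two_letter_code(index):
    """Turn a registry index (0..1295) into a two-character base-36 code."""
    if not 0 <= index < 36 ** 2:
        raise ValueError("only 1296 two-letter codes exist")
    high, low = divmod(index, 36)
    return DIGITS[high] + DIGITS[low]


# Hypothetical inputs, chosen only for illustration.
BASE_ENCODINGS = ["OT1", "T1"]
VARIANTS = ["", "small caps", "old-style figures"]


def build_registry(bases, variants):
    """Give every (encoding, variant) combination its own code."""
    registry = {}
    for index, (base, variant) in enumerate(product(bases, variants)):
        name = base if not variant else f"{base}+{variant}"
        registry[name] = two_letter_code(index)
    return registry


ENCODING_CODES = build_registry(BASE_ENCODINGS, VARIANTS)
```

With these made-up lists, plain OT1 and OT1+small caps get different codes,
so no separate variant suffix is needed in the file name.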

On the subject of spurious variants, ones like "informal", "sans" and "serif"
aren't properly style variants but really families within a super-family.  That
is, Stone Sans and Stone Serif are two separate (but related) families, rather
than members of a single family.  (This interpretation is consistent with their
PostScript names, by the way.)  Similarly, we would interpret CMR and CMSS as
separate families within a CM super-family.  We can therefore eliminate these
suffixes in favour of a larger number of families.  Since we have room for
46656 families (36**3), we can afford to use three codes for the Stone families
instead of one.

Between these two we have eliminated most of the "style/variant" letters that
extend KB names for quite ordinary fonts to eleven letters or more.  But the
fixed-size portion of the name would be 3+1+2 = 6 characters, leaving only two
characters for the design size or more variant letters.  And we haven't
included slant yet.  Ouch.

Still, design size could just about be done in one letter if we don't mind
limiting our choices.  Here's a simple scheme.  Codes 5, ..., 9 represent 5pt,
..., 9pt.  Codes 0, ..., 4 represent 10pt, ..., 14pt.  Codes a, ..., y
represent 15pt, ..., 39pt.  Fonts with no design size use z.  (A more complex
scheme might allow codes for popular sizes like 48pt, 60pt, 72pt by doing
without all the values up to 40pt.)  Remember that design size is less often
varied these days; most people seem to prefer using magnification.

Stretching things even further, italic can be combined with width and weight if
we're willing to limit our palette of widths or weights, which I imagine would
be controversial.  For example, we could interpret the width-weight-slant
letter as a number n from 0 to 35, where n = 18 * it + 6 * wd + wt, which
allows for 2 slant codes (upright and italic), 3 width codes (condensed,
normal, expanded) and 6 weights.
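
As a rough sketch of that packing (in Python, with invented function names
pack and unpack), the number n is written as a single base-36 character.
The indices are assumptions: it is 0 for upright and 1 for italic, wd is 0,
1, 2 for condensed, normal, expanded, and wt is 0 to 5 for whichever six
weights we settle on.

```python
# Base-36 alphabet: digits first, then lowercase letters.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def pack(it, wd, wt):
    """Combine slant, width and weight indices into one base-36 character."""
    if it not in range(2) or wd not in range(3) or wt not in range(6):
        raise ValueError("index out of range for the 2 x 3 x 6 scheme")
    return DIGITS[18 * it + 6 * wd + wt]


def unpack(letter):
    """Recover the (it, wd, wt) indices from a packed character."""
    n = DIGITS.index(letter)
    it, rest = divmod(n, 18)
    wd, wt = divmod(rest, 6)
    return it, wd, wt
```

Since 2 * 3 * 6 = 36, every character is used exactly once and there is no
room left for a seventh weight or a fourth width.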

Anyway, the result would be far more cryptic than the KB scheme, but also
almost always within 8 letters.  It would still fail to cover all possible
fonts, but what the hey.  Naturally, font map files would be pretty much
obligatory.

It is possible to do even "better" than this and include every font ever, if we
are willing to give up generating short names algorithmically from the font
name and instead use a registry of font names.  That would work by having a
server running somewhere to which you send canonical long font names (not
specified here), and receive in reply a name in the sequence 00000000,
00000001, ..., 0000000a, ... (using base 36, this gives 2.82 x 10**12 possible
file names).  This database would grow over the years as more fonts are added
to its repertoire.  TeX documents would use the long font names, and
fontname.map files would in effect just be subsets of this database.  This
gives the maximum number of font names by eliminating all waste caused by
slicing off chunks of the name for different things, but of course the file
names are 100% useless to humans.
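
The server's core logic could be as small as the following Python sketch.
The class FontNameRegistry and the helper to_base36 are hypothetical, the
long font names are whatever canonical form gets agreed on, and a real
service would need persistent storage rather than an in-memory dictionary.

```python
# Base-36 alphabet: digits first, then lowercase letters.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n, width=8):
    """Write n in base 36, zero-padded to a fixed width."""
    if not 0 <= n < 36 ** width:
        raise ValueError("number does not fit in the requested width")
    chars = []
    for _ in range(width):
        n, digit = divmod(n, 36)
        chars.append(DIGITS[digit])
    return "".join(reversed(chars))


class FontNameRegistry:
    """Hands out 8-character names in sequence, one per long font name."""

    def __init__(self):
        self._short_names = {}

    def register(self, long_name):
        # A name already in the database keeps the code it was given.
        if long_name not in self._short_names:
            next_index = len(self._short_names)
            self._short_names[long_name] = to_base36(next_index)
        return self._short_names[long_name]
```

A fontname.map file would then just be a dump of some subset of the
dictionary.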

Neither of the above is advanced as a serious suggestion of a solution to the
font-naming problem.  They merely illustrate the lengths we would have to go to
to truly create an 8-character font-name standard.

-- Damian

Damian Cugley, damian@oxfordcc.co.uk