
T1, T2, T3, ... ==> EM

A TeX Atomic Character Encoding for VF and DVI Standardization

    Peter Dalgaard recently mentioned the old idea of creating TeX
*virtual* fonts realizing the 256-character Cork norm in Knuth's CM
typeface (see LaTeX list, March 13 1994). He calls this VDC, since the
only full realization of the Cork norm is the DC font set of N.
Schwartz (which is produced with Metafont and is not virtual).

    This idea seems never to have been taken seriously --- probably
because a goodly number of characters of the Cork norm cannot be
generated from the CM fonts via Knuth's virtual font mechanism.
However, I do believe that VDC, the creation of *virtual* fonts
fully implementing the Cork text font encoding, is the right way to
move forward quickly and efficiently on the basis of existing

     This, however, requires the creation of a new *atomic character
encoding standard*. An atomic 256-character 8-bit encoding standard
has room for enough atoms to serve *all* European languages written in
the Latin alphabet, and even most Latin-alphabet languages worldwide.
Thus its scope is wider than that of the Cork norm (which omits Welsh, Catalan,
Estonian,..., many African languages, etc.), and its role is more

      An atomic glyph (character) is one that cannot be usefully
assembled by virtual font methods from other characters. There will be
some quibbling: is the letter i atomic or is it composite (molecular)?
I believe we would not be far wrong to count all CM glyphs as "atomic".
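A minimal sketch may make "assembled by virtual font methods" concrete. It computes the placement of an accent over a base letter using the rule of TeX's \accent primitive for glyphs from a single font (center horizontally, add a slant correction, raise the accent by the letter's height minus the x-height) and prints the MAP part of a VPL character entry of the kind vptovf compiles into a virtual font. The metrics, and the assumption that the acute accent sits in slot 1 as in the Cork norm, are illustrative only and are not taken from any real CM font.

```python
from typing import List


def accent_map(base_code: int, base_width: float, base_height: float,
               accent_code: int, accent_width: float,
               x_height: float, slant: float = 0.0) -> List[str]:
    r"""Return VPL MAP lines placing an accent over a base character.

    Dimensions are in units of the design size, as in VPL files.
    Placement mimics TeX's \accent when both glyphs come from one font:
    center horizontally, add a slant correction, raise by (height - x_height).
    """
    dx = (base_width - accent_width) / 2 + (base_height - x_height) * slant
    dy = base_height - x_height
    lines = ["(MAP", "   (PUSH)"]
    if abs(dx) > 1e-9:
        lines.append(f"   (MOVERIGHT R {dx:.4f})")
    if abs(dy) > 1e-9:
        lines.append(f"   (MOVEUP R {dy:.4f})")
    lines += [f"   (SETCHAR O {accent_code:o})",  # the accent atom
              "   (POP)",                         # back to the start point
              f"   (SETCHAR O {base_code:o})",    # then the base letter atom
              "   )"]
    return lines


# Hypothetical metrics: an accent designed to sit on x-height letters.
print("\n".join(accent_map(ord("e"), 0.45, 0.43, 0o1, 0.5, 0.43)))
print("\n".join(accent_map(ord("E"), 0.68, 0.683, 0o1, 0.5, 0.43)))
```

With these made-up numbers the capital E gets its accent lifted by 0.253 design units, which is the tall-capital effect that motivates separate uppercase accents.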
      It does not, in principle, seem to matter what the atomic
encoding is. But humans would like to be able to memorize the
available glyphs easily from the font table. And perhaps making the
atomic font useful on its own is a sound idea. Leaving in place the
atomic characters of the Cork norm in the segment 0--127 seems the
best way to start. The 128--255 Cork segment contributes about 20
atoms. About 20 further atomic glyphs are found in J\"org
Knappen's article "TeX and Africa" (Cahiers Gutenberg 10--11,
September 1991, 15--24).
      I expect that further essential atoms will gradually be
unearthed; hence filling up the full 0--255 segment (if
necessary with trash) would be a fatal error. Every 10 years or so
TUG might fill a few more empty slots. The total number filled might
initially be closer to 128 than to 255, which would be a great
blessing to weaker computers over the next decade.
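The slot policy just described can be pictured with a small hypothetical registry (not part of any TeX distribution). It assumes two rules implied by the text: unassigned slots stay empty instead of being padded, and a slot once filled is never reassigned, so that .dvi files made under an earlier edition of the norm keep their meaning.

```python
from typing import List, Optional


class AtomicEncoding:
    """Hypothetical registry for an 8-bit atomic encoding.

    Slots start empty and may be filled in later editions of the norm,
    but a filled slot is never reassigned, so older .dvi files keep
    their meaning.
    """

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * 256

    def assign(self, code: int, glyph: str) -> None:
        if not 0 <= code <= 255:
            raise ValueError(f"code {code} outside 0..255")
        if self.slots[code] is not None:
            raise ValueError(f"slot {code} already holds {self.slots[code]!r}")
        if glyph in self.slots:
            raise ValueError(f"{glyph!r} already has a slot")
        self.slots[code] = glyph

    def free_slots(self) -> List[int]:
        return [code for code, glyph in enumerate(self.slots) if glyph is None]


enc = AtomicEncoding()
enc.assign(0x01, "acute")  # slot kept from the Cork norm (assumed)
enc.assign(0x61, "a")
print(len(enc.free_slots()))  # 254 slots left for future editions
```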
      Following CM designs offered by N. Schwartz, Jackowski & Rycko, J.
Knappen, J. Zlatushka and others, we should then realize in Metafont
a Computer Modern typeface with the atomic character encoding (or
perhaps just a first part of it).  This is initially a matter of
disengagement and reinstallation of existing atomic characters.

      This would promote creation of standard virtual fonts and
encodings T2, T3, ... that can be based on this norm.

      Then the old dream of having a more compact virtual version of
the DC fonts could be realized quickly, and hopefully, for the first
time, in Adobe Type1 format. There are other Metafont typefaces
to recast in atomic form, for example Pandora, Concrete Computer
Modern, Malvern. Hitherto font makers have apparently been pushed to
realize the Cork norm in bulky Metafont form:

  > I did this reluctantly and with gritted teeth, and
  > every time I am told that I *must* do this because the
  > wise men of Cork say so I am *sorely tempted* to rip
  > out my Cork-compatibility code and refuse to include
  > Cork characters in the Malvern glyph set at all.
                 Damian Cugley, creator of Malvern
                 on metafont list 15 Mar 1993
                 (archived on ftp ftp.univ-rennes1.fr)

Font creators certainly deserve to be freed of font packaging problems.
Perhaps the atomic norm would simplify their lives?  But for that to
be a reality the norm would have to be "stratified" or "modularized"
into more and less essential parts; and conformity defined more as
non-contradiction than as complete adherence.

      Here is an example where modularity seems necessary. Some
typefaces use distinct accent shapes for lowercase and uppercase
letters, usually to keep accented uppercase letters from becoming
excessively tall, as they indeed are to my eye in Knuth's CM typeface. It
seems therefore wise for the standard to allow uppercase and lowercase
versions of all accents; but to impose the two on a designer who
disapproves would be folly.  (In DC, Norbert Schwartz uses low profile
accents for both lowercase and uppercase.)
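One possible reading of "conformity as non-contradiction" is sketched below. Everything in it is an assumption for illustration: the module names, the slot numbers (0x80 for a capital acute is invented), and the rule that the core module must be complete while optional modules such as capital accents may be omitted entirely. A font conforms if it places no glyph in a slot where the norm expects a different one.

```python
from typing import Dict, List

# Hypothetical excerpt of a modular atomic norm: slot -> glyph name.
CORE: Dict[int, str] = {0x01: "acute", 0x41: "A", 0x61: "a"}
CAPITAL_ACCENTS: Dict[int, str] = {0x80: "acute.cap"}  # optional module
NORM: Dict[int, str] = {**CORE, **CAPITAL_ACCENTS}


def conformity_problems(font: Dict[int, str]) -> List[str]:
    """List contradictions with the norm; an empty list means conformity."""
    problems: List[str] = []
    for code, glyph in sorted(font.items()):
        expected = NORM.get(code)  # None if the norm leaves the slot empty
        if glyph != expected:
            problems.append(
                f"slot {code:#04x}: {glyph!r} where norm has {expected!r}")
    for code, glyph in sorted(CORE.items()):
        if code not in font:  # only the core module is mandatory
            problems.append(f"slot {code:#04x}: core glyph {glyph!r} missing")
    return problems


# A DC-like design with one accent shape for both cases omits the module:
print(conformity_problems({0x01: "acute", 0x41: "A", 0x61: "a"}))  # []
```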

      A further highly desirable goal would be achieved by this atomic
encoding standard, namely worldwide TeX .dvi compatibility for prose
in languages written in the Latin alphabet, at least where public
domain and freeware TeX fonts are concerned. This goal was for me the
chief motivation for the present proposal. This compatibility could be
extended to scientific texts through the encoding norms of Barbara
Beeton's math font committee, together with their future realization
in the CM typeface. A durable TeX standard for highly
efficient archiving of scientific literature would thus be created.
The underlying mechanism for archiving is as follows: use Peter
Breitenlohner's "dvicopy" utility to "expand" the ".dvi" output file
built by TeX using any virtual font based on the atomic standard; this
produces .dvi files for archiving based entirely on the atomic
standard. These .dvi files can be exploited wherever the standard
atomic fonts are installed. OzTeX has recently introduced a remarkably
fast and smoothly integrated version of "dvicopy", which is hopefully
becoming a standard tool. Note that the archive user does not need
"dvicopy"; strictly speaking, only the archive manager does.
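The effect of the archive manager's "expansion" step can be modeled in a few lines. This is a toy model, not dvicopy's code: real DVI files are binary opcode streams with font definitions, whereas here a page is a list of tuples, `atom` is a hypothetical non-virtual atomic font and `vcork` a hypothetical virtual Cork font whose é (slot 0xE9) is built from the atoms e and acute. The one piece of real virtual-font semantics it relies on is that a character's packet acts as if enclosed in push/pop, after which the position advances by the virtual character's width.

```python
from typing import Dict, List, Tuple

# Simplified command model (hypothetical; real DVI is binary):
#   ("set", font, code)   typeset a character, advancing by its width
#   ("right", dx)         horizontal move
#   ("push",) / ("pop",)  save / restore the current position
Cmd = Tuple[object, ...]
VirtualFont = Dict[int, Tuple[float, List[Cmd]]]  # code -> (width, packet)


def expand(page: List[Cmd], vfs: Dict[str, VirtualFont]) -> List[Cmd]:
    """Inline virtual characters until only non-virtual fonts remain."""
    out: List[Cmd] = []
    for cmd in page:
        if cmd[0] == "set" and cmd[1] in vfs:
            width, packet = vfs[cmd[1]][cmd[2]]
            out.append(("push",))
            out.extend(expand(packet, vfs))  # a packet may use other VFs
            out.append(("pop",))
            out.append(("right", width))     # advance by the virtual width
        else:
            out.append(cmd)
    return out


vfs = {"vcork": {0xE9: (0.45, [("push",), ("right", -0.025),
                               ("set", "atom", 0x01), ("pop",),
                               ("set", "atom", ord("e"))])}}
archived = expand([("set", "vcork", 0xE9)], vfs)
print(all(c[1] == "atom" for c in archived if c[0] == "set"))  # True
```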

     The archiving of scientific literature using ".dvi" files of
Knuth's CM norm is already a growing phenomenon. Is the above proposed
norm in competition with CM --- which incidentally is atomic?  One
hopes not, for the dominance of English would be crushing! Fortunately
the two norms can *coexist* because the CM fonts can be faithfully
recreated as virtual fonts based on the atomic norm proposed.

     The possibility of full and efficient emulation of Knuth's CM by
the new font system should be a fundamental design requirement; it
provides the essential backwards compatibility necessary for any
archiving venture and for conservation of Knuth's monumental
contributions.  It also allows small TeX systems to use the new
standard without strain. And it will hopefully induce cultured
Anglophones to adopt it too.

      In archiving of publications in .dvi form, there would indeed be
competition between the Cork norm fonts and the atomic norm I am
proposing. I expect the atomic norm to win because it far better
allows for numerous national typographic variants: umlauts in German
riding lower than the similar diaeresis accents in French; acute accents
in Czech/Slovak positioned further left than for French; side-bearings
for punctuation that are national idiosyncrasies (French, Spanish),
etc. To be honest, Cork norm archiving (using DC fonts, for example, in
place of the proposed standard atomic font) might be the winner on some
fronts. Where Polish is concerned, updated DC fonts might fare better,
since they allow optimal rendition of Polish accents that touch their
base letter and slightly change shape to fit various letters. It is
not clear that "enough" variants of the Polish ogonek and cross accents
can be included in an atomic norm; but I predict a happy compromise
could be found --- perhaps with just the expected lower and uppercase
versions!  At the same time, I doubt either competitor could, with a
single typeface, merit typographic awards in all countries concerned!
A degree of compromise is involved.

      In spite of the above competition, the potential role of the
Cork norm remains undiminished in the *composition* of multilingual
TeX documents and in reducing conflict between TeX macro packages used

     Note that once one has installed the standard atomic CM face I
envisage, one is equipped to *read* and *print* .dvi files in hundreds
of Latin-based alphabets, and indeed with old drivers lacking virtual
font support. But *composition* of a .dvi file in even one single language
(say English) in principle requires language-specific adaptations:
normally virtual fonts (say emulating Knuth's CM or Schwartz' DC fonts
for English), hyphenation tables, and a language-specific macro
package. I am not sure that the idea of serving English on an equal
footing with Xhosa is going to melt ice in high places, but let me
admit that it pleases me well, and certainly more than the current
doctrine centered on DC fonts, which inconveniences everyone.

     It will be tempting to make the atomic CM font series usable
alone for English (without virtual fonts) by including suitable
kerning and ligature information --- that is a perfectly reasonable
tribute to TeX's mother language. If the atomic encoding happens to
be Cork's (and not CM's) on the 0--127 segment, Cork norm militants
will be consoled to have Anglophones dancing to a Continental tune.

     Where do the burgeoning "outline fonts", notably the huge
population of Adobe Type1 fonts, fit into this picture? (Adobe's fonts
and the competing TrueType fonts are crowding out TeX's bitmapped fonts
generated by Metafont, by reason of their high-quality scalable screen
rendition and greater compactness at high resolutions.) My impression
is that the Adobe font notions (born several years after TeX's) are
strictly more powerful than TeX's, except for the key "meta" features
of Metafont. Notably, a Type1 font is basically a collection of as
many glyphs as one pleases each designated by a unique name (rather
than by a number in the segment 0--255); encodings are variable
secondary structures that one specifies by various encoding vectors.
Further, the Level 2 PostScript of printers currently being installed
supports "composite" fonts, built from Type1 fonts, that play roughly
the role of TeX's virtual fonts. The upshot is twofold: on the one hand the small but
highly influential commercial font development activity for TeX is
likely to be increasingly based on Adobe norms, but on the other hand,
the fonts produced will adapt readily to norms elaborated for TeX's
own font system.
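The separation between named glyphs and encoding vectors can be mimicked as follows. The glyph names `e`, `acute`, `eacute` and `.notdef` are standard PostScript glyph names; the outline strings and the second encoding's slot choice (0x8E) are placeholders assumed for the example. Re-encoding touches only the 256-entry vector, never the glyphs.

```python
from typing import Dict, List

# Glyphs keyed by PostScript name (outlines replaced by placeholders).
GLYPHS: Dict[str, str] = {".notdef": "", "e": "<e>", "acute": "<acute>",
                          "eacute": "<eacute>"}


def encoding_vector(assignments: Dict[int, str]) -> List[str]:
    """Build a 256-entry encoding vector; unused slots get .notdef."""
    vector = [".notdef"] * 256
    for code, name in assignments.items():
        vector[code] = name
    return vector


def glyph_at(code: int, vector: List[str]) -> str:
    """Look up the glyph a code selects under a given encoding vector."""
    return GLYPHS.get(vector[code], GLYPHS[".notdef"])


cork_like = encoding_vector({0x01: "acute", 0x65: "e", 0xE9: "eacute"})
other = encoding_vector({0x8E: "eacute"})  # hypothetical alternative slot
# The same named glyph is reached through different codes:
print(glyph_at(0xE9, cork_like) == glyph_at(0x8E, other))
```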

     This is the appropriate point to mention that, on the Metafont
list and TeXhax, 1 September 1993, Pierre MacKay recommended the
creation of an atomic (or "raw" or "simple") font encoding for Adobe
Type1 fonts. Thereafter Bert Horn (11 Sep 1993, TeXhax only) pointed
out that Adobe's encoding vectors etc. are more powerful and in
principle simpler. I am reviving the issue in a new form, by arguing
that several essential applications of the basic idea lie *outside*
the radius of action of Adobe fonts and justify renewed public

     Applications to Adobe fonts will not be entirely negligible!
Given that Adobe's unique character-naming approach makes encoding a
subsidiary matter, the handful of new atoms needed to construct the
characters of the Cork norm could usefully be added *immediately* to
the character sets of the existing Adobe Type1 realizations of Knuth's
CM fonts. The urgent task is conversion of existing atomic Metafont
glyph designs for Cork norm characters to Adobe Type1 format; standard
encodings can be installed at any later point. There is an undeniable
opportunity here for Type1 font makers to take the lead without
risking lost labor.

     Hopefully, TUG too will perceive here an important opportunity
for leadership in fashioning an atomic norm.
     Laurent Siebenmann <lcs@matups.matups.fr>

PS.  This note has been short on specifics concerning glyphs and
encodings. To compensate, the interested reader might start by looking
up the references and searching for the words "Cork" and "encoding" on
various TeX discussion list archives.

PS.  John Plaice, in his article "Language Dependent Ligatures" in
TUGboat 14(1993), 271--274, argues for the use of atomic characters
with a new, more powerful version of TeX; but he rejects the use of
virtual fonts as leading to Brobdingnagian inefficiency, and he does
not contemplate the possibility of regulating them with an atomic
character norm. Brobdingnagian multiplication of virtual fonts sounds
frightening, but if it turns out to be a real menace, even with a
norm, I am confident it can be kept in check by new virtual font
tricks, cf. my comments, page 219 ibid.