the cork font encoding scheme for tex

I have several comments and questions about the extended TeX Latin
encoding scheme:

First, has anyone decided what the TFM coding scheme string should be
for it?  If not, I propose ``Extended TeX Latin''.

Also, I am in need of character names (a la PostScript) for the
characters in that font scheme.  Have those been specified?  Sebastian
Rahtz's article in Baskerville, Implementing the extended TeX layout
using PostScript fonts, uses some of them, but not all.  The characters
which were not in any of the Adobe coding schemes (and the names I
picked, since I had to choose something) are:
027   <cwm>                                  compoundwordmark
030   zero for per {million,thousand,...}    zerolowered
032   the dotless j                          dotlessj
0177  the hyphen char.                       hyphenchar

and many of the accented and other non-English characters:
0200--0211, 0213--0221, 0223--0227, 0231, 0233--0236, 
0240--0251, 0253--0261, 0263--0267, 0271, 0273--0274,
0320, 0335--0337, 0360, 0375--0376.  (Their names are obvious, with the
possible exception of the capital es-zet, 0337, which I named

What is character 0211, the L with an apostrophe after it?  I'm
curious whether any language uses such a construction, or whether it's
just an unfinished rendering of something else.

Should 0264, the t-followed-by-apostrophe, be t-with-a-caron-accent?
That is what its uppercase counterpart, 0224, is.  Or are these
apostrophes some variant of carons I don't know about?

I am wondering what the state of that encoding scheme ``standard'' is --
are further changes being contemplated, or is it frozen?  For example,
Rahtz's article mentioned above suggested that the visible space
character is inappropriate for a text font, a suggestion I agree with.
An invisible space character would perhaps be more useful.  I don't
understand why a visible space character is needed in a font at all --
can't it be created entirely from rules?  Also, why are the accented A's
(for example) in two different places?  I'm sure there is a good reason,
but I'm ignorant of it.
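
As a rough illustration of the rules-only idea, a visible space might be
drawn in plain TeX along these lines.  The macro name \visiblespace and
all of the dimensions are made up for this sketch; they are not taken
from any existing macro package and would need tuning per typeface:

```tex
% Sketch only: a visible space built from three rules, no font glyph.
% \visiblespace is a hypothetical name; the dimensions are guesses.
\def\visiblespace{\leavevmode\hbox{%
  \vrule height .3ex depth .15ex width .04em  % left tick
  \vrule height -.05ex depth .15ex width .4em % bar below the baseline
  \vrule height .3ex depth .15ex width .04em}}% right tick
```

The negative height on the middle rule keeps the bar thin and entirely
below the baseline, the same trick used for strike-through rules.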

I am fully in favor of Pierre's suggestion that the accented characters
all have the height of the unaccented base; in fact, I don't see any
serious alternative, given the limitation in TFM heights.  (This
limitation might be a good thing to remove in the first ``non-TeX'' TeX,
if it can be done in an upward compatible way.)

This brings up another question -- don't we need to create and
distribute a set of macros and recommend standard ligature sequences for
the coding scheme?  If the TeX input isn't agreed on, some of the
advantage of a standard encoding scheme will be lost.  The biggest
question in my mind is how to take advantage of all the accented
characters.  I can imagine several alternatives:

1) change \', \`, and the like to produce a \char command if the
   argument is one of the accented characters in the encoding, and
   an \accent command otherwise.  This would require the least change to
   existing input files (not an overriding consideration, in my view).

2) define ligatures in the font so that A + grave => Agrave (that is,
   0101 + 00 => 0300).  Then define a new control sequence \gravechar
   (e.g.), or perhaps redefine \grave to do \char outside of math mode,
   so that the user can type `\grave A' anywhere.  Then \grave should do
   \accent, if its argument isn't one of the grave-accented characters
   in the encoding.

3) make new control sequences \Agrave, \Ntilde, and the like to produce
   them.  This is the easiest thing to do, but the most painful to use.
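
To make the first and last alternatives concrete, here is a minimal
plain TeX sketch.  It assumes the current font uses the extended
layout, with the grave accent in slot '000 and Agrave in slot '300
(decimal 192); only the letter A is handled, and a real version would
need a table covering every accented character in the encoding:

```tex
% Sketch only; slot numbers are assumptions about the extended layout.

% First alternative: \` selects the precomposed character when one is
% known, and falls back to \accent otherwise.  Only A is handled here.
\def\`#1{\ifx A#1\char'300 \else{\accent'000 #1}\fi}

% Last alternative: one control sequence per accented character.
\chardef\Agrave='300

% Usage: \`A and \Agrave both select slot '300; \`q still uses \accent.
```

The first form keeps existing input files working unchanged, at the
cost of one test per accent command; the second needs no tests but a
new name for every character.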

Of course, all of this is unnecessary when keyboards can produce Agrave
or whatever directly.  The question is what to do when they can't.

One more question: are discussions of a math symbol coding scheme (and
others, such as a companion to the basic text font to provide many
missing symbols, like bullets and daggers and paragraphs) ongoing, and
if so, in what forum?  If it's open, I'd like to join.