About boundary characters

Tomas Rokicki rokicki at gmail.com
Thu Sep 19 02:31:38 CEST 2019


A minor correction:  characters *can* have zero width.  A character
whose *index* into the width table is zero does not exist, but any of
the other entries in the width table can certainly be zero.

(I don't know if any such characters exist, off-hand.)
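
To make the distinction concrete, here is a minimal Python sketch.  It
is not TeX's actual code: the names CharInfo, char_exists and
char_width, and the width values, are made up for illustration.  It
assumes only what the TFM format specifies, namely that each character
carries an index into the width table and that index 0 marks the
character as absent.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CharInfo:
    """Hypothetical stand-in for a TFM char_info word.

    A real char_info word also packs height, depth, italic-correction
    indices, a tag and a remainder; only the width index matters here.
    """
    width_index: int


def char_exists(info: CharInfo) -> bool:
    # Existence is decided by the *index*, not by the width it points to.
    return info.width_index > 0


def char_width(widths: Sequence[int], info: CharInfo) -> int:
    # Look the width up through the index.
    return widths[info.width_index]


# Illustrative width table in arbitrary units.  Entry 0 is the reserved
# slot (always 0); entry 1 is an ordinary entry that happens to be 0.
widths = [0, 0, 500000]

missing = CharInfo(width_index=0)     # not in the font
zero_width = CharInfo(width_index=1)  # in the font, width 0
ordinary = CharInfo(width_index=2)    # in the font, non-zero width

for name, info in [("missing", missing),
                   ("zero_width", zero_width),
                   ("ordinary", ordinary)]:
    print(name, char_exists(info), char_width(widths, info))
```

Here zero_width and missing both have width 0, but only missing fails
the existence test.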

-tom

On Wed, Sep 18, 2019 at 5:22 PM Doug McKenna <doug at mathemaesthetics.com>
wrote:

> Karl, Didier -
>
> My code faithfully duplicates DEK's algorithm, which his famous comment
> about "premature optimization" does not apply to, because his code for
> appending characters to the layout was turned into "post-matured"
> spaghetti.  I never tried to rewind its clock, so my C code is functionally
> the same pasta also, though it is easier to read IMNSHO.
>
> Looking at my comments and code, written a few years back during my Vulcan
> mind-meld with the WEB source, it seems that a boundary character is used
> to prevent ligatures and kerns from occurring when two or more adjacent
> characters are in different fonts.
>
> The thing is, there's only one boundary character per TFM font.
> Therefore, it kind of by definition has to serve as some kind of generic
> flag in multiple situations.  There's no express metrics stored for a
> boundary character per se in the TFM, but if it's a legal character code
> (between 0 and 255 for TFM), then presumably that character in the font can
> have metrics, usually of zero width, but not precluded from having non-zero
> width.
>
> Unfortunately, a character with zero width is formally considered missing
> from the TFM font, in order to save space by not storing some other bit
> somewhere in the font data that would declare a character code between |bc|
> and |ec| as missing (see the char_exists() macro in the WEB source; it
> tests for positive width).
>
> Because of that little non-orthogonal problem, there's the TFM font's
> so-called "false" boundary character, which is synthesized when the TFM
> file is read in.  The false boundary character is the boundary character,
> unless the boundary character's width is non-zero, in which case the false
> boundary character is set to a not-a-character value.  The comment in WEB
> source says it's to prevent "spurious ligatures".  This smells like a hack
> to me, but perhaps it's elegant.  Again, the problem being solved (I think)
> is how to introduce a character of zero width into the layout to break a
> kern or ligature, rather than having it flagged during input as missing
> from the font before any attempt to append.
>
> DEK uses the phrase "pseudo-ligatures" in a comment, but he never defines
> the term, and the phrase is not used anywhere else in the TeX code that I
> can find, so that's not much help.
>
> Anyway, FWIW after a quick flyby of the code.  Because of the complicated
> nature of the ligature stack and the ligature/kern "program" in the TFM
> file, I'm probably not explaining stuff going on there very well.  Indeed,
> the above may be quite wrong.
>
> It seems post-mature optimization is kind of evil too. :-)
>
>
> Doug McKenna
>
>
>
> ----- Original Message -----
> From: "Karl Berry" <karl at freefriends.org>
> To: "Didier Verna" <didier at didierverna.net>
> Cc: "texhax" <texhax at tug.org>
> Sent: Wednesday, September 18, 2019 4:42:24 PM
> Subject: Re: About boundary characters
>
> Hi Didier,
>
>     I have several questions about boundary characters in the TFM format.
>
> I surmise experimentation is necessary. The "specifications", such as
> they are, are insufficient, so far as I can tell. (Since they were added
> in the 1989 update, Don had only a tiny amount of space in which to
> describe them.)
>
> It's never been clear to me what TeX actually does with boundary
> characters (so maybe their metrics do not matter?). I believe that they
> are only relevant in the ligkern table, but that's about all I know.
> I read the descriptions in the {mf,tex}{book,.web}, as I suppose you
> have also, but clarity is not forthcoming. As far as I know there is no
> other significant source of information.
>
> Doug, I surmise you may have more knowledge than anyone? But maybe your
> re-implementation was too long ago now :).
>
> It would be nice to have a thorough article for TUGboat on boundary
> characters.
>
> As for what existing fonts may or may not do with them, (1) it's hard to
> say anything without knowing what fonts you are talking about, and (2) I
> wouldn't take it too seriously. Maybe the font creators did lots of
> experiments and created boundary chars the way they did for specific
> reason, but IMHO it's equally likely that they simply followed some
> examples, tried to do what they thought made sense, and whatever
> happened, happened. --best, karl.
>


-- 
--  http://cube20.org/  --  http://golly.sf.net/  --