[texhax] Problem with BaKoMa TrueType cmaps

Brian O'Toole Brian.O.Toole at mathworks.com
Fri Jul 20 03:03:58 CEST 2007

Hello All,

I am debugging some "Why do I get the wrong glyph?" bugs in our TeX/LaTeX 
system, and have traced one issue down to the following:

When the Knuth TeX engine creates the DVI file from my TeX file, the DVI 
file refers to character 0x14 in the font cmsy10.  When I check the cmsy10 
glyph table with TeXLive, I see that character 0x14 is the "less than or 
equal" symbol, which is exactly what I want.  Because of OS problems with 
characters 0x00 to 0x1F, third-party font implementations usually place 
these characters after 0x7F in the character map, and DVI parsers are 
expected to know about these offsets so that they can retrieve the correct 
glyphs.

When I ran ttfdump on the BaKoMa version of cmsy10.ttf, I found some 
interesting info regarding the 3.1 subtable.  Here is an excerpt:

Subtable  2.   Platform ID:   3
               Specific ID:   1
               'cmap' Offset: 0x0000011A
       ->Format: 4 : Segment mapping to delta values
  Length:  316
  Version: 0
  segCount: 4  (X2 = 8)
  searchRange: 8
  entrySelector: 2
  rangeShift: 0
  Seg   1 : St = 0020, En = 007E, Delta =    0, RangeOff =     8
  Seg   2 : St = 00A0, En = 00C4, Delta =    0, RangeOff =   196
  Seg   3 : St = 2219, En = 2219, Delta =    0, RangeOff =   268
  Seg   4 : St = FFFF, En = FFFF, Delta =    0, RangeOff =   268

It turns out that while many of the Knuth characters 0x00 to 0x1F have been 
shifted to Segment 2 beginning at 0xA0, Knuth character 0x14 has been moved 
to Segment 3 and sits at character code 0x2219.
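
To make the segment layout concrete, here is a minimal Python sketch of the 
segment search that a format 4 subtable implies, using only the four 
segments from the excerpt above.  The function name find_segment is 
hypothetical.  It reports which segment covers a code; it does not resolve a 
glyph index, since the glyph index array is not part of the excerpt, so a 
covered code may still map to the missing glyph.

```python
# Segments copied from the ttfdump excerpt: (start, end, delta, range_offset).
SEGMENTS = [
    (0x0020, 0x007E, 0, 8),
    (0x00A0, 0x00C4, 0, 196),
    (0x2219, 0x2219, 0, 268),
    (0xFFFF, 0xFFFF, 0, 268),
]


def find_segment(code, segments=SEGMENTS):
    """Return the 1-based number of the segment covering code, or None.

    Hypothetical helper: it only checks coverage and does not look up a
    glyph index.
    """
    for number, (start, end, _delta, _range_offset) in enumerate(segments, start=1):
        if end >= code:
            # Format 4 picks the first segment whose end code is >= code;
            # the code is covered only if that segment's start is <= code.
            return number if start <= code else None
    return None


for code in (0x14, 0xA0, 0x2219):
    print(hex(code), find_segment(code))
```

With these segments the raw code 0x14 is covered by no segment, while 0xA0 
falls in Segment 2 and 0x2219 in Segment 3.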

Here are my questions:

1) Why is the "less than or equal" symbol moved to character code 0x2219?
2) Is there a way I could determine that I should use code 0x2219 for that 
symbol without parsing the list of characters in the cmap to find the right 
one, i.e., is there something special about character 0x14 and code 
0x2219?  What should my DVI parser do when it encounters Knuth character 
0x14?  Is there some algorithm in the TrueType spec that I may have missed?
3) Do other third-party TrueType implementations of cmsy10.ttf use the same 
offsets for the Knuth characters from 0x00 to 0x1F?
4) If the implementations are all fundamentally different, should I 
incorporate DVI parsers written by the authors of these TrueType 
implementations?  Must all such third-party parsers follow some accepted API?
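
For question 2, one possible approach is sketched below in Python.  It 
rests on an assumption that the dump above does not confirm: that the font 
follows the duplication scheme used by some Type 1 Computer Modern sets, 
where codes 0x00-0x09 reappear at 0xA1-0xAA, 0x0A-0x20 at 0xAD-0xC3, and 
0x7F at 0xC4.  The names EXCEPTIONS and candidate_codes are hypothetical, 
and the single exception entry is just the cmsy10 case reported in this 
message.

```python
# Per-font exceptions found by inspecting the cmap; this one entry is the
# case described in this message (hypothetical table).
EXCEPTIONS = {("cmsy10", 0x14): 0x2219}


def candidate_codes(font_name, tex_code):
    """List cmap codes to try, in order, for a TeX character code.

    ASSUMPTION: the shifted codes follow the 0xA1 / 0xAD / 0xC4 duplication
    scheme; a given TrueType font may not use it.
    """
    candidates = []
    exception = EXCEPTIONS.get((font_name, tex_code))
    if exception is not None:
        candidates.append(exception)   # known special case comes first
    candidates.append(tex_code)        # then the unshifted code
    if 0x00 <= tex_code <= 0x09:
        candidates.append(tex_code + 0xA1)
    elif 0x0A <= tex_code <= 0x20:
        candidates.append(tex_code + 0xA3)
    elif tex_code == 0x7F:
        candidates.append(0xC4)
    return candidates


print([hex(c) for c in candidate_codes("cmsy10", 0x14)])
```

A parser could try each candidate against the font's cmap and take the 
first one that yields a glyph other than the missing glyph.  A fixed rule 
alone would suggest 0xB7 for character 0x14, which is why the sketch keeps 
a separate exception table for cases like 0x2219.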

Thanks in advance!
Brian O'Toole 
