5+3+... FontName reduction algorithm: final report



Thanks to some detective work (or a fantastic memory) on the part of
Berthold Horn yesterday, the source of the Adobe 5+3+... fontname
reduction algorithm has been located in

	Adobe Technical Note #5040 (1992 March 31)
	Supporting Downloadable PostScript Language Fonts

available on the World-Wide Web at

	http://www.adobe.com/supportservice/devrelations/PDFS/TN/5040.Download_Fonts.pdf

The index to Adobe technical notes is:

	http://www.adobe.com/supportservice/devrelations/technotes.html

The relevant section of this 22-page report says:

>> ...
>> 6.3 Macintosh File Names
>>
>> In the Macintosh Environment, there is another naming scheme that is
>> constructed from the PostScript language font name. A file name is
>> constructed by dissecting the PostScript language font name into
>> components based on capital letters and hyphens.  Then the first five
>> letters of the first name component  are used, followed by the first
>> three letters of any subsequent name components.  The following
>> are examples to illustrate this concept:
>>
>> Palatino-Roman  ==> PalatRom
>> Palatino-BoldItalic  ==> PalatBolIta
>> Optima  ==> Optim
>> OptimaBoldOblique  ==> OptimBolObl
>> ...
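
As a quick check of the rule as quoted (five letters of the first
component, three of each later one), here is a minimal shell-plus-awk
sketch.  The function name reduce53 is my own invention, not anything
from the Technical Note, and I am assuming that a component boundary
falls wherever a capital letter follows a non-capital, in addition to
each hyphen.  LC_ALL=C is there only to keep [A-Z] from being
interpreted in a locale-dependent way.

```sh
# reduce53: print the 5+3+3+... short name for each FontName argument.
# (hypothetical helper; the name is not from Adobe TN 5040)
reduce53() {
    printf '%s\n' "$@" | LC_ALL=C awk '
    {
        s = $0
        # Insert a hyphen wherever a capital follows a character that is
        # neither a capital nor a hyphen; each pass removes one such
        # adjacency, so the loop terminates.
        while (match(s, /[^A-Z-][A-Z]/))
            s = substr(s, 1, RSTART) "-" substr(s, RSTART + 1)
        n = split(s, parts, "-")
        short = substr(parts[1], 1, 5)      # first 5 of first component
        for (k = 2; k <= n; ++k)            # first 3 of each later one
            short = short substr(parts[k], 1, 3)
        print short
    }'
}
```

For example, "reduce53 Palatino-BoldItalic OptimaBoldOblique" prints
PalatBolIta and OptimBolObl, one per line.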

I therefore prepared a minor modification of my previous program to
match on the FontName value, instead of the FullName value, and then
applied it to a combined list of unique FontName values from multiple
font vendors.  The test collection is larger than in my previous
experiments, because it now includes fonts from our TeX tree and from
three low-cost font CD-ROMs (Font Locker, KeyFontsPlus, and Font
Elegance), wherever those collections supply .afm files.

The program reported:

        42 collisions, 8165 font names, 8165 unique font names:

         3 cmss1                        [cmss10]
                                        [cmss12]
                                        [cmss17]
         2 Kaufm                        [Kaufman]
                                        [Kaufmann]
         2 cmmi1                        [cmmi10]
                                        [cmmi12]
         5 cmssi                        [cmssi10]
                                        [cmssi12]
                                        [cmssi17]
                                        [cmssi8]
                                        [cmssi9]
         5 cmmib                        [cmmib10]
                                        [cmmib6]
                                        [cmmib7]
                                        [cmmib8]
                                        [cmmib9]
         2 cmssq                        [cmssq8]
                                        [cmssqi8]
         2 cmti1                        [cmti10]
                                        [cmti12]
         2 LucidBriDemIta               [LucidaBright-DemiItalic]
                                        [LucidaBrightDemiItalic]
         2 BlackCha                     [BlackChance]
                                        [BlackChancery]
         2 BemboExp                     [Bembo-Expert]
                                        [BemboExpert]
         3 cmtex                        [cmtex10]
                                        [cmtex8]
                                        [cmtex9]
         2 PerpeExp                     [Perpetua-Exp]
                                        [PerpetuaExpert]
         5 cmbsy                        [cmbsy10]
                                        [cmbsy6]
                                        [cmbsy7]
                                        [cmbsy8]
                                        [cmbsy9]
         2 cmsl1                        [cmsl10]
                                        [cmsl12]
         2 cmbx1                        [cmbx10]
                                        [cmbx12]
         2 DfDivPla                     [DfDiversionsPlain]
                                        [DfDiversitiesPlain]
         2 GoudyTexMTLomCap             [GoudyTextMT-LombardicCapitals]
                                        [GoudyTextMT-LombardicCaps]
         3 MICR1BTReg                   [MICR10byBT-Regular]
                                        [MICR12byBT-Regular]
                                        [MICR13byBT-Regular]
         2 Coron                        [Corona]
                                        [Coronet]
         2 WitteFraMT                   [WittenbergerFraktMT]
                                        [WittenbergerFrakturMT]
         2 Pelic                        [Pelican]
                                        [Pelicent]
         2 Heide                        [Heidelbe]
                                        [Heidelstein]
         2 SassoPri                     [Sassoon-Primary]
                                        [SassoonPrimary]
         2 LucidBriIta                  [LucidaBright-Italic]
                                        [LucidaBrightItalic]
         3 cmcsc                        [cmcsc10]
                                        [cmcsc8]
                                        [cmcsc9]
         2 LucidSanTypBol               [LucidaSans-TypewriterBold]
                                        [LucidaSansTypBold]
         2 ScripPla                     [ScripteasePlain]
                                        [ScriptekPlain]
         2 cmtt1                        [cmtt10]
                                        [cmtt12]
         2 Harti                        [Harting]
                                        [Harting2]

Evidently, Computer Modern names, chosen a dozen years before Adobe's
algorithm, and indeed, years before PostScript, do not fit this
algorithm well.

The program also gave these 10 longest abbreviations among the names
that did not collide:

24 BodonAntTDemBolConItaIn1     [BodoniAntiquaT-DemiBoldCondensedItalicIn1]
24 BodonAntTDemBolConItaOu1     [BodoniAntiquaT-DemiBoldCondensedItalicOu1]
24 BodonAntTDemBolConItaRe1     [BodoniAntiquaT-DemiBoldCondensedItalicRe1]
24 BodonAntTDemBolConItaRo1     [BodoniAntiquaT-DemiBoldCondensedItalicRo1]
24 BodonAntTDemBolConItaSh1     [BodoniAntiquaT-DemiBoldCondensedItalicSh1]
24 FrankGotItcDBooExtComIn1     [FranklinGothicItcD-BookExtraCompressedIn1]
24 FrankGotItcDBooExtComOu1     [FranklinGothicItcD-BookExtraCompressedOu1]
24 FrankGotItcDBooExtComRe1     [FranklinGothicItcD-BookExtraCompressedRe1]
24 FrankGotItcDBooExtComRo1     [FranklinGothicItcD-BookExtraCompressedRo1]
24 FrankGotItcDBooExtComSh1     [FranklinGothicItcD-BookExtraCompressedSh1]

It is also worth noting the following paragraph from p. 269 of the red
Adobe PostScript Language Reference Manual, 2nd edition:

>> ...
>> 	* The font dictionary's FontName parameter, which is also usually
>> 	  used as the key passed to definefont, is a condensation of the
>> 	  FullName.  It is customary to remove spaces and to limit its
>> 	  length to less than 40 characters.  The resulting name should
>> 	  be unique.
>> ...

Since a PostScript name is limited in length to 127 characters
(p. 566), I don't believe that the value 40 above is anything more
than a recommendation for programming convenience.

Indeed, of the 8165 FontName values tested, 45 were longer than 40
characters:

        43 FranklinGothicItcT-MediumCondensedItalicIn1
        43 FranklinGothicItcT-MediumCondensedItalicOu1
        ...
        41 BodoniAntiquaT-DemiBoldCondensedItalicIn1
        ...
        41 FranklinGothicItcT-DemiCondensedItalicRo1
        41 FranklinGothicItcT-DemiCondensedItalicSh1

Some are quite short:

        ...
        5 cmu10
        4 Anna
        4 Cheq
        4 Hobo
        4 Lexi
        4 MICR
        4 MTEX
        4 MTMI
        4 MTSY
        4 Mira
        4 OCRA
        4 OCRB
        4 Stop
        4 Umpa
        4 cmr5
        4 cmr6
        4 cmr7
        4 cmr8
        4 cmr9
        3 Aja
        3 Bar
        3 Rad
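
Length listings of this kind can be produced along the following
lines.  This is only a sketch: the function name fontname_lengths is
hypothetical, and it assumes input files containing lines of the form
"FontName value", as in .afm files.

```sh
# fontname_lengths: print "length name" for every FontName line in the
# given files, longest first.  (hypothetical helper name)
fontname_lengths() {
    LC_ALL=C awk '
    /^FontName / {
        sub(/[\r ]+$/, "")          # strip trailing CR and blanks
        print length($2), $2
    }' "$@" |
        LC_ALL=C sort -k1,1nr -k2,2  # numeric length descending, then name
}
```

Piping the result through awk '$1 > 40' keeps only the names longer
than 40 characters.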

For the record, here is the final awk program: it can be run on a
collection of .afm files, or any files with lines of the form

FontName ACaslon-AltBold

#=======================================================================
# Apply Adobe Technical Note 5040 5+3+ff algorithm to reduce FontName
# values to short names, and report collision statistics:
#
# Usage:
#	awk -f name53ff3.awk AFM-file(s)
#
# [08-Jan-1998]

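/^FontName / {
    sub(/[\r ]+$/,"",$0)		# strip trailing CR and blanks
    fontname = $2
    ++TotalFonts			# every FontName line, duplicates included
    if (!(fontname in UniqueFontName))
    {
	UniqueFontName[fontname]++
	++UniqueFonts
	# split at hyphens and wherever a capital follows a non-capital
	n = split(insert_hyphen_at_case_change(fontname),parts,/[-]/)
	name53ff = substr(parts[1],1,5)	# first 5 letters of first component
	for (k = 2; k <= n; ++k)	# first 3 letters of each later one
	    name53ff = name53ff substr(parts[k],1,3)
	if (name53ff in used) collisions++
	used[name53ff]++
	FontName[name53ff] = \
	    (name53ff in FontName) ? \
		(FontName[name53ff] "\n\t\t\t\t[" fontname "]") : \
		("[" fontname "]")
    }
}

END {
    # TotalFonts, not FNR: FNR counts only the lines of the last input file
    print (0 + collisions), "collisions,", (0 + TotalFonts), "font names,", \
	(0 + UniqueFonts), "unique font names:\n"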
    for (name53ff in used)
	  if (used[name53ff] > 1)
	      printf("%2d %-21s\t%s\n", \
		     used[name53ff], name53ff, FontName[name53ff])
    for (name53ff in FontName)
	  if (used[name53ff] == 1)
	      printf("%2d %-21s\t%s\n", \
		     length(name53ff), name53ff, FontName[name53ff]) | \
		"sort -k1,1nr -k2,2 | head -n 10"
}

function insert_hyphen_at_case_change(s)
{
    if (match(s,/[^A-Z][A-Z]/))
	s = substr(s,1,RSTART) "-" \
	    insert_hyphen_at_case_change(substr(s,RSTART+1))
    return (s)
}

----------------------------------------------------------------------------
- Nelson H. F. Beebe                  Tel: +1 801 581 5254                 -
- Center for Scientific Computing     FAX: +1 801 581 4148                 -
- University of Utah                  Internet e-mail: beebe@math.utah.edu -
- Department of Mathematics, 105 JWB                   beebe@acm.org       -
- 155 S 1400 E RM 233                                  beebe@ieee.org      -
- Salt Lake City, UT 84112-0090, USA  URL: http://www.math.utah.edu/~beebe - 
----------------------------------------------------------------------------