GNU font utilities

This manual documents how to install and run the GNU font utilities. It corresponds to version REPLACE-WITH-VERSION (released in REPLACE-WITH-MONTH-YEAR).

The introduction briefly describes the purpose and philosophy of the font utilities. The overview gives details on their general usage, especially how they interact, and describes various things which are common to all or most of the programs.

The first part of this master menu lists the major nodes in this Info document, including the index. The rest of the menu lists all the lower level nodes in the document.

--- The Detailed Node Listing ---

Installation
     Prerequisites
Overview
     Creating fonts
     Command-line options
     Specifying character codes
Bugs
     Bug reporting
File formats
     Encoding files
Imageto
     Imageto usage
IMGrotate
     IMGrotate usage
Fontconvert
     Invoking Fontconvert
Charspace
     CMI files
     fontdimen command
Limn
     Limn algorithm
     Fitting the bitmap curve
BZRto
     Metafont and BZRto
     CCC files
     BZR files
     BZR characters
BPLtoBZR
     BPL files
     BPL characters
XBfe
     XBfe usage
     XBfe shape editing
BZRedit
     BZRedit usage
     Editing BPL files
GSrenderfont
     GSrenderfont usage
Enhancements



1 Introduction

This manual corresponds to version REPLACE-WITH-VERSION of the GNU font utilities.

You can manipulate fonts in various ways using the utilities: conversion of a scanned image to a bitmap font, hand-editing of bitmaps, conversion of a bitmap font to an outline font, and more. More generally, you can start with a scanned image of artwork and work your way through to a finished font with side bearings, accented characters, ligatures, and so on.

The font formats recognized by these programs are primarily those used by the (freely available) TeX typesetting system developed by Donald E. Knuth from 1977–1990. The filenames, font searching, and other aspects of their usage are also based on TeX. They also support output of PostScript Type 1 fonts.

Some of this software was originally written as part of the research program in digital typography at the University of Massachusetts at Boston, directed by Robert A. Morris. The staff at UMB, Rick Martin in particular, has been kind enough to let us continue to use their computers, despite our completing the Master's program there in 1989.



2 Installation

See Prereqs, for what you need to have installed before you can compile these programs.

After that, here's what to do:

  1. Run sh configure in the top-level directory. This tries to figure out system dependencies and the installation prefix. See The configure script (Kpathsea library), for options and other information about configure.

  2. If necessary, edit the paths or other definitions in the top-level GNUmakefile and in include/c-auto.h.
  3. Run GNU make. For example, if it's installed as make, just type `make' in the top-level directory. If all goes well, this will compile all the programs.
  4. Install the programs and supporting data files with make install.

If you encounter problems anywhere along the line, let us know. Known problems are listed below (see Problems). See Bugs, for details on how to submit a useful bug report.



2.1 Prerequisites

To compile and use these programs, the following are necessary:

See the section below for information on how to get all these programs.



2.1.1 Archives

The canonical source for all GNU software, including the GNU C compiler, GNU make, and Ghostscript, is prep.ai.mit.edu:pub/gnu. That directory is replicated at many other sites around the world, including:

United States:
          wuarchive.wustl.edu, ftp.cs.widener.edu,
          uxc.cso.uiuc.edu
          gatekeeper.dec.com:/pub/GNU

Europe:
          src.doc.ic.ac.uk:/gnu, ftp.informatik.tu-muenchen.de,
          ftp.informatik.rwth-aachen.de:/pub/gnu
          ugle.unit.no
          ftp.denet.dk
          archive.eu.net

Australia:
          archie.oz.au:/gnu (`archie.oz' or `archie.oz.au' for ACSnet)

Asia:
          ftp.cs.titech.ac.jp, utsun.s.u-tokyo.ac.jp:/ftpsync/prep,
          cair.kaist.ac.kr:/pub/gnu

You can also order tapes with GNU software from the Free Software Foundation (thereby supporting the development of the font utilities and the rest of the GNU project); send mail to `gnu@prep.ai.mit.edu' for the latest prices and ordering information, or retrieve the file DISTRIB from a GNU archive.

The canonical source for the X window system is export.lcs.mit.edu:pub/R5. That directory is also shadowed at many other sites, including `gatekeeper.dec.com'. The FSF also sells X distribution tapes.

TeX is more scattered. A complete Unix TeX distribution is available by ordering a tape from the University of Washington (send email to `elisabet@u.washington.edu'). Three archives have complete (and identical) TeX collections:

     ftp.uni-stuttgart.de:/soft/tex
     ftp.tex.ac.uk:/pub/archive
     pip.shsu.edu:/tex-archive

The canonical sources for Web2C alone (the Unix port of just TeX, Metafont, and friends, without DVI processors, fonts, macro packages, etc.) are:

     ftp.cs.umb.edu:pub/tex/                  (Boston)
     ics.uci.edu:TeX/                         (California)
     ftp.th-darmstadt.de:pub/tex/src/web2c/   (Germany)

At all these sites, the files to retrieve are web.tar.Z and web2c.tar.Z.

The DVI-to-PostScript driver we recommend is Tom Rokicki's Dvips, and the X window system driver we recommend is Paul Vojta's XDvi. These programs are available from, respectively,

     labrea.stanford.edu:pub/dvips*
     export.lcs.mit.edu:contrib/xdvi.tar.Z

We have modified XDvi and Dvips to use the same path searching code as the current distribution of TeX and these font utilities; the modified versions are available from ftp.cs.umb.edu:pub/tex.

To use Metafont, you must have a file defining output devices. (See Metafont and BZRto.) We recommend you obtain modes.mf from

     ftp.cs.umb.edu:pub/tex/modes.mf

You can retrieve the document describing all the details of the naming scheme for TeX fonts from

     ftp.cs.umb.edu:pub/tex/fontname.tar.Z



2.2 Problems

This section lists some things which have caused trouble during installation. If you encounter other problems, please send a bug report. See Bugs, for how to submit a useful bug report.

Good luck.



3 Overview

This chapter gives an overview of what you do to create fonts using these programs. It also describes some things which are common to all the programs.

Throughout this document, we refer to various source files in the implementation. If you can read C programs, you may find these references useful as points of entry into the source when you are confused about some program's behavior, or are just curious.



3.1 Picture

Following is a pictorial representation of the typical order in which these programs are used, as well as their input and output.

GSrenderfont is not in the picture since it is intended for an entirely separate purpose (namely, making bitmaps from PostScript outlines). Fontconvert also has many functions which are not needed for the basic task of font creation from scanned images.

                                              ---------------
                      / --------------------- | fontconvert |
                    /                         ---------------
                    |                         ^     ^
     scanned        v         |---------------|     |
     image         TFM        v                     v
     and IFI   -----------   GF   -------------  TFM, GF   --------  BZR
     ========> | imageto | ======> | charspace | =========> | limn | ======...
       ^       -----------         -------------     ^      --------
       |                                ^            |               (continued)
       v                               CMI           v
     -------------                               --------
     | imgrotate |                               | xbfe |
     -------------                               --------
     
     
     
                                       Metafont source    ------  GF
                                  |=====================> | mf | =========
      (continued)                 |                       ------
                                  |
           BZR   ---------  TFM
     ... ======> | bzrto |========|=======================
                 ---------        |
                     ^            |
                     |            |   PostScript Type 3 (pf3)
                    CCC           |======================
                                  |
                                  |
                                  |    BPL    ------------  BZR
                                  |=========> | bpltobzr | =====
                                              ------------

See File formats, for more information on these file formats.



3.2 Creating fonts

The previous section described pictorially the usual order in which these programs are used. This section will do the same in words.

Naturally, you may not need to go through all the steps described here. For example, if you are not starting with a scanned image, but already have a bitmap font, then the first step—running Imageto—is irrelevant.

Here is a description of the usual font creation process, starting with a scanned image of a type specimen and ending with fonts which can be used by Ghostscript, TeX, etc.

  1. To see what an image I consists of, run Imageto with the `-strips' option. This produces a bitmap font Isp in which each character is simply a constant number of scanlines from the image.
  2. Run Fontconvert (see Fontconvert) on Isp with the `-tfm' option to produce a TFM file, which TeX needs in the next step:
  3. Run TeX on imageto/strips.tex, telling TeX to use the font Isp. This produces a DVI file which you can print or preview as you usually do with TeX documents. (If you don't know how to do this, you'll have to ask someone knowledgeable at your site, or otherwise investigate.) This will (finally) show you what is in the image.

    An alternative to the above steps is to run Imageto with the `-epsf' option. This outputs an Encapsulated PostScript file with the image given as a simple PostScript bitmap. Then you can use Ghostscript or some other PostScript interpreter to look at the EPS file. This method is simpler, but it uses much more disk space and requires a PostScript interpreter.

  4. If the original was not scanned in the normal orientation, the image must be rotated 90 degrees in some direction and/or flipped end for end. (Sometimes we have not scanned in the normal orientation because the physical construction of the book we were scanning made it difficult or impossible.) In this case, you must rotate the image to be upright. The program IMGrotate does this, given the `-flip' or `-rotate-clockwise' option. Given an image RI, this outputs the upright image I.
  5. Once you have an upright image I, you can use Imageto (see Imageto) to extract the characters from the image and make a bitmap font I.dpigf, where dpi is the resolution of the image in pixels per inch. (If the image itself does not contain the resolution, you must specify it on the command line with `-dpi'.) To do this, you must first prepare an IFI file describing the image. See IFI files, for a description of IFI files.
  6. To view the resulting GF file, run Fontconvert to make a TFM file, as above. Then run TeX on testfont.tex and use the \table or \sample commands to produce a font table. Next, print or preview the DVI file that TeX outputs, as before. This will probably reveal problems in your IFI file, e.g., that not all the characters are present, or that they are not in the right positions. So you need to iterate until the image is correctly processed.

    testfont.tex should have come with your TeX distribution. If for some reason you do not have it, you can use the one distributed in the data directory.

  7. Once all the characters have been properly extracted from the image, you have a bitmap font. Unlike the above, the following steps all interact with each other, in the sense that fixing problems found at one stage may imply changes in an earlier stage. As a result, you must expect to iterate them several (billion) times.

    At any rate, given a bitmap font f you then run Charspace (see Charspace) to add side bearings to f, producing a new bitmap font, say g, and a corresponding TFM file g.tfm. To do this, you must prepare a CMI file specifying the side bearings. See CMI files, for a description of CMI files.

  8. To fit outlines to the characters in a bitmap font, run Limn (see Limn). Given the bitmap font g, it produces the BZR (see BZR files) outline font g.bzr. The side bearings in g are carried along.

    Although Limn will (should) always be able to fit some sort of outline to the bitmaps, you can get the best results only by fiddling with the (unfortunately numerous) parameters. See Invoking Limn.

  9. To convert from the BZR file g.bzr that Limn outputs to a font format that a typesetting program can use, run BZRto (see BZRto). While developing a font, we typically convert it to a Metafont program (with the `-metafont' option).

    As you get closer to a finished font, you may want to prepare a CCC file (see CCC files) to tell BZRto how to construct composite characters (pre-accented `A's, for example) to complete the font.

  10. Given the font in Metafont form, you can then either make the font at its true size for some device, or make an enlarged version to examine the characters closely. See Metafont and BZRto, for the full details.

    Briefly, to do the former, run Metafont with a mode of whatever device you wish (the mode localfont will get you the most common local device, if Metafont has been installed properly). Then you can use testfont.tex to get a font sample, as described above.

    To do the latter, run Metafont with no assignment to mode. This should get you proof mode. You can then use GFtoDVI to get a DVI file with one character per page, showing you the control points Limn chose for the outlines.

  11. Problems can arise at any stage. For example, the character spacing might look wrong; in that case, you should fix the CMI files and rerun Charspace (and all subsequent programs, naturally). Or the outlines might not match the bitmaps very well; then you can change the parameters to Limn, or use XBfe (see XBfe) to hand-edit the bitmaps so Limn will do a better job. (To eliminate some of the tedium of fixing digitization problems in the scanned image, you might want to use the filtering options in Fontconvert before hand-editing; see Character manipulation options.)

    Inevitably, as one problem gets fixed you notice new ones ...



3.2.1 Font creation example

This section gives a real-life example of font creation for the Garamond roman typeface, which we worked on concomitantly with developing the programs. We started from a scanned type specimen of 30 point Monotype Garamond, scanned using a Xerox 9700 scanner loaned to us by Interleaf, Inc. (Thanks to Paul English and others at Interleaf for this loan.)

  1. To begin, we used Imageto as follows to look at the image file we had scanned (see Viewing an image). Each line is a separate command.

              imageto -strips ggmr.img
              fontconvert -tfm ggmrsp.1200
              echo ggmrsp | tex strips.tex
              xdvi -p 1200 -s 10 strips.dvi

  2. Next, we created the file ggmr.ifi (distributed in the data directory), listing the characters in the order they appeared in the image, guessing at baseline offsets and (if necessary) including bounding box counts. Then we ran Imageto again, this time to get information about the baselines and spurious blotches in the image. We used the `-encoding' option since some of the characters in the image are not in the default ASCII encoding.
              imageto -print-guidelines -print-clean-info -encoding=gnulatin ggmr.img

  3. Based on the information gleaned from that run, we decided on the final baselines, adjusted the bounding box counts for broken-up characters, and extracted the font (see Image to font conversion). (In truth, this took several iterations.) The design size of the original image was stated in the book to be 30pt. We noticed several blotches in the image we needed to ignore, and so we added .notdef lines to ggmr.ifi as appropriate.
              imageto -verbose -baselines=121,130,120 \
                -designsize=30 -encoding=gnulatin ggmr.img

  4. To smooth some of the rough edges caused by the scanner's rasterization errors, we filtered the bitmaps with Fontconvert (see Fontconvert).
              fontconvert -verbose -gf -tfm -filter-passes=3 -filter-size=3 \
                ggmr30.1200 -output=ggmr30a

  5. For a first attempt at intercharacter and interword spacing, we created ggmr.1200cmi (also distributed in the data directory) and ran Charspace (see Charspace), producing ggmr30b.1200gf and ggmr30b.tfm. To see the results, we ran ggmr30b through testfont.tex, modified the CMI file, reran Charspace, etc., until the output was somewhat reasonable. We didn't try to fine-tune the spacing here, since we knew the following steps would affect the character shapes, which in turn would affect the spacing.
              charspace -verbose -cmi=ggmr.1200cmi ggmr30a.1200 -output=ggmr30b

  6. Next we ran ggmr30b.1200gf, created by Charspace, through Limn to produce the outline font in BZR form, ggmr30b.bzr. We couldn't know what the best values of all the fitting parameters were the first time, so we just increased the ones which are relative to the resolution.
              limn -verbose -corner-surround=4 -filter-surround=6 \
                -filter-alternative-surround=3 -subdivide-surround=6 \
                -tangent-surround=6 ggmr30b.1200

  7. Then we converted ggmr30b.bzr to a Metafont program using BZRto (see BZRto), and then ran Metafont to create TFM and GF files we could typeset with (see Metafont and BZRto). In order to keep the Metafont-generated files distinct from the original TFM and GF files, we used the output stem ggmr30B. To see the results at the usual 10pt, we then ran the Metafont output through sample.tex (a one-line wrapper for testfont.tex: `\input testfont \sample \end').
              bzrto -verbose -metafont ggmr30b -output=ggmr30B
              mf '\mode:=localfont; input ggmr30B'
              echo ggmr30B | tex sample
              dvips sample

  8. This 10pt output looked too small to us. So we changed the design size to 26pt (finding the value took several iterations) with Fontconvert (see Fontconvert), then reran Charspace, Limn, BZRto, Metafont, etc., as above. We only show the Fontconvert step here; the others have only the filenames changed from the invocations above.
              fontconvert -verbose -gf -tfm -designsize=26 ggmr30b.1200 -output=ggmr26c

  9. After this, the real work begins. We ran the Metafont program ggmr26D.mf in proof mode, followed by GFtoDVI, so we could see how well Limn did at choosing the control points for the outlines. See Proofing with Metafont. (The nodisplays command tells Metafont not to bother displaying each character in an online window.)
              mf '\mode:=proof; nodisplays; input ggmr26D'
              gftodvi ggmr26D.3656gf

  10. From this, we hand-edited the font ggmr26d.1200gf with XBfe (see XBfe), and/or tinkered with the options to Limn, trying to make the outlines reasonable. We still haven't finished ...



3.3 Command-line options

Since these programs do not have counterparts on historical Unix systems, they need not conform to an existing interface. We chose to have all the programs use the GNU function getopt_long_only to parse command lines.

As a result, you can give the options in any order, interspersed as you wish with non-option arguments; you can use `-' or `--' to start an option; you can use any unambiguous abbreviation for an option name; you can separate option names and values with either `=' or one or more spaces; and you can use filenames that would otherwise look like options by putting them after a `--' argument.

By convention, all the programs accept only one non-option argument, which is taken to be the name of the main input file.

If a particular option with a value is given more than once, it is the last value which is used.

For example, the following command line specifies the options `foo', `bar', and `verbose'; gives the value `abc' to the `baz' option, and the value `xyz' to the `quux' option; and specifies the filename -myfile-.

     -foo --bar -verb -baz=abc -quux karl -quux xyz -- -myfile-
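As an illustration only, here is a minimal C sketch of an option table that getopt_long_only handles in the way just described: flags `foo', `bar', and `verbose', the value `abc' for `baz', and two values for `quux', of which the last is kept. The struct demo_opts, the option table, and parse_demo_args are hypothetical and are not taken from the font utilities' sources; getopt_long_only, struct option, optarg, and optind are the standard GNU getopt interface.

```c
#include <getopt.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical parse result; not a structure from the font utilities. */
struct demo_opts
{
  int foo, bar, verbose;
  const char *baz, *quux, *input;
};

/* Parse ARGV with getopt_long_only.  Returns 0 on success, or -1 if an
   option is unknown or ambiguous, or if there is not exactly one
   non-option argument (the main input file). */
int
parse_demo_args (int argc, char **argv, struct demo_opts *o)
{
  static const struct option long_opts[] =
  {
    { "foo",     no_argument,       NULL, 'f' },
    { "bar",     no_argument,       NULL, 'b' },
    { "verbose", no_argument,       NULL, 'v' },
    { "baz",     required_argument, NULL, 'z' },
    { "quux",    required_argument, NULL, 'q' },
    { NULL, 0, NULL, 0 }
  };
  int c;

  o->foo = o->bar = o->verbose = 0;
  o->baz = o->quux = o->input = NULL;

  /* GNU getopt treats optind == 0 as a request to reinitialize, so the
     function can safely be called more than once. */
  optind = 0;

  /* An empty short-option string makes every option a long one, so
     "-foo" works as well as "--foo", as do unambiguous abbreviations
     such as "-verb". */
  while ((c = getopt_long_only (argc, argv, "", long_opts, NULL)) != -1)
    switch (c)
      {
      case 'f': o->foo = 1; break;
      case 'b': o->bar = 1; break;
      case 'v': o->verbose = 1; break;
      case 'z': o->baz = optarg; break;
      case 'q': o->quux = optarg; break;  /* a later value replaces an earlier one */
      default:  return -1;                /* getopt has already printed a message */
      }

  /* By convention there is exactly one non-option argument. */
  if (optind != argc - 1)
    return -1;
  o->input = argv[optind];
  return 0;
}
```

Given the argument vector prog -foo --bar -verb -baz=abc -quux karl -quux xyz -- -myfile-, this sketch sets all three flags, records abc for `baz' and xyz for `quux', and takes -myfile- as the main input file.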



3.3.1 The main input file

By convention, all the programs accept only one non-option argument, which they take to be the name of the main input file.

Usually this is the name of a bitmap font. By their nature, bitmap fonts are for a particular resolution. You can specify the resolution in two ways: with the `-dpi' option (see the next section), or by giving an extension to the font name on the command line.

For example, you could specify the font foo at a resolution of 300dpi to the program program in either of these two ways (`$ ' being the shell prompt):

     $ program foo.300
     $ program -dpi=300 foo

You can also say, e.g., `program foo.300gf', but the `gf' is ignored. These programs always look for a given font in PK format before looking for it in GF format, under the assumption that if both fonts exist, and have the same stem, they are the same.
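The rule for taking the resolution from the extension might be sketched in C as below. split_font_arg is a hypothetical helper, not a function from the font utilities or Kpathsea: it treats the digits after the last `.' as the resolution, ignores any letters after them (such as `gf'), and returns 0 when there is no resolution, in which case `-dpi' would have to supply it.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a main-input argument such as "foo.300gf"
   into the stem "foo" (copied into STEM) and return the resolution 300.
   Returns 0 if the argument carries no resolution. */
unsigned
split_font_arg (const char *arg, char *stem, size_t stem_size)
{
  const char *dot = strrchr (arg, '.');
  size_t len = strlen (arg);
  unsigned dpi = 0;

  if (dot != NULL && isdigit ((unsigned char) dot[1]))
    {
      /* strtoul stops at the first non-digit, so a trailing format
         name such as "gf" or "pk" is simply ignored. */
      dpi = (unsigned) strtoul (dot + 1, NULL, 10);
      len = (size_t) (dot - arg);
    }

  if (stem_size > 0)
    {
      if (len >= stem_size)
        len = stem_size - 1;   /* truncate rather than overflow */
      memcpy (stem, arg, len);
      stem[len] = '\0';
    }
  return dpi;
}
```

With this sketch, both foo.300 and foo.300gf yield the stem foo at 300dpi, while a bare foo yields 0, leaving the resolution to `-dpi'.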

See File lookups (Kpathsea library), for more details of the filename lookup.



3.3.2 Common options

Certain options are available in all or most of the programs. Rather than writing identical descriptions in the chapters for each of the programs, they are described here.

This first table lists common options which do not convey anything about the input. They merely direct the program to print additional output.

`-help'
Prints a usage message listing all available options on standard error. The program exits after doing so.
`-log'
Write information about everything the program is doing to the file foo.log, where foo is the root part of the main input file.
`-verbose'
Prints brief status reports as the program runs, typically the character code of each character as it is processed. This usually goes to standard output; but if the program is outputting other information there, it goes to standard error.
`-version'
Prints the version number of the program on standard output. If a main input file is supplied, processing continues; otherwise, the program exits normally.

This second table lists common options which change the program's behavior in more substantive ways.

`-dpi dpi'
Look for the main input font at a resolution of dpi pixels per inch. The default is to infer the information from the main input filename (see Main input file).
`-output-file fname'
Write the main output of the program to fname. If fname has a suffix, it is used unchanged; otherwise, it is extended with some standard suffix, such as resolutiongf. Unless fname is an absolute or explicitly relative pathname, the file is written in the current directory.
`-range start-end'
Only look at the characters between the character codes start and end, inclusive. The default is to look at all characters in the font. See Specifying character codes, for the precise syntax of character codes.
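The suffix rule for `-output-file' can be sketched as follows. output_filename and has_suffix are hypothetical names. The sketch assumes that a name has a suffix when its last path component contains a `.', and it does not model the rule about which directory the file is written in.

```c
#include <stdio.h>
#include <string.h>

/* Does the last path component of NAME contain a '.'? */
static int
has_suffix (const char *name)
{
  const char *base = strrchr (name, '/');
  base = (base != NULL) ? base + 1 : name;
  return strchr (base, '.') != NULL;
}

/* Build the output filename for `-output-file FNAME'.  If FNAME already
   has a suffix it is used unchanged; otherwise DEFAULT_SUFFIX (for
   example ".1200gf" for a GF file or ".tfm" for a TFM file) is appended.
   The result is written into OUT, truncated to SIZE bytes if needed. */
void
output_filename (const char *fname, const char *default_suffix,
                 char *out, size_t size)
{
  if (has_suffix (fname))
    snprintf (out, size, "%s", fname);
  else
    snprintf (out, size, "%s%s", fname, default_suffix);
}
```

Under these assumptions, `-output-file ggmr30b' for a 1200dpi GF file becomes ggmr30b.1200gf, whereas an explicit name such as out.gf is kept as given.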



3.3.3 Specifying character codes

Most of the programs allow you to specify character codes for various purposes. Character codes are always parsed in the same way (using the routines in lib/charcode.c and lib/charspec.c).

You can specify the character code directly, as a numeric value, or indirectly, as a character name to be looked up in an encoding vector.


3.3.3.1 Named character codes

If a string being parsed as a character code is more than one character long, or starts with a non-digit, it is always looked up as a name in an encoding vector before being considered as a numeric code. We do this because you can always specify a particular value in one of the numeric formats, if that's what you want.

The encoding vector used varies with the program; you can always define an explicit encoding vector with the `-encoding' option. If you don't specify one explicitly, programs which must have an encoding vector use a default; programs which can proceed without one do not. See Encoding files, for more details on encoding vectors.

As a practical matter, the only character names which have length one are the 52 letters, `A'–`Z', `a'–`z'. In virtually all common cases, the encoding vector and the underlying character set both have these in their ASCII positions. (The exception is machines that use the EBCDIC encoding.)


3.3.3.2 Numeric character codes

The following variations for numeric character codes are allowed. The examples all assume the character set is ASCII.

Character codes must be between zero and 255 (decimal), inclusive.
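The lookup order for character codes can be sketched in C. This is not the code in lib/charcode.c: the encoding_vector type and parse_charcode are hypothetical. Because the list of numeric formats is not reproduced here, the sketch simply assumes C-style numerals (decimal, octal with a leading `0', hexadecimal with a leading `0x') and parses them with strtol.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoding vector: names[code] is the character name at
   that position, or NULL if the position is unused. */
typedef struct
{
  const char *names[256];
} encoding_vector;

/* Return the character code for STR (0..255), or -1 if STR is neither a
   name in ENC nor a valid numeric code.  ENC may be NULL for programs
   that can run without an encoding vector. */
int
parse_charcode (const char *str, const encoding_vector *enc)
{
  size_t len = strlen (str);
  char *end;
  long value;

  if (len == 0)
    return -1;

  /* Strings longer than one character, or starting with a non-digit,
     are looked up as names before being tried as numbers. */
  if (enc != NULL && (len > 1 || !isdigit ((unsigned char) str[0])))
    for (int code = 0; code < 256; code++)
      if (enc->names[code] != NULL && strcmp (enc->names[code], str) == 0)
        return code;

  /* Assumption: C-style numerals, so base 0 lets strtol pick
     decimal, octal, or hexadecimal from the prefix. */
  errno = 0;
  value = strtol (str, &end, 0);
  if (end == str || *end != '\0' || errno != 0)
    return -1;

  /* Character codes must lie between 0 and 255 inclusive. */
  return (value >= 0 && value <= 255) ? (int) value : -1;
}
```

In this sketch a name such as AE wins over any numeric reading, while 65, 0101, and 0x41 all denote the same code when no name matches.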



3.3.4 Common option values

The programs have a few common conventions for how to specify option values that are more complicated than simple numbers or strings.

Some options take not a single value, but a list. In this case, the individual values are separated by commas or whitespace, as in `-omit=1,2,3' or `-omit="1 2 3"'. Although using whitespace to separate the values is less convenient when typing them interactively, it is useful when you have a list so long that you want to put it in a file. Then you can use cat in conjunction with shell quoting to get the value: `-omit="`cat file`"'.

Other options take a list of values, but each value is a keyword and a corresponding quantity, as in `-fontdimens name:real,name:real'.

Finally, a few options take percentages, which you specify as an integer between 0 and 100, inclusive.
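A sketch of these three value conventions in C follows. The helper names are hypothetical. The set of separators (commas, blanks, tabs, and newlines, so that the output of `cat file` also works) is an assumption about what "whitespace" covers, and the keyword quad in the usage note is only an example.

```c
#include <stdlib.h>
#include <string.h>

/* Split a list value such as "1,2,3" or "1 2 3" in place.  Stores up to
   MAX pointers into BUF (which must be writable) and returns the number
   of items found.  strtok treats a run of separators as one, so "1, 2"
   also works. */
size_t
split_option_list (char *buf, char **items, size_t max)
{
  size_t n = 0;
  char *tok;

  for (tok = strtok (buf, ", \t\n"); tok != NULL && n < max;
       tok = strtok (NULL, ", \t\n"))
    items[n++] = tok;
  return n;
}

/* Parse one keyword:quantity item such as "quad:10.5", splitting ITEM
   in place.  Returns 0 on success, -1 if the item is malformed. */
int
parse_keyword_real (char *item, const char **keyword, double *value)
{
  char *colon = strchr (item, ':');
  char *end;

  if (colon == NULL || colon == item)
    return -1;
  *value = strtod (colon + 1, &end);
  if (end == colon + 1 || *end != '\0')
    return -1;
  *colon = '\0';          /* terminate the keyword */
  *keyword = item;
  return 0;
}

/* Parse a percentage: an integer from 0 to 100 inclusive, else -1. */
int
parse_percentage (const char *str)
{
  char *end;
  long value = strtol (str, &end, 10);

  if (end == str || *end != '\0' || value < 0 || value > 100)
    return -1;
  return (int) value;
}
```

For example, splitting a writable copy of quad:10.5,space:3 with split_option_list and passing each item to parse_keyword_real yields the pairs (quad, 10.5) and (space, 3).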



3.4 Font searching

These programs use the same environment variables and algorithms for finding font files as does (the Unix port of) TeX and its friends.

You specify the default paths in the top-level Makefile. The environment variables TEXFONTS, PKFONTS, TEXPKS, and GFFONTS override those paths. Both the default paths and the environment variable values should consist of a colon-separated list of directories.

Specifically, a TFM file is looked for along the path specified by TEXFONTS; a GF file along GFFONTS, then TEXFONTS; a PK file along PKFONTS, then TEXPKS, then TEXFONTS.

See Path specifications (Kpathsea library), for details of interpretation of environment variable values.
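A single colon-separated path search can be sketched in C roughly as follows. The sketch assumes that a set, non-empty environment variable simply replaces the compiled-in default. It does not implement the Kpathsea rules (for example the special meaning of empty path elements) or how several variables such as GFFONTS and TEXFONTS are combined, and font_path and search_path are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return the value of ENVVAR if it is set and non-empty, otherwise the
   compiled-in DEFAULT_PATH. */
const char *
font_path (const char *envvar, const char *default_path)
{
  const char *value = getenv (envvar);
  return (value != NULL && *value != '\0') ? value : default_path;
}

/* Look for NAME in each directory of the colon-separated PATH, in order.
   On success, store the full filename in RESULT and return 1; otherwise
   return 0.  Simplification: empty elements are skipped here, whereas
   Kpathsea gives them a special meaning. */
int
search_path (const char *path, const char *name, char *result, size_t size)
{
  const char *dir = path;

  for (;;)
    {
      const char *colon = strchr (dir, ':');
      size_t dirlen = (colon != NULL) ? (size_t) (colon - dir) : strlen (dir);

      if (dirlen > 0)
        {
          int n = snprintf (result, size, "%.*s/%s", (int) dirlen, dir, name);
          if (n > 0 && (size_t) n < size && access (result, R_OK) == 0)
            return 1;
        }
      if (colon == NULL)
        return 0;
      dir = colon + 1;
    }
}
```

A TFM lookup in this simplified model would be search_path (font_path ("TEXFONTS", default_path), "cmr10.tfm", buf, sizeof buf), where default_path stands for whatever the Makefile configured.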



3.5 Font naming

Naming font files has always been a difficult proposition at best. On the one hand, the names should be as portable as possible, so the fonts themselves can be used on almost any platform. On the other hand, the names should be as descriptive and comprehensive as possible. The best compromise we have been able to work out is described in a separate document: Introduction (Filenames for TeX fonts). See Archives, for where to obtain it.

Filenames for GNU project fonts should start with `g', for the “source” abbreviation of “GNU”.

Aside from a general font naming scheme, when developing fonts you must keep the different versions straight. We do this by appending a “version letter” `a', `b', ... to the main bitmap filename. For example, the original Garamond roman font we scanned was a 30 point size, so the main filename was ggmr30 (`g' for GNU, `gm' for Garamond, `r' for roman). As we ran the font through the various programs, we named the output ggmr30b, ggmr30c, and so on.

Since the outline fonts produced by BZRto are scalable, we do not include the design size in their names. (BZRto removes a trailing number from the input name by default.)
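The version-letter convention and BZRto's default of dropping the design size could be captured by two small helpers like these. Both are hypothetical illustrations of the naming scheme described above, not code from BZRto or the other programs. next_version assumes that a version letter, if present, directly follows the design-size digits.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Give NAME its next version letter: "ggmr30" -> "ggmr30a",
   "ggmr30a" -> "ggmr30b".  Returns 0 on success, or -1 if NAME does not
   end in digits or a version letter, if 'z' has been reached, or if OUT
   is too small. */
int
next_version (const char *name, char *out, size_t size)
{
  size_t len = strlen (name);
  int n;

  if (len == 0)
    return -1;
  if (isdigit ((unsigned char) name[len - 1]))
    n = snprintf (out, size, "%sa", name);          /* first version */
  else if (len >= 2 && isdigit ((unsigned char) name[len - 2])
           && name[len - 1] >= 'a' && name[len - 1] < 'z')
    n = snprintf (out, size, "%.*s%c", (int) (len - 1), name,
                  name[len - 1] + 1);              /* bump the letter */
  else
    return -1;
  return (n > 0 && (size_t) n < size) ? 0 : -1;
}

/* Remove a trailing number (the design size) from NAME, mirroring the
   BZRto default described above: "ggmr30" -> "ggmr". */
void
strip_design_size (const char *name, char *out, size_t size)
{
  size_t len = strlen (name);

  while (len > 0 && isdigit ((unsigned char) name[len - 1]))
    len--;
  snprintf (out, size, "%.*s", (int) len, name);
}
```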



4 Bugs

(This chapter is adapted from the analogous one in the GCC manual, written by Richard Stallman.)

Your bug reports are essential in making these programs reliable.

Reporting a bug may help you by bringing a solution to your problem, or it may not. (If it does not, look in the service directory, which is part of the GNU CC and GNU Emacs distributions.) In any case, the principal function of a bug report is to help the entire community by making the next release work better.

Send bug reports for the GNU font utilities, or for their documentation, to the address bug-gnu-utils@prep.ai.mit.edu. We also welcome suggestions for improvements, no matter how small.

In order for a bug report to serve its purpose, you must include the information needed to fix the bug, as described below.

Thanks (in advance)!



4.1 Bug criteria

If you are not sure whether you have found a bug, here are some guidelines:



4.2 Bug reporting

The purpose of a bug report is to enable someone to fix the bug if it is not known. It isn't important what happens if the bug is already known. Therefore, always write your bug reports on the assumption that the bug is not known.

Sometimes people give a few sketchy facts and ask, “Does this ring a bell?” or “Should this be happening?” This cannot help us fix a bug, so it is basically useless. We can only respond by asking for the details below, so we can investigate. You might as well expedite matters by sending them to begin with.

Try to make your bug report self-contained. If we ask you for more information, it is best if you include all the original information in your response, as well as the new information. We might have discarded the previous message, or even if we haven't, it takes us time to search for it. Similarly, if you've reported bugs before, it is still best to send all the information; we can't possibly remember what environment everyone uses!



4.2.1 Necessary information

To enable us to fix a bug, please include all the information below. If the bug was in compilation or installation, as opposed to in actually running one of the programs, the last two items are irrelevant. But in that case, please also make sure it is not a known problem before reporting it. See Problems.

You should include all of the following in your bug report:

In other words, we need enough information so that we can run the offending program under the debugger, so we can find out what's happening. Without all the command-line arguments, or the input file in question, we cannot do this. Since you must have found the bug by running the program with a particular set of options and on a particular input file, you already have this information; all you need to do is send it!



4.2.2 Unnecessary information

Here are some things that are not necessary to include in a bug report.



4.2.3 Documentation bugs

It is just as important to report bugs in the documentation as in the programs. If you want to do something using these programs, and reading the manual doesn't tell you how, that is probably a bug. In fact, the best way to report it is something like: “I want to do x; I looked in the manual in sections a and b, but they didn't explain it.”

If your bug report makes it clear that you've actually made an attempt to find the answers using the manual, we will be much more likely to take action (since we won't have to search the manual ourselves).



5 File formats

These programs use various data files to specify font encodings, auxiliary information for a font, and other things. Some of these data files are distributed in the directory data; others must be constructed on a font-by-font basis.

If the environment variable FONTUTIL_LIB is set, data files are looked up along the path it specifies, using the same algorithm as is used for font searching (see Font searching). Otherwise, the default path is set in the top-level Makefile.

The following sections (in other chapters of the manual) also describe file formats:



5.1 File format abbreviations

For the sake of brevity, we do not spell out every abbreviation (typically of file format names) in the manual every time we use it. This section collects and defines all the common abbreviations we use.

BPL
The `Bezier property list' format output by BZRto and read by BPLtoBZR. This is a transliteration of the binary BZR format into human-readable (and -editable) text. See BPL files.


BZR
The `Bezier' outline format output by Limn and read by BZRto. We invented this format ourselves. See BZR files.


CCC
The `cookie-cutter character' (er, `composite character construction') files read by BZRto to add pre-accented and other such characters to a font. See CCC files.


CMI
The `character metric information' files read by Charspace to add side bearings to a font. See CMI files.


GF
The `generic font' bitmap format output by Metafont (and by most of these programs). See the sources for Metafont or one of the other TeX font utility programs (GFtoPK, etc.) for the definition.


DVI
The `device independent' format output by TeX, GFtoDVI, etc. Many “DVI driver” programs have been written to translate DVI format to something that can actually be printed or previewed. See the sources for TeX or DVItype for the definition.


EPS
The `Encapsulated PostScript' format output by many programs, including Imageto (see Viewing an image) and Fontconvert (see Fontconvert output options). An EPS file differs from a plain PostScript file in that it contains information about the PostScript image it produces: its bounding box, for example. (This information is contained in comments, since PostScript has no good way to express such information directly.)
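As an illustration of such a comment, here is a minimal Python sketch that reads the `%%BoundingBox:' line from an EPS file's header. It relies only on the Adobe Document Structuring Conventions, which give four integers: the lower-left x and y and the upper-right x and y, in PostScript points. The function name eps_bounding_box is our own and is not part of Imageto or Fontconvert.

```python
def eps_bounding_box(eps_text):
    """Return (llx, lly, urx, ury) from the %%BoundingBox comment, or None.

    Illustrative helper following the Document Structuring Conventions;
    it is not code taken from Imageto or Fontconvert.
    """
    for line in eps_text.splitlines():
        if not line.startswith("%%BoundingBox:"):
            continue
        fields = line[len("%%BoundingBox:"):].split()
        if fields[:1] == ["(atend)"]:
            continue  # the real value is given in a later comment
        return tuple(int(f) for f in fields[:4])
    return None
```

Because the information sits in comments, a PostScript interpreter ignores it when it renders the file. A program that embeds the image can still find it with a plain text scan like this one.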


IFI
The `image font information' files read by Imageto when making a font from an image. See IFI files.


GSF
The `Ghostscript font' format output by BZRto and the bdftops program in the Ghostscript distribution. This is nothing more than the Adobe Type 1 font format, unencrypted. The Adobe Type 1 format is defined in a book published by Adobe. (Many PostScript interpreters cannot read unencrypted Type 1 fonts, despite the fact that the definition says encryption is not required. Ghostscript can read both encrypted and unencrypted Type 1 fonts.)


IMG
The `image' format used by some programs running under GEM (a window system sometimes used under DOS); specifically, by the program which drives our scanner.


MF
The `Meta-Font' programming language for designing typefaces invented by Donald Knuth. His Metafontbook is the only manual written to date (that we know of).


PBM
The `portable bitmap' format used by the PBMplus programs, Ghostscript, Imageto, etc. It was invented by Jef Poskanzer (we believe), the author of PBMplus.


PFA
The `printer font ASCII' format in which Type 1 PostScript fonts are sometimes distributed. This format uses the ASCII hexadecimal characters `0' to `9' and `a' to `f' (and/or `A' to `F') to represent an eexec-encrypted Type 1 font.


PFB
The `printer font binary' format in which Type 1 PostScript fonts are sometimes distributed. This format is most commonly used on DOS systems. (Personally, we find the existence of this format truly despicable, as one of the major advantages of PostScript is its being defined entirely in terms of plain text files (in Level 1 PostScript, anyway). Having an unportable binary font format completely defeats this.)


PK
The `packed font' bitmap format output by GFtoPK. PK format has (for all practical purposes) the same information as GF format, and does a better job of packing: typically a font in PK format will be one-half to two-thirds of the size of the same font in GF format. It was invented by Tom Rokicki as part of the TeX project. See the GFtoPK source for the definition.


PL
The `property list' format output by TFtoPL. This is a transliteration of the binary TFM format into human-readable (and -editable) text. Some of these programs output a PL file and call PLtoTF to make a TFM from it. (For technical reasons it is easier to do this than to output a TFM file directly.) See the PLtoTF source for the details.


TFM
The `TeX font metric' format output by Metafont, PLtoTF, and other programs, and read by TeX. TFM files include only character dimension information (widths, heights, depths, and italic corrections), kerns, ligatures, and font parameters; in particular, there is no information about the character shapes. See the TeX or Metafont source for the definition.



5.2 Common file syntax

Data files read by these programs are text files that share certain syntax elements:

A line can be as long as you want.



5.3 Encoding files

The encoding of a font specifies the mapping from character codes (integers, typically between zero and 255) to the characters themselves. For example, does a character with code 92 wind up printing as a backslash (as it does under the ASCII encoding) or as a double left quote (as it does under the most common TeX font encoding)? Put another way, the encoding is the arrangement of the characters in the font.

It is sad but true that no single encoding has been widely adopted, even for basic text fonts. (Text fonts and, say, math fonts or symbol fonts will clearly have different encodings.) Every typesetting program and/or font source seems to come up with a new encoding; GNU is no exception (see below). Therefore, when you decide on the encoding for the fonts you create, you should choose whatever is most convenient for the typesetting programs you intend to use them with. (Decent typesetting systems would make it trivial to set font encodings; unfortunately, almost nothing is decent in that regard!)

The encoding file format we invented is a font-format-independent representation of an encoding. Encoding files are “data files” which have the basic syntax elements described above (see Common file syntax). They are usually named with the extension .enc.

The first nonblank non-comment line in an encoding file is a string to put into TFM files as the “coding scheme” that describes the encoding; some common coding schemes are `TeX text', `TeX math symbol', and `Adobe standard'. Case is irrelevant; that is, any programs which use the coding scheme should pay no attention to its case.

Thereafter, each nonblank non-comment line defines the character for the corresponding code: the first such line defines the character with code zero, the next with code one, and so on.

Each character consists of a name, optionally followed by ligature information. (All fonts using the same encoding should have the same ligatures, it seems to us.)
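Here is a minimal Python sketch of how a program might read this layout. The first significant line is the coding scheme, and each later line assigns a character name to the next code, starting from zero. The sketch assumes that `%' starts a comment running to the end of the line, which is what the description of character names below implies. It keeps only the name and ignores any ligature information after it. The helper parse_encoding is hypothetical and is not part of the distribution.

```python
def parse_encoding(text):
    """Return (coding_scheme, {code: character_name}) for an encoding file.

    Assumption: '%' starts a comment that runs to the end of the line.
    Ligature information after the name is ignored here.
    """
    lines = [raw.split("%", 1)[0].strip() for raw in text.splitlines()]
    lines = [line for line in lines if line]  # drop blank/comment-only lines
    if not lines:
        raise ValueError("empty encoding file")
    coding_scheme = lines[0]                  # first significant line
    names = {}
    for code, line in enumerate(lines[1:]):   # codes 0, 1, 2, ...
        names[code] = line.split()[0]         # the name is the first word
    return coding_scheme, names
```

Numbering by position means a missing line silently shifts every later character to the wrong code. This is one reason unused positions must be filled with a placeholder name rather than left out.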



5.3.1 Character names

The character name in an encoding file is an arbitrary sequence of nonblank characters (except it can't include a %, since that starts a comment). Conventionally, it consists of only lowercase letters, except where an uppercase letter is actually involved. (For example, eacute is a lowercase e with an acute accent; Eacute is an uppercase E with an acute accent.)

If a character code has no equivalent character in the font, i.e., the font table has a “blank spot”, you should use the name .notdef for that code. This is the only name you can usefully give more than once. If any other name is used more than once, the results are undefined.
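A check for this rule takes only a few lines. The Python sketch below assumes the encoding has already been read into a mapping from character codes to names, which is our own representation for illustration. It reports every name other than .notdef that is assigned to more than one code.

```python
from collections import Counter


def duplicate_names(names):
    """Return sorted names (other than .notdef) used for more than one code.

    `names` is a hypothetical {code: character_name} mapping.
    """
    counts = Counter(names.values())
    return sorted(name for name, n in counts.items()
                  if n > 1 and name != ".notdef")
```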

To avoid unnecessary proliferation of character names, you should use names from existing .enc files where possible. All the .enc files we have created are distributed in the data directory.



5.3.2 Ligature definitions

The ligature information for a character in an encoding file is optional. More than one ligature specification may be given. Each specification looks like:

     lig second-char =: lig-char

This means that a ligature character lig-char should be present in the font for the current character (the one being defined on this line of the encoding file) followed by second-char. You give second-char and lig-char as character codes (see Specifying character codes). For example, in most text encodings (which involve Latin characters), some variation on the following line will be present:

     f       lig f =: 013  lig i =: 014  lig l =: 015

This will produce ligatures in the font such that when a typesetting program sees the two-character sequence `ff' in the input, it replaces those two characters in the output with the single character at position octal 13 (presumably the `ff' ligature) of the font; when it sees `fi', the character at position octal 14 is output; when it sees `fl', the character at position octal 15 is output.
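The following Python sketch parses the `lig' specifications on such a line and applies them to a sequence of character codes. Each matching pair is replaced by the ligature code. It is a simplification built on our own assumptions. Character codes are read as one of the following, which we assume agrees with Specifying character codes:

- a single non-digit character, meaning its ASCII code;
- a `0x' hexadecimal number;
- a leading-zero octal number, as in `013';
- otherwise, a decimal number.

Pairs are replaced in a single left-to-right pass. TeX's own ligature processing is more elaborate; for instance, a ligature can itself take part in a further ligature. All function names are hypothetical.

```python
import re


def parse_charcode(text):
    """Convert a character-code token to an integer (assumed syntax)."""
    if len(text) == 1 and not text.isdigit():
        return ord(text)            # a single character stands for its code
    if text.lower().startswith("0x"):
        return int(text, 16)        # hexadecimal
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)         # leading zero: octal, as in 013
    return int(text, 10)            # otherwise decimal


# One `lig second-char =: lig-char' specification.
LIG_SPEC = re.compile(r"\blig\s+(\S+)\s+=:\s+(\S+)")


def parse_ligatures(line):
    """Return {second_code: ligature_code} for one encoding-file line."""
    return {parse_charcode(second): parse_charcode(lig)
            for second, lig in LIG_SPEC.findall(line)}


def apply_ligatures(codes, table):
    """Replace each pair (first, second) by table[first][second], left to right."""
    out, i = [], 0
    while i < len(codes):
        pairs = table.get(codes[i], {})
        if i + 1 < len(codes) and codes[i + 1] in pairs:
            out.append(pairs[codes[i + 1]])   # emit the ligature character
            i += 2
        else:
            out.append(codes[i])              # no ligature: copy the character
            i += 1
    return out
```

For the `f' line above, parse_ligatures returns a table mapping the codes of `f', `i', and `l' to octal 13, 14, and 15. The input `fifl' then becomes the two ligature codes octal 14 and 15.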

Metafont version 2 allows a more general ligature scheme; if there is a demand for it, it wouldn't be hard to add.



5.3.3 GNU encodings

When we started making fonts for the GNU project, we had to decide on some font encoding. We hoped to use an existing one, but none that we found seemed suitable: the TeX font encodings, including the “Cork encoding” described in TUGboat 11#4, lacked many standard PostScript characters; conversely, the standard PostScript encodings lacked useful TeX characters. Since we knew that Ghostscript and TeX would be the two main applications using the fonts, we thought it unacceptable to favor one at the expense of the other.

Therefore, we invented two new encodings. The first one, “GNU Latin text” (distributed in data/gnulatin.enc), is based on ISO Latin 1, and is close to a superset of both the basic TeX text encoding and the Adobe standard text encoding. We felt it was best to use ISO Latin 1 as the foundation, since some existing systems actually use ISO Latin 1 instead of ASCII. We also left the first eight positions open, so particular fonts could add more ligatures or other unusual characters.

The second, “GNU Latin text complement” (distributed in data/gnulcomp.enc), includes the remaining pre-accented characters from the Cork encoding, the PostScript expert encoding, swash characters, small caps, etc.



5.4 Coding scheme map file

When a program reads a TFM file, it's given an arbitrary string (at best) for the coding scheme. To be useful, it needs to find the corresponding encoding file. We couldn't think of any way to name our .enc files that would allow the filename to be guessed automatically. Therefore, we invented another data file which maps the TFM coding scheme strings to our .enc filenames.

This file is distributed as data/encoding.map. See Common file syntax, for a description of the common syntax elements.

Each nonblank non-comment line in encoding.map has two entries: the first word (contiguous nonblank characters) is the .enc filename; the rest of the line, after ignoring whitespace, is the string in the TFM file. This should be the same string that appears on the first line of the .enc file (see Encoding files).

Programs should ignore case when using the coding scheme string.

Here is the coding scheme map file we distribute:

     adobestd 	Adobe standard
     ascii		ASCII
     dvips		dvips
     dvips		TeX text + adobestandardencoding
     gnulatin	GNU Latin text
     gnulcomp 	GNU Latin text complement
     psymbol 	PostScript Symbol
     texlatin	Extended TeX Latin
     textext		TeX text
     zdingbat	Zapf Dingbats
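Here is a minimal Python sketch of the lookup a program might perform with this file. It rests on two assumptions of ours: `%' introduces a comment, as in the other data files, and the name in the first column gets `.enc' appended to form the filename. Matching ignores case, as required above. The function names are hypothetical.

```python
def parse_encoding_map(text):
    """Map lowercased coding-scheme strings to .enc base names."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()   # assumed comment syntax
        if not line:
            continue
        parts = line.split(None, 1)           # first word, then the rest
        if len(parts) != 2:
            continue                          # no coding-scheme string
        enc_name, scheme = parts
        table.setdefault(scheme.strip().lower(), enc_name)
    return table


def find_encoding(table, coding_scheme):
    """Return the .enc filename for a TFM coding scheme, or None."""
    name = table.get(coding_scheme.strip().lower())
    return None if name is None else name + ".enc"
```

Several strings can map to the same encoding file, as the two dvips entries show. For that reason the table is keyed on the coding-scheme string rather than on the filename.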



6 Imageto

Imageto converts an image file (currently either in portable bitmap format (PBM) or GEM's IMG format) to either a bitmap font or an Encapsulated PostScript file (EPSF). An image file is simply a large bitmap.

If the output is a font, it can be constructed either by outputting a constant number of scanlines from the image as each “character” or (more usually) by extracting the “real” characters from the image.

The current selection of input formats is rather arbitrary. We implemented the IMG format because that is what our scanner outputs, and the PBM format because Ghostscript can output it (see GSrenderfont). Other formats could easily be added.



6.1 Imageto usage

Extracting a usable font from an image file usually takes two steps. First, look at the image, so you can see what you've got. Second, prepare the IFI file describing the contents of the image (the character codes to output, any baseline adjustment, as for, e.g., `j', and how many pieces each character has) and extract the font. Each step is a separate invocation of Imageto: the first time with either the `-strips' or `-epsf' option, the second time with neither.

In the second step, Imageto considers the input image as a series of image rows. Each image row consists of all the scanlines between a nonblank scanline and the next entirely blank scanline. (A scanline is a single horizontal row of pixels in the image.) Within each image row, Imageto looks top-to-bottom, left-to-right, for bounding boxes: closed contours, i.e., areas whose edges you can trace with a pencil without lifting it.

For example, in the following image Imageto would find two image rows, the first spanning scanlines 1 to 7, the second consisting of only scanline 10. There are six bounding boxes in the first image row and only one in the second. (This example also shows some typical problems in scanned images: the baseline of the `m' is not aligned with those of the `i', `j', and `l'; a meaningless black line is present; and the `i' and `j' overlap.)

       01234567890123456789
      0
      1       x
      2 x x   x
      3       x
      4 x x   x   xxxxx
      5 x x   x   x x x
      6   x       x x x
      7 xx
      8
      9
     10 xxxxxxxxxxxxxxx
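The row and bounding-box search can be sketched in Python on the example above, written as ASCII art where `x' is a black pixel. This is not Imageto's code. We assume that pixels touching diagonally belong to the same bounding box (8-connectivity). With that rule the tail of the `j' joins its stem, which yields the six boxes described above. The function names are hypothetical.

```python
# The example image above; `x' marks a black pixel, anything else is white.
IMAGE = [
    "",                  # scanline 0
    "      x",           # 1
    "x x   x",           # 2
    "      x",           # 3
    "x x   x   xxxxx",   # 4
    "x x   x   x x x",   # 5
    "  x       x x x",   # 6
    "xx",                # 7
    "",                  # 8
    "",                  # 9
    "x" * 15,            # 10
]


def image_rows(image):
    """Return (first, last) scanline pairs for each run of nonblank scanlines."""
    rows, start = [], None
    for y, line in enumerate(image):
        if "x" in line and start is None:
            start = y                       # a nonblank scanline opens a row
        elif "x" not in line and start is not None:
            rows.append((start, y - 1))     # a blank scanline closes it
            start = None
    if start is not None:
        rows.append((start, len(image) - 1))
    return rows


def bounding_boxes(image, first, last):
    """Return (top, left, bottom, right) of each 8-connected blob in rows first..last."""
    black = {(y, x) for y in range(first, last + 1)
             for x, ch in enumerate(image[y]) if ch == "x"}
    seen, boxes = set(), []
    for y in range(first, last + 1):            # top-to-bottom
        for x in range(len(image[y])):          # left-to-right
            if (y, x) not in black or (y, x) in seen:
                continue
            seen.add((y, x))
            stack = [(y, x)]
            top, left, bottom, right = y, x, y, x
            while stack:                        # flood-fill one blob
                cy, cx = stack.pop()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        n = (cy + dy, cx + dx)
                        if n in black and n not in seen:
                            seen.add(n)
                            stack.append(n)
            boxes.append((top, left, bottom, right))
    return boxes
```

On this image, image_rows returns the two rows (1, 7) and (10, 10). For the first row, bounding_boxes finds six boxes: the `l', the dots of the `i' and `j', the stem of the `i', the `j' together with its tail, and the `m'.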



6.1.1 Viewing an image

Typically, the first step in extracting a font from an image is to see exactly what is in the image. (Clearly, this is unnecessary if you already know what your image file contains.)

The simplest way to get a look at the image file, if you have Ghostscript or some other suitable PostScript interpreter, is to convert the image file into an EPSF file with the `-epsf' option. Here is a possible invocation:

     imageto -epsf ggmr.img

Here we read an input file ggmr.img; the output is ggmr.eps. You can then view the EPS file with

     gs ggmr.eps

(presuming that gs invokes your PostScript interpreter).

If you don't have both a suitable PostScript interpreter and enough disk space to store the EPS file (it uses approximately twice as much disk space as the original image), the above won't work. Instead, to view the image you must make a font with the `-strips' option:

     imageto -strips ggmr.img

The output of this will be ggmrsp.1200gf (our image having a resolution of 1200 dpi). Although the GF font cannot be conveniently viewed directly, you can use TeX and your favorite DVI processor to look at it, as follows:

     fontconvert -tfm ggmrsp.1200
     echo ggmrsp | tex strips

This produces strips.dvi, which you can view with your favorite DVI driver. (See Archives, for how to obtain the DVI drivers for PostScript and X we recommend.)

strips.tex is distributed in the imageto directory.



6.1.2 Image to font conversion

Once you can see what is in the image, the next step is to prepare the IFI file (see IFI files) corresponding to its characters. Imageto relies completely on the IFI files to describe the image; it makes no attempt at optical character recognition, i.e., guessing what the characters are from their shapes.

You must also decide on a few more aspects of the output font, which you specify with options:

The final invocation to produce the font might look something like this:

     imageto -baselines=121,130,120 -designsize=26 ggmr

The output from this would be ggmr26.1200gf.



6.1.3 Dirty images

Your image may not be completely “clean”, i.e., the scanning process may have introduced artifacts: black lines at the edge of the paper; blotches where the original had a speck of dirt or ink; broken lines where the image had a continuous line. To get a correct output font, you must correct these problems.

To remove blotches, you can simply put .notdef in the appropriate place in the IFI file. You can find the “appropriate place” when you look at the output font; some character will be nothing but a (possibly tiny) speck, and all the characters following will be in the wrong position.

The `-print-clean-info' option might also help you to diagnose which bounding boxes are being assigned to which characters, when you are in doubt. Here is an example of its output:

     [Cleaning 149x383 bitmap:
       checking (0
       checking (0
       checking (0
       checking (113
     106]

The final `106' is the character code output (ASCII `j'). The overall bitmap containing the `j' is 149 pixels wide and 383 pixels high. It contained four bounding boxes: the last two belonged to the `j' and were kept; the first two came from the adjacent character (`i') and were erased. (As shown in the example image above, the tail of the `j' often overlaps the `i' in type specimens.)

If the image has blobs you have not removed with .notdef, you will see a small bounding box in this output. The numbers shown are