This manual documents how to install and run the GNU font utilities. It corresponds to version REPLACE-WITH-VERSION (released in REPLACE-WITH-MONTH-YEAR).
The introduction briefly describes the purpose and philosophy of the font utilities. The overview gives details on their general usage, especially how they interact, and describes various things which are common to all or most of the programs.
The first part of this master menu lists the major nodes in this Info document, including the index. The rest of the menu lists all the lower level nodes in the document.
--- The Detailed Node Listing ---
Installation
Prerequisites
Overview
Creating fonts
Command-line options
Specifying character codes
Bugs
Bug reporting
File formats
Encoding files
Imageto
Imageto usage
IMGrotate
IMGrotate usage
Fontconvert
Invoking Fontconvert
Charspace
CMI files
fontdimen command
Limn
Limn algorithm
Fitting the bitmap curve
BZRto
Metafont and BZRto
CCC files
BZR files
BZR characters
BPLtoBZR
BPL files
BPL characters
XBfe
XBfe usage
XBfe shape editing
BZRedit
BZRedit usage
Editing BPL files
GSrenderfont
GSrenderfont usage
Enhancements
This manual corresponds to version REPLACE-WITH-VERSION of the GNU font utilities.
You can manipulate fonts in various ways using the utilities: conversion of a scanned image to a bitmap font, hand-editing of bitmaps, conversion of a bitmap font to an outline font, and more. More generally, you can start with a scanned image of artwork and work your way through to a finished font with side bearings, accented characters, ligatures, and so on.
The font formats recognized by these programs are primarily those used by the (freely available) TeX typesetting system developed by Donald E. Knuth from 1977–1990. The filenames, font searching, and other aspects of their usage are also based on TeX. They also support output of PostScript Type 1 fonts.
Some of this software was originally written as part of the research program in digital typography at the University of Massachusetts at Boston, directed by Robert A. Morris. The staff at UMB, Rick Martin in particular, has been kind enough to let us continue to use their computers, despite our completing the Master's program there in 1989.
See Prereqs, for what you need to have installed before you can compile these programs.
After that, here's what to do:
Run sh configure in the top-level directory. This tries to figure out system dependencies and the installation prefix.
See The configure script (Kpathsearch library), for options and other information about configure.
Run make install.
If you encounter problems anywhere along the line, let us know. Known problems are listed below (see Problems). See Bugs, for details on how to submit a useful bug report.
To compile and use these programs, the following are necessary:
PLtoTF and
GFtoPK.
gawk. This is only needed if you want to
use GSrenderfont.
See the section below for information on how to get all these programs.
The canonical source for all GNU software, including the GNU C compiler, GNU make, and Ghostscript, is prep.ai.mit.edu:pub/gnu. That directory is replicated at many other sites around the world, including:
wuarchive.wustl.edu, ftp.cs.widener.edu,
uxc.cso.uiuc.edu
gatekeeper.dec.com:/pub/GNU
src.doc.ic.ac.uk:/gnu, ftp.informatik.tu-muenchen.de,
ftp.informatik.rwth-aachen.de:/pub/gnu
ugle.unit.no
ftp.denet.dk
archive.eu.net
archie.oz.au:/gnu (`archie.oz' or `archie.oz.au' for ACSnet)
ftp.cs.titech.ac.jp, utsun.s.u-tokyo.ac.jp:/ftpsync/prep,
cair.kaist.ac.kr:/pub/gnu
You can also order tapes with GNU software from the Free Software Foundation (thereby supporting the development of the font utilities and the rest of the GNU project); send mail to `gnu@prep.ai.mit.edu' for the latest prices and ordering information, or retrieve the file DISTRIB from a GNU archive.
The canonical source for the X window system is export.lcs.mit.edu:pub/R5. That directory is also shadowed at many other sites, including `gatekeeper.dec.com'. The FSF also sells X distribution tapes.
TeX is more scattered. A complete Unix TeX distribution is available by ordering a tape from the University of Washington (send email to `elisabet@u.washington.edu'). Three archives with complete (and identical) TeX collections are:
ftp.uni-stuttgart.de:/soft/tex
ftp.tex.ac.uk:/pub/archive
pip.shsu.edu:/tex-archive
The canonical sources for just Web2C—the port of just TeX, Metafont, and friends to Unix, without DVI processors, fonts, macro packages, etc.—are:
ftp.cs.umb.edu:pub/tex/ (Boston)
ics.uci.edu:TeX/ (California)
ftp.th-darmstadt.de:pub/tex/src/web2c/ (Germany)
At all these sites, the files to retrieve are web.tar.Z and web2c.tar.Z.
The DVI-to-PostScript driver we recommend is Tom Rokicki's Dvips, and the X window system driver we recommend is Paul Vojta's XDvi. These programs are available from, respectively,
labrea.stanford.edu:pub/dvips*
export.lcs.mit.edu:contrib/xdvi.tar.Z
We have modified XDvi and Dvips to use the same path searching code as the current distribution of TeX and these font utilities; the modified versions are available from ftp.cs.umb.edu:pub/tex.
To use Metafont, you must have a file defining output devices. (See Metafont and BZRto.) We recommend you obtain modes.mf from
ftp.cs.umb.edu:pub/tex/modes.mf
You can retrieve the document describing all the details of the naming scheme for TeX fonts from
ftp.cs.umb.edu:pub/tex/fontname.tar.Z
This section lists some things which have caused trouble during installation. If you encounter other problems, please send a bug report. See Bugs, for how to submit a useful bug report.
__Xsi...), and furthermore that the multibyte functions need
to specifically call the dynamic linking functions.)
The file lib/dlsym.c (from the MIT X distribution) defines the
dlsym, dlclose, and dlopen symbols, so static
linking should work now.
If the current setup fails, it might work to change `-lXaw' in
the definition of X_libraries in lib/defs.make to
the full pathname of the Xaw library.
fmod: the routine takes two doubles, not one.
We simply corrected our system include file.
You may get compiler warnings for the file widgets/Bitmap.c at
the lines which use the Xt function XtIsRealized on systems which
define NULL as (void *) 0. The reason is that the macro
definition of XtIsRealized in <X11/IntrinsicP.h>
incorrectly compares the result of XtWindowOfObject to
NULL, instead of 0. If the warnings bother you, fix
IntrinsicP.h.
XAPPLRESDIR environment variable to that
directory. See the tutorial on resources that comes with the MIT X
distribution (mit/doc/tutorial/resources.txt) for more
information.
Good luck.
This chapter gives an overview of what you do to create fonts using these programs. It also describes some things which are common to all the programs.
Throughout this document, we refer to various source files in the implementation. If you can read C programs, you may find these references useful as points of entry into the source when you are confused about some program's behavior, or are just curious.
Following is a pictorial representation of the typical order in which these programs are used, as well as their input and output.
GSrenderfont is not in the picture since it is intended for an entirely separate purpose (namely, making bitmaps from PostScript outlines). Fontconvert also has many functions which are not needed for the basic task of font creation from scanned images.
Main pipeline:
  scanned image and IFI => imageto => GF => charspace (with CMI input) => TFM, GF => limn => BZR => bzrto (with CCC input)
Side connections:
  imgrotate <=> scanned image
  fontconvert <=> TFM, GF
  xbfe <=> TFM, GF
Outputs of bzrto:
  Metafont source => mf => GF
  TFM
  PostScript Type 3 (pf3)
  BPL => bpltobzr => BZR
See File formats, for more information on these file formats.
The previous section described pictorially the usual order in which these programs are used. This section will do the same in words.
Naturally, you may not need to go through all the steps described here. For example, if you are not starting with a scanned image, but already have a bitmap font, then the first step—running Imageto—is irrelevant.
Here is a description of the usual font creation process, starting with a scanned image of a type specimen and ending with fonts which can be used by Ghostscript, TeX, etc.
An alternative to the above steps is to run Imageto with the `-epsf' option. This outputs an Encapsulated PostScript file with the image given as a simple PostScript bitmap. Then you can use Ghostscript or some other PostScript interpreter to look at the EPS file. This method is simpler, but has the disadvantage of using much more disk space, and needing a PostScript interpreter.
\table
or \sample commands to produce a font table. Next, print or
preview the DVI file that TeX outputs, as before. This will probably
reveal problems in your IFI file, e.g., that not all the characters are
present, or that they are not in the right positions. So you need to
iterate until the image is correctly processed.
testfont.tex should have come with your TeX distribution. If for some reason you do not have it, you can use the one distributed in the data directory.
At any rate, given a bitmap font f you then run Charspace (see Charspace) to add side bearings to f, producing a new bitmap font, say g, and a corresponding TFM file g.tfm. To do this, you must prepare a CMI file specifying the side bearings. See CMI files, for a description of CMI files.
Although Limn will (should) always be able to fit some sort of outline to the bitmaps, you can get the best results only by fiddling with the (unfortunately numerous) parameters. See Invoking Limn.
As you get closer to a finished font, you may want to prepare a CCC file (see CCC files) to tell BZRto how to construct composite characters (pre-accented `A's, for example) to complete the font.
Briefly, to do the former, run Metafont with a mode of whatever
device you wish (the mode localfont will get you the most
common local device, if Metafont has been installed properly). Then you
can use testfont.tex to get a font sample, as described above.
To do the latter, run Metafont with no assignment to mode. This
should get you proof mode. You can then use GFtoDVI to get a DVI
file with one character per page, showing you the control points Limn
chose for the outlines.
Inevitably, as one problem gets fixed you notice new ones ...
This section gives a real-life example of font creation for the Garamond roman typeface, which we worked on concomitantly with developing the programs. We started from a scanned type specimen of 30 point Monotype Garamond, scanned using a Xerox 9700 scanner lent to us by Interleaf, Inc. (Thanks to Paul English and others at Interleaf for this loan.)
To begin, we used Imageto as follows to look at the image file we had scanned (see Viewing an image). Each line is a separate command.
imageto -strips ggmr.img
fontconvert -tfm ggmrsp.1200
echo ggmrsp | tex strips.tex
xdvi -p 1200 -s 10 strips.dvi
ASCII encoding.
imageto -print-guidelines -print-clean-info -encoding=gnulatin ggmr.img
.notdef lines to ggmr.ifi as appropriate.
imageto -verbose -baselines=121,130,120 \
-designsize=30 -encoding=gnulatin ggmr.img
fontconvert -verbose -gf -tfm -filter-passes=3 -filter-size=3 \
ggmr30.1200 -output=ggmr30a
charspace -verbose -cmi=ggmr.1200cmi ggmr30a.1200 -output=ggmr30b
limn -verbose -corner-surround=4 -filter-surround=6 \
-filter-alternative-surround=3 -subdivide-surround=6 \
-tangent-surround=6 ggmr30b.1200
bzrto -verbose -metafont ggmr30b -output=ggmr30B
mf '\mode:=localfont; input ggmr30B'
echo ggmr30B | tex sample
dvips sample
fontconvert -verbose -gf -tfm -designsize=26 ggmr30b.1200 -output=ggmr26c
proof mode, followed by GFtoDVI, so we could
see how well Limn did at choosing the control points for the outlines.
See Proofing with Metafont. (The nodisplays tells Metafont
not to bother displaying each character in a window online.)
mf '\mode:=proof; nodisplays; input ggmr26D'
gftodvi ggmr26D.3656gf
Since these programs do not have counterparts on historical Unix
systems, they need not conform to an existing interface. We chose to
have all the programs use the GNU function getopt_long_only to
parse command lines.
As a result, you can give the options in any order, interspersed as you wish with non-option arguments; you can use `-' or `--' to start an option; you can use any unambiguous abbreviation for an option name; you can separate option names and values with either `=' or one or more spaces; and you can use filenames that would otherwise look like options by putting them after an option `--'.
By convention, all the programs accept only one non-option argument, which is taken to be the name of the main input file.
If a particular option with a value is given more than once, it is the last value which is used.
For example, the following command line specifies the options `foo', `bar', and `verbose'; gives the value `abc' to the `baz' option, and the value `xyz' to the `quux' option; and specifies the filename -myfile-.
-foo --bar -verb -baz=abc -quux karl -quux xyz -- -myfile-
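The conventions above can be illustrated with a small sketch. This is not the GNU getopt_long_only implementation; it is a minimal Python model of the rules as described, and the function name parse_args and its parameters are hypothetical. It handles `-' or `--' prefixes, unambiguous abbreviations, values given with `=' or as the next argument, last-value-wins, and a bare `--' that ends option processing.

```python
def parse_args(argv, flags, valued):
    """Model of the option conventions described in the text.

    flags:  names of options that take no value (hypothetical parameter)
    valued: names of options that take a value (hypothetical parameter)
    Returns (options, filenames).
    """
    names = sorted(set(flags) | set(valued))
    options, filenames = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        i += 1
        if arg == "--":
            # Everything after a bare `--' is a filename, even if it
            # looks like an option.
            filenames.extend(argv[i:])
            break
        if arg == "-" or not arg.startswith("-"):
            filenames.append(arg)
            continue
        # Options may start with `-' or `--'.
        body = arg[2:] if arg.startswith("--") else arg[1:]
        name, has_value, value = body.partition("=")
        # Prefer an exact match, else accept an unambiguous abbreviation.
        matches = [n for n in names if n == name] or \
                  [n for n in names if n.startswith(name)]
        if len(matches) != 1:
            raise ValueError("unknown or ambiguous option: " + arg)
        full = matches[0]
        if full in valued:
            if not has_value:
                if i >= len(argv):
                    raise ValueError("missing value for option: " + arg)
                value = argv[i]
                i += 1
            options[full] = value  # a later value replaces an earlier one
        else:
            if has_value:
                raise ValueError("option takes no value: " + arg)
            options[full] = True
    return options, filenames


example = "-foo --bar -verb -baz=abc -quux karl -quux xyz -- -myfile-".split()
print(parse_args(example, flags={"foo", "bar", "verbose"},
                 valued={"baz", "quux"}))
```

With the option names used in the example above, the sketch reports `foo', `bar', and `verbose' as set, `baz' as `abc', `quux' as `xyz', and -myfile- as the only filename.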
By convention, all the programs accept only one non-option argument, which they take to be the name of the main input file.
Usually this is the name of a bitmap font. By their nature, bitmap fonts are for a particular resolution. You can specify the resolution in two ways: with the `-dpi' option (see the next section), or by giving an extension to the font name on the command line.
For example, you could specify the font foo at a resolution of
300dpi to the program program in either of these two ways
(`$ ' being the shell prompt):
$ program foo.300
$ program -dpi=300 foo
You can also say, e.g., `program foo.300gf', but the `gf' is ignored. These programs always look for a given font in PK format before looking for it in GF format, under the assumption that if both fonts exist, and have the same stem, they are the same.
See File lookups (Kpathsearch library), for more details of the filename lookup.
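As an illustration of the two ways of giving the resolution, the following Python sketch splits a font name such as foo.300 or foo.300gf into a stem and a resolution. The function name split_font_name is hypothetical, and the choice to let an explicit extension take precedence over a `-dpi' value is an assumption of this sketch, not something the text specifies.

```python
import re


def split_font_name(name, dpi_option=None):
    """Return (stem, dpi) for a font name given on the command line.

    dpi_option models the value of a `-dpi' option, if any.
    Assumption: an extension on the name wins over dpi_option.
    """
    # Digits after the last dot give the resolution; trailing letters
    # (such as `gf') are ignored.
    match = re.fullmatch(r"(.+)\.(\d+)[A-Za-z]*", name)
    if match:
        return match.group(1), int(match.group(2))
    return name, dpi_option


print(split_font_name("foo.300"))
print(split_font_name("foo", dpi_option=300))
print(split_font_name("foo.300gf"))
```

All three calls describe the same font: stem foo at 300dpi.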
Certain options are available in all or most of the programs. Rather than writing identical descriptions in the chapters for each of the programs, they are described here.
This first table lists common options which do not convey anything about the input. They merely direct the program to print additional output.
This second table lists common options which change the program's behavior in more substantive ways.
Most of the programs allow you to specify character codes for various purposes. Character codes are always parsed in the same way (using the routines in lib/charcode.c and lib/charspec.c).
You can specify the character code directly, as a numeric value, or indirectly, as a character name to be looked up in an encoding vector.
If a string being parsed as a character code is more than one character long, or starts with a non-digit, it is always looked up as a name in an encoding vector before being considered as a numeric code. We do this because you can always specify a particular value in one of the numeric formats, if that's what you want.
The encoding vector used varies with the program; you can always define an explicit encoding vector with the `-encoding' option. If you don't specify one explicitly, programs which must have an encoding vector use a default; programs which can proceed without one do not. See Encoding files, for more details on encoding vectors.
As a practical matter, the only character names which have length one are the 52 letters, `A'–`Z', `a'–`z'. In virtually all common cases, the encoding vector and the underlying character set both have these in their ASCII positions. (The exception is machines that use the EBCDIC encoding.)
The following variations for numeric character codes are allowed. The examples all assume the character set is ASCII.
Character codes must be between zero and 255 (decimal), inclusive.
The programs have a few common conventions for how to specify option values that are more complicated than simple numbers or strings.
Some options take not a single value, but a list. In this case, the individual values are separated by commas or whitespace, as in `-omit=1,2,3' or `-omit="1 2 3"'. Although using whitespace to separate the values is less convenient when typing them interactively, it is useful when you have a list that is so long you want to put it in a file. Then you can use cat in conjunction with shell quoting to get the value: `-omit="`cat file`"'.
Other options take a list of values, but each value is a keyword and a corresponding quantity, as in `-fontdimens name:real,name:real'.
Finally, a few options take percentages, which you specify as an integer between 0 and 100, inclusive.
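The three value conventions can be sketched in Python as follows. The function names are hypothetical and the code is only a model of the rules stated above, not the parsing code in the programs themselves.

```python
import re


def parse_list(value):
    """Split a list value on commas and/or whitespace."""
    return [item for item in re.split(r"[,\s]+", value.strip()) if item]


def parse_keyword_list(value):
    """Parse `keyword:quantity' pairs, as in `name:real,name:real'."""
    result = {}
    for item in parse_list(value):
        keyword, colon, quantity = item.partition(":")
        if not keyword or not colon:
            raise ValueError("expected keyword:quantity, got " + item)
        result[keyword] = float(quantity)
    return result


def parse_percent(value):
    """Parse a percentage: an integer between 0 and 100, inclusive."""
    percent = int(value)
    if not 0 <= percent <= 100:
        raise ValueError("percentage out of range: " + value)
    return percent


print(parse_list("1,2,3"))
print(parse_list("1 2 3"))
print(parse_keyword_list("a:1.5,b:2"))
print(parse_percent("50"))
```

Because parse_list treats any run of whitespace as a separator, a value read from a file with one item per line works the same way as a comma-separated value typed on the command line.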
These programs use the same environment variables and algorithms for finding font files as do (the Unix port of) TeX and its friends.
You specify the default paths in the top-level Makefile. The
environment variables TEXFONTS, PKFONTS, TEXPKS,
and GFFONTS override those paths. Both the default paths and the
environment variable values should consist of a colon-separated list of
directories.
Specifically, a TFM file is looked for along the path specified by
TEXFONTS; a GF file along GFFONTS, then TEXFONTS; a
PK file along PKFONTS, then TEXPKS, then TEXFONTS.
See Path specifications (Kpathsea library), for details of interpretation of environment variable values.
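The lookup order above can be modeled with a short Python sketch. It is a simplification under stated assumptions: it reads "then" as "try the next variable's directories if the file was not found", it ignores the compiled-in default paths, and it does not implement any of the special path syntax handled by the path searching library. The names SEARCH_VARS and find_font_file are hypothetical.

```python
import os

# Environment variables consulted for each kind of file, in order.
SEARCH_VARS = {
    "tfm": ["TEXFONTS"],
    "gf": ["GFFONTS", "TEXFONTS"],
    "pk": ["PKFONTS", "TEXPKS", "TEXFONTS"],
}


def find_font_file(filename, kind, env=None):
    """Return the first existing path for filename, or None.

    env is a mapping of environment variables (defaults to os.environ);
    passing it explicitly makes the sketch easy to try out.
    """
    env = os.environ if env is None else env
    for variable in SEARCH_VARS[kind]:
        # Each value is a colon-separated list of directories.
        for directory in env.get(variable, "").split(":"):
            if not directory:
                continue
            candidate = os.path.join(directory, filename)
            if os.path.isfile(candidate):
                return candidate
    return None


print(find_font_file("ggmr30b.tfm", "tfm", env={"TEXFONTS": "/nonexistent"}))
```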
Naming font files has always been a difficult proposition at best. On the one hand, the names should be as portable as possible, so the fonts themselves can be used on almost any platform. On the other hand, the names should be as descriptive and comprehensive as possible. The best compromise we have been able to work out is described in a separate document: Introduction (Filenames for TeX fonts). See Archives, for where to obtain it.
Filenames for GNU project fonts should start with `g', for the “source” abbreviation of “GNU”.
Aside from a general font naming scheme, when developing fonts you must keep the different versions straight. We do this by appending a “version letter” `a', `b', ... to the main bitmap filename. For example, the original Garamond roman font we scanned was a 30 point size, so the main filename was ggmr30 (`g' for GNU, `gm' for Garamond, `r' for roman). As we ran the font through the various programs, we named the output ggmr30b, ggmr30c, and so on.
Since the outline fonts produced by BZRto are scalable, we do not include the design size in their names. (BZRto removes a trailing number from the input name by default.)
(This chapter is adapted from the analogous one in the GCC manual, written by Richard Stallman.)
Your bug reports are essential in making these programs reliable.
Reporting a bug may help you by bringing a solution to your problem, or it may not. (If it does not, look in the service directory, which is part of the GNU CC and GNU Emacs distributions.) In any case, the principal function of a bug report is to help the entire community by making the next release work better.
Send bug reports for the GNU font utilities, or for their documentation,
to the address bug-gnu-utils@prep.ai.mit.edu. We also welcome
suggestions for improvements, no matter how small.
In order for a bug report to serve its purpose, you must include the information that makes it possible to fix the bug, as described below.
Thanks (in advance)!
If you are not sure whether you have found a bug, here are some guidelines:
bzrto -mf, that is a bug. You can run the TeX utility
programs GFtype and TFtoPL to check the validity of a GF or TFM file.
The purpose of a bug report is to enable someone to fix the bug if it is not known. It isn't important what happens if the bug is already known. Therefore, always write your bug reports on the assumption that the bug is not known.
Sometimes people give a few sketchy facts and ask, “Does this ring a bell?” or “Should this be happening?” This cannot help us fix a bug, so it is basically useless. We can only respond by asking for the details below, so we can investigate. You might as well expedite matters by sending them to begin with.
Try to make your bug report self-contained. If we ask you for more information, it is best if you include all the original information in your response, as well as the new information. We might have discarded the previous message, or even if we haven't, it takes us time to search for it. Similarly, if you've reported bugs before, it is still best to send all the information; we can't possibly remember what environment everyone uses!
To enable us to fix a bug, please include all the information below. If the bug was in compilation or installation, as opposed to in actually running one of the programs, the last two items are irrelevant. But in that case, please also make sure it is not a known problem before reporting it. See Problems.
You should include all of the following in your bug report:
Bugs typically apply to a single character in a font; you can find out what character is being processed with the `-verbose' option. It should then be straightforward to cut that single character out of the font with the `-range' option, the `fontconvert' program, or both, to make a new (very small) font. It is easier for us to deal with small files.
But if you don't want to take the time to break up the font, please send in the bug report anyway (with the entire font). We much prefer that to you not reporting the bug at all!
In other words, we need enough information so that we can run the offending program under the debugger, so we can find out what's happening. Without all the command-line arguments, or the input file in question, we cannot do this. Since you must have found the bug by running the program with a particular set of options and on a particular input file, you already have this information; all you need to do is send it!
Here are some things that are not necessary to include in a bug report.
Often people who encounter a bug spend a lot of time investigating which changes to the input file or command-line options will make the bug go away and which changes will not affect it.
This is often time consuming and not very useful, because the way we will find the bug is by running a single example under the debugger with breakpoints, not by pure deduction from a series of examples. You might as well save your time for something else.
A patch for the bug is useful if it is a good one. But don't omit the necessary information, such as the test case, on the assumption that a patch is all we need. We might see problems with your patch and decide to fix the problem another way, or we might not understand the patch at all. Without an example, we won't be able to verify that the bug is fixed.
Also, if we can't understand what bug you are trying to fix, or why your patch should be an improvement, we won't install it. A test case will help us to understand.
See Sending Patches for GNU CC (GCC Manual), for more details on the best way to write changes.
Such guesses are not useful, and often wrong. It is impossible to guess correctly without using the debugger to find the facts, so you might as well save your imagination for other things!
It is just as important to report bugs in the documentation as in the programs. If you want to do something using these programs, and reading the manual doesn't tell you how, that is probably a bug. In fact, the best way to report it is something like: “I want to do x; I looked in the manual in sections a and b, but they didn't explain it.”
If your bug report makes it clear that you've actually made an attempt to find the answers using the manual, we will be much more likely to take action (since we won't have to search the manual ourselves).
These programs use various data files to specify font encodings, auxiliary information for a font, and other things. Some of these data files are distributed in the directory data; others must be constructed on a font-by-font basis.
If the environment variable FONTUTIL_LIB is set, data files are
looked up along the path it specifies, using the same algorithm as is
used for font searching (see Font searching). Otherwise, the
default path is set in the top-level Makefile.
The following sections (in other chapters of the manual) also describe file formats:
For the sake of brevity, we do not spell out every abbreviation (typically of file format names) in the manual every time we use it. This section collects and defines all the common abbreviations we use.
eexec-encrypted Type 1
font.
Data files read by these programs are text files that share certain syntax elements:
isspace) are ignored at the beginning of
a line.
A line can be as long as you want.
The encoding of a font specifies the mapping from character codes (an integer, typically between zero and 255) to the characters themselves; e.g., does a character with code 92 wind up printing as a backslash (as it does under the ASCII encoding) or as a double left quote (as it does under the most common TeX font encoding)? Put another way, the encoding is the arrangement of the characters in the font.
It is sad but true that no single encoding has been widely adopted, even for basic text fonts. (Text fonts and, say, math fonts or symbol fonts will clearly have different encodings.) Every typesetting program and/or font source seems to come up with a new encoding; GNU is no exception (see below). Therefore, when you decide on the encoding for the fonts you create, you should choose whatever is most convenient for the typesetting programs you intend to run it with. (Decent typesetting systems would make it trivial to set font encodings; unfortunately, almost nothing is decent in that regard!)
The encoding file format we invented is a font-format-independent
representation of an encoding. Encoding files are “data files” which
have the basic syntax elements described above (see Common file syntax). They are usually named with the extension .enc.
The first nonblank non-comment line in an encoding file is a string to put into TFM files as the “coding scheme” to describe the encoding; some common coding schemes are `TeX text', `TeX math symbol', `Adobe standard'. Case is irrelevant; that is, any programs which use the coding scheme should pay no attention to its case.
Thereafter, each nonblank non-comment line defines the character for the corresponding code: the first such line defines the character with code zero, the next with code one, and so on.
Each character consists of a name, optionally followed by ligature information. (All fonts using the same encoding should have the same ligatures, it seems to us.)
The character name in an encoding file is an arbitrary sequence of
nonblank characters (except it can't include a %, since that
starts a comment). Conventionally, it consists of only lowercase
letters, except where an uppercase letter is actually involved. (For
example, eacute is a lowercase e with an acute accent;
Eacute is an uppercase E with an acute accent.)
If a character code has no equivalent character in the font, i.e., the
font table has a “blank spot”, you should use the name .notdef
for that code. This is the only name you can usefully give more than
once. If any other name is used more than once, the results are
undefined.
To avoid unnecessary proliferation of character names, you should use names from existing .enc files where possible. All the .enc files we have created are distributed in the data directory.
The ligature information for a character in an encoding file is optional. More than one ligature specification may be given. Each specification looks like:
lig second-char =: lig-char
This means that a ligature character lig-char should be present in the font for the current character (the one being defined on this line of the encoding file) followed by second-char. You give second-char and lig-char as character codes (see Specifying character codes). For example, in most text encodings (which involve Latin characters), some variation on the following line will be present:
f lig f =: 013 lig i =: 014 lig l =: 015
This will produce a ligature in the font such that when a typesetting program sees the two character sequence `ff' in the input, it replaces those two characters in the output with the single character at position octal 13 (presumably the `ff' ligature) of the font; when it sees `fi', the character at position octal 14 is output; when it sees `fl', the character at position octal 15 is output.
Metafont version 2 allows a more general ligature scheme; if there is a demand for it, it wouldn't be hard to add.
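To make the encoding file layout concrete, here is a minimal Python sketch of a reader for it. It is not the parser used by the programs; parse_encoding is a hypothetical name, and the sketch assumes only what is described above: `%' starts a comment, blank lines are skipped, the first remaining line is the coding scheme, and each later line gives a character name followed by zero or more `lig second-char =: lig-char' specifications. Character codes in ligature specifications are kept as strings rather than interpreted.

```python
def parse_encoding(text):
    """Return (coding_scheme, names, ligatures) for an encoding file.

    names[code] is the character name for that code; ligatures maps a
    code to a list of (second_char, lig_char) pairs, kept as strings.
    """
    lines = []
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blanks
        if line:
            lines.append(line)
    if not lines:
        raise ValueError("missing coding scheme line")
    scheme, names, ligatures = lines[0], [], {}
    # The first definition line is code zero, the next code one, ...
    for code, line in enumerate(lines[1:]):
        words = line.split()
        names.append(words[0])
        rest = words[1:]
        if len(rest) % 4 != 0:
            raise ValueError("malformed ligature specification: " + line)
        for j in range(0, len(rest), 4):
            keyword, second, arrow, lig_char = rest[j:j + 4]
            if keyword != "lig" or arrow != "=:":
                raise ValueError("malformed ligature specification: " + line)
            ligatures.setdefault(code, []).append((second, lig_char))
    return scheme, names, ligatures


sample = """\
% a tiny made-up encoding, for illustration only
TeX text
.notdef
f lig f =: 013 lig i =: 014 lig l =: 015
"""
print(parse_encoding(sample))
```

The sample text is invented for the demonstration; a real encoding file defines a name (or .notdef) for every code.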
When we started making fonts for the GNU project, we had to decide on some font encoding. We hoped to use an existing one, but none that we found seemed suitable: the TeX font encodings, including the “Cork encoding” described in TUGboat 11#4, lacked many standard PostScript characters; conversely, the standard PostScript encodings lacked useful TeX characters. Since we knew that Ghostscript and TeX would be the two main applications using the fonts, we thought it unacceptable to favor one at the expense of the other.
Therefore, we invented two new encodings. The first one, “GNU Latin text” (distributed in data/gnulatin.enc), is based on ISO Latin 1, and is close to a superset of both the basic TeX text encoding and the Adobe standard text encoding. We felt it was best to use ISO Latin 1 as the foundation, since some existing systems actually use ISO Latin 1 instead of ASCII. We also left the first eight positions open, so particular fonts could add more ligatures or other unusual characters.
The second, “GNU Latin text complement” (distributed in data/gnulcomp.enc), includes the remaining pre-accented characters from the Cork encoding, the PostScript expert encoding, swash characters, small caps, etc.
When a program reads a TFM file, it's given an arbitrary string (at best) for the coding scheme. To be useful, it needs to find the corresponding encoding file. We couldn't think of any way to name our .enc files that would allow the filename to be guessed automatically. Therefore, we invented another data file which maps the TFM coding scheme strings to our .enc filenames.
This file is distributed as data/encoding.map. See Common file syntax, for a description of the common syntax elements.
Each nonblank non-comment line in encoding.map has two entries: the first word (contiguous nonblank characters) is the .enc filename; the rest of the line, after ignoring whitespace, is the string in the TFM file. This should be the same string that appears on the first line of the .enc file (see Encoding files).
Programs should ignore case when using the coding scheme string.
Here is the coding scheme map file we distribute:
adobestd Adobe standard
ascii ASCII
dvips dvips
dvips TeX text + adobestandardencoding
gnulatin GNU Latin text
gnulcomp GNU Latin text complement
psymbol PostScript Symbol
texlatin Extended TeX Latin
textext TeX text
zdingbat Zapf Dingbats
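A lookup through this map might be sketched in Python as follows. The function names are hypothetical; the sketch assumes only the layout described above (first word is the .enc filename, the rest of the line is the coding scheme string) and compares coding schemes without regard to case.

```python
def parse_encoding_map(text):
    """Return a dict from lowercased coding scheme to .enc filename."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("%", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError("expected filename and coding scheme: " + line)
        enc_name, scheme = parts
        table[scheme.strip().lower()] = enc_name
    return table


def enc_file_for(scheme, table):
    """Find the .enc filename for a TFM coding scheme, ignoring case."""
    return table.get(scheme.strip().lower())


MAP_TEXT = """\
adobestd Adobe standard
ascii ASCII
dvips dvips
dvips TeX text + adobestandardencoding
gnulatin GNU Latin text
gnulcomp GNU Latin text complement
psymbol PostScript Symbol
texlatin Extended TeX Latin
textext TeX text
zdingbat Zapf Dingbats
"""
table = parse_encoding_map(MAP_TEXT)
print(enc_file_for("TEX TEXT", table))
print(enc_file_for("GNU Latin text", table))
```

Note that two different coding scheme strings map to the dvips file, which a dictionary keyed on the scheme handles naturally.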
Imageto converts an image file (currently either in portable bitmap format (PBM) or GEM's IMG format) to either a bitmap font or an Encapsulated PostScript file (EPSF). An image file is simply a large bitmap.
If the output is a font, it can be constructed either by outputting a constant number of scanlines from the image as each “character” or (more usually) by extracting the “real” characters from the image.
The current selection of input formats is rather arbitrary. We implemented the IMG format because that is what our scanner outputs, and the PBM format because Ghostscript can output it (see GSrenderfont). Other formats could easily be added.
Extracting a usable font from an image file usually takes two steps. First, look at the image, so you can see what you've got. Second, prepare the IFI file describing the contents of the image: the character codes to output, any baseline adjustment (as for, e.g., `j'), and how many pieces each character has. Each step involves a separate invocation of Imageto: the first with either the `-strips' or `-epsf' option, the second with neither.
In the second step, Imageto considers the input image as a series of image rows. Each image row consists of all the scanlines between a nonblank scanline and the next entirely blank scanline. (A scanline is a single horizontal row of pixels in the image.) Within each image row, Imageto looks top-to-bottom, left-to-right, for bounding boxes: closed contours, i.e., an area whose edge you can trace with a pencil without lifting it.
For example, in the following image Imageto would find two image rows, the first from scanline 1 to scanline 7, the second consisting of only scanline 10. There are six bounding boxes in the first image row, only one in the second. (This example also shows some typical problems in scanned images: the baseline of the `m' is not aligned with those of the `i', `j', and `l'; a meaningless black line is present; the `i' and `j' overlap.)
01234567890123456789
0
1 x
2 x x x
3 x
4 x x x xxxxx
5 x x x x x x
6 x x x x
7 xx
8
9
10 xxxxxxxxxxxxxxx
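The division into image rows can be sketched in a few lines of Python. This is only an illustration of the rule stated above (an image row runs from a nonblank scanline up to the next entirely blank one), not Imageto's actual code. The function name `image_rows' and the sample bitmap are hypothetical; the bitmap merely has ink on the same scanlines as the example image.

```python
def image_rows(scanlines):
    """Return (first, last) scanline index pairs, one per image row.

    scanlines is a sequence of strings; a scanline counts as blank when
    it is empty or contains only spaces.
    """
    rows = []
    start = None
    for index, line in enumerate(scanlines):
        blank = not line.strip()
        if not blank and start is None:
            start = index                    # a nonblank scanline opens a row
        elif blank and start is not None:
            rows.append((start, index - 1))  # a blank scanline closes it
            start = None
    if start is not None:                    # the image ended inside a row
        rows.append((start, len(scanlines) - 1))
    return rows


# Hypothetical bitmap: ink on scanlines 1-7 and on scanline 10.
BITMAP = [
    "                ",  # 0
    "      x         ",  # 1
    " x    x         ",  # 2
    "      x         ",  # 3
    " x    x  xxxxx  ",  # 4
    " x    x  x x x  ",  # 5
    " x    x  x x x  ",  # 6
    "     xx         ",  # 7
    "                ",  # 8
    "                ",  # 9
    " xxxxxxxxxxxxxx ",  # 10
]

print(image_rows(BITMAP))  # [(1, 7), (10, 10)]
```

Finding the bounding boxes within each image row is a separate step that this sketch does not attempt.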
Typically, the first step in extracting a font from an image is to see exactly what is in the image. (Clearly, this is unnecessary if you already know what your image file contains.)
The simplest way to get a look at the image file, if you have Ghostscript or some other suitable PostScript interpreter, is to convert the image file into an EPSF file with the `-epsf' option. Here is a possible invocation:
imageto -epsf ggmr.img
Here we read an input file ggmr.img; the output is ggmr.eps. You can then view the EPS file with
gs ggmr.eps
(presuming that gs invokes your PostScript interpreter).
If you lack either a suitable PostScript interpreter or enough disk space to store the EPS file (it uses approximately twice as much disk space as the original image), the above won't work. Instead, to view the image you must make a font with the `-strips' option:
imageto -strips ggmr.img
The output of this will be ggmrsp.1200gf (our image having a resolution of 1200 dpi). Although the GF font cannot be conveniently viewed directly, you can use TeX and your favorite DVI processor to look at it, as follows:
fontconvert -tfm ggmrsp.1200
echo ggmrsp | tex strips
This writes its output to strips.dvi, which you can view with your favorite DVI driver. (See Archives, for how to obtain the DVI drivers for PostScript and X we recommend.)
strips.tex is distributed in the imageto directory.
Once you can see what is in the image, the next step is to prepare the IFI file (see IFI files) corresponding to its characters. Imageto relies completely on the IFI files to describe the image; it makes no attempt at optical character recognition, i.e., guessing what the characters are from their shapes.
You must also decide on a few more aspects of the output font, which you specify with options:
For instance, in the example image in Imageto usage, it would be best to specify `-baselines=2,0'. The `2' is scanline #5 in that image. The `0' is an arbitrary value for scanline #10, which we will ignore via the IFI file (see IFI files).
For each character written, the `-print-guidelines' option produces output on the terminal that looks like:
75 (K) 5/315
This means that character code 75, whose name in the encoding file is `K', has its bottom row at row 5, and its top row at row 315; i.e., the character has five blank rows above the origin. This is almost certainly wrong (the letter `K' should sit on the typesetting baseline), so we would want to adjust the baseline upwards to 0 via an individual character baseline adjustment of 5 in the IFI file (see IFI files).
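If the font has many characters, it may be convenient to scan the `-print-guidelines' output mechanically. The following Python sketch is one way to do so. It assumes that every line has exactly the shape of the sample above (character code, name in parentheses, bottom row, a slash, top row); that format is inferred from the single example, and the names `parse_guideline' and `off_baseline' are hypothetical.

```python
import re

# Assumed line shape, taken from the one sample line: "75 (K) 5/315".
GUIDELINE_RE = re.compile(r"^\s*(\d+)\s+\((.*)\)\s+(-?\d+)/(-?\d+)\s*$")


def parse_guideline(line):
    """Return a dict with code, name, bottom, and top, or None if no match."""
    match = GUIDELINE_RE.match(line)
    if match is None:
        return None
    code, name, bottom, top = match.groups()
    return {"code": int(code), "name": name,
            "bottom": int(bottom), "top": int(top)}


def off_baseline(lines):
    """Return the parsed entries whose bottom row is not 0."""
    entries = (parse_guideline(line) for line in lines)
    return [e for e in entries if e is not None and e["bottom"] != 0]


# The first line is the sample from the text; the second is made up.
sample = ["75 (K) 5/315", "76 (L) 0/310"]
for entry in off_baseline(sample):
    print(entry["code"], entry["name"], entry["bottom"])  # 75 K 5
```

The result is only a list of candidates: a character with a descender legitimately has a bottom row that is not on the baseline, so each entry still needs to be checked by eye before you add an adjustment to the IFI file.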
The final invocation to produce the font might look something like this:
imageto -baselines=121,130,120 -designsize=26 ggmr
The output from this would be ggmr26.1200gf.
Your image may not be completely “clean”, i.e., the scanning process may have introduced artifacts: black lines at the edge of the paper; blotches where the original had a speck of dirt or ink; broken lines where the image had a continuous line. To get a correct output font, you must correct these problems.
To remove blotches, you can simply put .notdef in the appropriate place in the IFI file. You can find the “appropriate place” when you look at the output font; some character will be nothing but a (possibly tiny) speck, and all the characters following will be in the wrong position.
The `-print-clean-info' option might also help you to diagnose which bounding boxes are being assigned to which characters, when you are in doubt. Here is an example of its output:
[Cleaning 149x383 bitmap:
checking (0
checking (0
checking (0
checking (113
106]
The final `106' is the character code output (ASCII `j'). The overall bitmap which contains the `j' is 149 pixels wide and 383 pixels high. The bitmap contained four bounding boxes: the first two came from the adjacent character (`i') and were erased, and the last two belonged to the `j' and were kept. (As shown in the example image above, the tail of the `j' often overlaps the `i' in type specimens.)
If the image has blobs you have not removed with .notdef, you
will see a small bounding box in this output. The numbers shown are