[pdftex] OT: Unicode and typesetting

Harald Hanche-Olsen hanche at math.ntnu.no
Fri Apr 8 16:18:00 CEST 2005


+ Michael Chapman <chapman at mchapman.com>:

| The Angstrom sign (&Acirc: in HTML speak, I think)

That's Å, for Ångström, yes.

| has its own code point (x212B), different to 'A with ring above'
| (xC5) (let alone the fact that you can 'build your own' glyphs: x41
| x30A).  That there is an Angstrom sign code point (and a degrees
| Celsius (x2103) and degrees Fahrenheit (x2109)) is a boon for text
| searching. One can find all the measurements in Angstroms in a text,
| even if that text is in a language that uses circles on top of
| vowels.  But being able to search for kilometres (let alone metres:
| 'm') would be equally (if not more) useful.

For this reason, I believe the inclusion of the Ångström, Kelvin and a
few other symbols was a mistake in an early version of Unicode that
we're now stuck with for eternity.  Actually, if you look in U2100.pdf
(from the Unicode web site) you will find this comment at 212B
ANGSTROM SIGN: "preferred representation is 00C5 Å".  But strangely,
no similar comment for 212A KELVIN SIGN.
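
As a rough illustration (assuming Python's standard unicodedata module;
the helper name `preferred` and the inclusion of OHM SIGN as one of the
"few other symbols" are my own choices, not taken from the charts), NFC
normalization replaces each of these signs by the ordinary letter,
whether or not the chart carries a comment:

```python
import unicodedata

# Letter-like signs that carry a singleton canonical decomposition.
SIGNS = ["\u212B", "\u212A", "\u2126"]  # ANGSTROM, KELVIN, OHM


def preferred(ch):
    """Return the NFC form, which replaces such a sign by the plain letter."""
    return unicodedata.normalize("NFC", ch)


for sign in SIGNS:
    letter = preferred(sign)
    print(f"U+{ord(sign):04X} {unicodedata.name(sign)} -> "
          f"U+{ord(letter):04X} {unicodedata.name(letter)}")
```

So the Kelvin sign is mapped to a plain K in the data file, even though
the chart says nothing about it.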

| It is not even as if x212B is some kind of symbolic link to xC5 for legacy 
| purposes. There are two distinct code points.

But I think it *is* some kind of symbolic link.
UnicodeData.txt says:

212B;ANGSTROM SIGN;Lu;0;L;00C5;;;;N;ANGSTROM UNIT;;;00E5;

That 00C5 in there is the "link" to

00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;

(I hope the mailing list software doesn't wrap that line)

and here we find a "link" 0041 030A to

0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
030A;COMBINING RING ABOVE;Mn;230;NSM;;;;;N;NON-SPACING RING ABOVE;;;;

and there the linking stops.  I think the moral is that searching in
Unicode text is not easy.  You basically need to find the
canonical decomposition of everything first.  Just reading about it
gives me a headache, but then I am not a Unicode specialist.  Which is
why I'd better stop while the going is good(?).
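
A minimal sketch of the idea, assuming Python's standard unicodedata
module (the function names `expand_once`, `decomposition_chain` and
`contains` are made up for illustration).  It follows the decomposition
field of UnicodeData.txt one step at a time, then uses NFD normalization
to compare text.  It ignores Hangul syllables, which are decomposed
algorithmically rather than through the data file, and it skips
compatibility mappings (those tagged with <...>):

```python
import unicodedata


def expand_once(text):
    """Replace each character by its canonical decomposition mapping, if any."""
    out = []
    for ch in text:
        mapping = unicodedata.decomposition(ch)
        # Compatibility mappings start with a tag such as "<compat>"; skip them.
        if mapping and not mapping.startswith("<"):
            out.extend(chr(int(cp, 16)) for cp in mapping.split())
        else:
            out.append(ch)
    return "".join(out)


def decomposition_chain(text):
    """List every intermediate form until no further mapping applies."""
    steps = [text]
    while True:
        nxt = expand_once(steps[-1])
        if nxt == steps[-1]:
            return steps
        steps.append(nxt)


def contains(haystack, needle):
    """Substring search after bringing both strings to NFD."""
    def nfd(s):
        return unicodedata.normalize("NFD", s)
    return nfd(needle) in nfd(haystack)


for step in decomposition_chain("\u212B"):
    print(" ".join(f"{ord(c):04X}" for c in step))

print(contains("a bond length of 1.5 \u212B", "\u00C5"))
```

The loop prints the same chain as above (212B, then 00C5, then
0041 030A), and the search finds the Ångström sign when asked for the
letter.  Note that a plain substring test on decomposed text also finds
a bare "A" inside "Å", so a real search would need to respect
combining-character boundaries as well.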

- Harald


