[tex4ht] [bug #241] grave accent letter ` (hex 60) changes to left single quotation mark (hex 0xE2 0x80 0x98)

Michal Hoftich michal.h21 at gmail.com
Mon Jan 19 15:56:04 CET 2015


> I don't doubt it.  No .htf has been created (in the distribution anyway)
> since Eitan died.  It would be great to cover some of the new fonts.

Yeah, many fonts are missing, Linux Libertine for example. If my
system works, we may be able to process all fonts in the texmf tree.
But manual testing of the results will be needed, I am afraid, and
that would be a really huge task.

>
>     my idea is the following: we can take the property list of a tfm file
>
> I doubt the encoding info in the TFM file is especially reliable even in
> the few cases where it's present.  (Ditto afm2pl.)
>

It seems the best approach might be to use known encodings when they
are present and to fall back to parsing the afm file in the other cases.
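
A minimal sketch of that fallback, in Python. It assumes the usual AFM
CharMetrics layout ("C <code> ; WX <width> ; N <name> ; ..."); the
function names and the sample data are hypothetical, and the "CH <hex>"
form of the code field is not handled.

```python
def afm_glyph_names(afm_text):
    """Return {slot: glyph_name} read from the CharMetrics section of an AFM file."""
    names = {}
    in_metrics = False
    for line in afm_text.splitlines():
        line = line.strip()
        if line.startswith("StartCharMetrics"):
            in_metrics = True
            continue
        if line.startswith("EndCharMetrics"):
            break
        if not in_metrics:
            continue
        # Each entry is a ';'-separated list of "KEY value..." fields.
        fields = {}
        for part in line.split(";"):
            tokens = part.split()
            if tokens:
                fields[tokens[0]] = tokens[1:]
        code = fields.get("C")
        name = fields.get("N")
        # C -1 marks a glyph that has no slot in the default encoding.
        if code and name and int(code[0]) >= 0:
            names[int(code[0])] = name[0]
    return names


def glyph_names_for_font(known_encoding, afm_text):
    """Prefer a known encoding vector (list of names by slot), else use the AFM."""
    if known_encoding is not None:
        return {slot: name for slot, name in enumerate(known_encoding)
                if name != ".notdef"}
    return afm_glyph_names(afm_text)


# Made-up sample, only to show the shape of the input.
SAMPLE_AFM = """StartFontMetrics 4.1
FontName Sample
EncodingScheme FontSpecific
StartCharMetrics 3
C 96 ; WX 333 ; N grave ; B 0 0 0 0 ;
C 25 ; WX 500 ; N germandbls ; B 0 0 0 0 ;
C -1 ; WX 500 ; N unencoded ; B 0 0 0 0 ;
EndCharMetrics
EndFontMetrics
"""

if __name__ == "__main__":
    print(glyph_names_for_font(None, SAMPLE_AFM))
    print(glyph_names_for_font(["grave", ".notdef", "quoteleft"], SAMPLE_AFM))
```

The known encoding wins whenever it is given, so the afm file is only
trusted for fonts where nothing better is available.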


>     and find the PostScript name of the character in the corresponding
>     .enc file. we can get the Unicode code point for the PostScript
>     name from the glyphlist.txt and texglyphlist.txt files included in
>     the TeX distribution.
>
> Wow, quite a project.

I've already found fonts which use non-standard glyph names (txsyc,
for example), so sometimes a manual lookup for each character seems
necessary :(
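
A rough sketch of the lookup, including a report of the names that need
manual attention. It assumes the simple "name;XXXX" line format (with
space-separated code points for sequences) and a plain "/Name [ ... ] def"
encoding vector; the function names and sample data are hypothetical,
and lines in any other format are simply skipped.

```python
import re


def parse_glyphlist(text):
    """Map glyph name -> Unicode string from 'name;XXXX' or 'name;XXXX YYYY' lines."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, codes = line.partition(";")
        try:
            value = "".join(chr(int(code, 16)) for code in codes.split())
        except ValueError:
            continue  # not plain hex code points; left for manual handling
        if value:
            table[name] = value
    return table


def parse_enc(text):
    """Return the glyph names of an .enc encoding vector, indexed by slot."""
    text = re.sub(r"%.*", "", text)  # drop PostScript comments
    body = text[text.index("[") + 1:text.index("]")]
    return re.findall(r"/([^\s/\[\]]+)", body)


def map_slots(enc_names, glyphs):
    """Split slots into those with a known code point and those without."""
    mapped, unknown = {}, []
    for slot, name in enumerate(enc_names):
        if name == ".notdef":
            continue
        if name in glyphs:
            mapped[slot] = glyphs[name]
        else:
            unknown.append((slot, name))
    return mapped, unknown


# Made-up samples; a real run would read the files from the texmf tree.
SAMPLE_GLYPHLIST = """# sample entries
grave;0060
quoteleft;2018
germandbls;00DF
beta;03B2
"""

SAMPLE_ENC = """% sample encoding
/SampleEncoding [
  /grave /quoteleft /germandbls /.notdef /weirdglyph
] def
"""

if __name__ == "__main__":
    glyphs = parse_glyphlist(SAMPLE_GLYPHLIST)
    mapped, unknown = map_slots(parse_enc(SAMPLE_ENC), glyphs)
    print(mapped)
    print("needs manual lookup:", unknown)
```

Anything that lands in the unknown list is a candidate for the manual
lookup mentioned above.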

>
>     for these FONTSPECIFIC fonts I have to use
>     Google to find out the encoding actually used
>
> For fonts created through the otftotfm process, i.e., nearly everything
> that Michael Sharpe and Bob Tennent have done, who have contributed many
> of the new fonts (Sharpe did newtx), there should be an opaquely-named
> (a bunch of hex chars) .enc file in the font package corresponding to
> every tfm.  As I understand it.
>

Thanks, I will look at this.

> Anyway, in general, I expect that talking to the package developer or
> looking at the sources would be more fruitful than random web searches.
> (Not to say it'll be easy, no matter what.)

Or looking up each character manually, as Eitan did. But mistakes are
a danger in such cases: I've found German ß coded as beta in one htf
file.


>
>     but sometimes two or more glyphs are used to create a character
>     (mainly accents), so we can't get the PostScript name of such a
>     character even if we knew the encoding of the referenced glyphs
>
> All I can think of is to have heuristics or a table saying that a
> composition of character X + character Y in font F means Unicode point
> U.  Since it's generally about accents, the combinations should be
> finite, and repeated through many different fonts.
>

I hope so.
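
One possible shape for such a table, as a sketch only: map the accent
glyph name to a combining mark and let Unicode normalization find the
precomposed character. The table contents and the function name are
hypothetical and cover only a few accents.

```python
import unicodedata

# Accent glyph name -> combining mark (hypothetical, deliberately small table).
COMBINING = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "dieresis": "\u0308",
    "caron": "\u030C",
}


def compose(base_char, accent_glyph):
    """Return the precomposed character for base + accent, or None if there is none."""
    mark = COMBINING.get(accent_glyph)
    if mark is None:
        return None  # accent not in the table
    composed = unicodedata.normalize("NFC", base_char + mark)
    # NFC leaves two code points when Unicode has no precomposed form.
    return composed if len(composed) == 1 else None


if __name__ == "__main__":
    for base, accent in [("a", "acute"), ("e", "grave"), ("q", "acute")]:
        result = compose(base, accent)
        label = "U+%04X" % ord(result) if result else "no single code point"
        print(base, accent, label)
```

Combinations that return None would still need an entry written by hand.
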
> Thanks,
> K

regards,
Michal


