[tex4ht] two versions of unicode.4ht

Michal Hoftich michal.h21 at gmail.com
Wed Aug 3 17:50:12 CEST 2016


Hi Ulrike,

> I found two versions of unicode.4ht in 
> 
> \ht-fonts\iso8859\1 
> 
> one in
> 
>   D:\texlive\2016\texmf-dist\tex4ht\ht-fonts\iso8859\1\charset\
> 
> the other in
> 
>   D:\texlive\2016\texmf-dist\tex4ht\ht-fonts\iso8859\1\charset\uni\
> 
> Their content is not identical, the one in charset has two extra
> lines:
> 
> 'fi' ''  'fi'       ''
> 'fl' ''  'fl'       ''
> 
> I'm not quite sure if both are really from the texlive installation
> -- perhaps one of them remained from a test I did to compare the
> location with the one from miktex, but I mention it anyway just in
> case. Also I would like to know which one is the correct one. 

There are quite a lot of unicode.4hf files generated from
tex4ht-fonts-4hf.tex:

tex4ht.dir/texmf/tex4ht/ht-fonts/unicode/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/unicode/html/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/win/1251/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/utf8/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/gbk/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/symbol/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/viscii/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/viqr/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/html-speech/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/mozilla/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/mozilla/charset/mnemonic/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/mozilla/charset/native/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/cp1256/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/ooffice/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/gb2312/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/koi/8r/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/2/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/2/html/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/5/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/5/html/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/1/charset/uni/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/1/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/1/html/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/6/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/6/html/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/7/charset/unicode.4hf
tex4ht.dir/texmf/tex4ht/ht-fonts/jsml/charset/unicode.4hf

If I understand it correctly, tex4ht searches the directories specified in
the font sections of tex4ht.env. A section can be selected with the `-c`
option of the tex4ht command. The default section is

<default>
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/iso8859/1/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/ascii/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/alias/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/mozilla/!
i~/tex4ht.dir/texmf/tex4ht/ht-fonts/unicode/!
</default>

I think that the first unicode.4hf file found is used, but I am not
sure what mechanism is used to locate it.  By default,

ht-fonts/iso8859/1/html/charset/unicode.4hf

is used on my machine, with -cunihtf it is 

ht-fonts/unicode/html/charset/unicode.4hf

I don't understand why 

ht-fonts/unicode/charset/unicode.4hf

is not used instead.
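
To see which unicode.4hf files are even candidates for a given section,
the lookup can be approximated with a small script. This is only a
minimal sketch under assumptions: it treats lines starting with `i` as
font directories and a trailing `!` as "search subdirectories", and it
walks the tree in sorted, top-down order. The function names are
hypothetical, and the traversal order need not match what tex4ht really
does, so it lists candidates rather than predicting the winner.

```python
import os

def parse_font_dirs(env_text, section="default"):
    """Return (path, recursive) pairs for the 'i' lines of one section."""
    dirs, inside = [], False
    for line in env_text.splitlines():
        line = line.strip()
        if line == f"<{section}>":
            inside = True
        elif line == f"</{section}>":
            inside = False
        elif inside and line.startswith("i"):
            path = line[1:]
            # Assumption: a trailing '!' requests a recursive search.
            dirs.append((path.rstrip("!"), path.endswith("!")))
    return dirs

def candidates(dirs, name="unicode.4hf"):
    """List every file called `name` under the directories, in listed order."""
    found = []
    for root, recursive in dirs:
        root = os.path.expanduser(root)
        if recursive:
            for dirpath, dirnames, filenames in os.walk(root):
                dirnames.sort()  # deterministic order; an assumption, not tex4ht's rule
                if name in filenames:
                    found.append(os.path.join(dirpath, name))
        elif os.path.isfile(os.path.join(root, name)):
            found.append(os.path.join(root, name))
    return found
```

Feeding it the contents of tex4ht.env, e.g.
`candidates(parse_font_dirs(open("tex4ht.env").read()))`, prints nothing
by itself; the returned list shows every unicode.4hf reachable from the
section, which can then be compared with the file tex4ht reports using.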

Anyway, unicode.4hf files are important for output encodings other
than utf-8, as they specify the character codes to which the Unicode
entities specified in the DVI file should be transformed. With the -utf8
option, tex4ht writes its output in utf-8 encoding, and unicode.4hf is
used only to output some characters as named entities, for example, or
the "fi" ligature as literal "fi".

It seems that the issue someone had on TeX.sx with MiKTeX [1] is that the
wrong `unicode.4hf` file is used: tex4ht can't find the one in the
`unicode` dir and uses the one in `iso8859/1` instead, which results in a
file with a declared `utf-8` encoding but characters in `iso8859` encoding.
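
A quick way to confirm this kind of mismatch is to check whether the
output bytes actually decode as utf-8. The following is a minimal sketch,
not part of tex4ht; the function name is hypothetical and the sample
character is only an illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The same character written in the two encodings:
latin1_bytes = "é".encode("iso-8859-1")  # a single byte, 0xE9
utf8_bytes = "é".encode("utf-8")         # two bytes, 0xC3 0xA9

print(is_valid_utf8(latin1_bytes))  # False: a lone 0xE9 is not valid UTF-8
print(is_valid_utf8(utf8_bytes))    # True
```

Running the check on the generated HTML file read in binary mode
(`open(path, "rb").read()`) would show whether non-ASCII characters were
written as iso8859 bytes despite the utf-8 declaration.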

I am not sure what the issue is here. It seems that the .4hf files are
in the correct places, but tex4ht can't find them.

The last thing I've found is that we don't generate the .4hf files from
the sources at the moment; there is no target for tex4ht-fonts-4hf.tex
in the Makefile. It is also such a huge file that compilation fails
with a "capacity exceeded" error. It can be compiled with LuaLaTeX,
though. The generated files seem to be incorrect, as they include a
copyright notice at the beginning and tex4ht complains about incorrect
entries.

Best regards,
Michal

[1] http://tex.stackexchange.com/q/322164/2891


