[tex4ht] curiosity about unicode.4hf

Matteo Gamboz gamboz at medialab.sissa.it
Mon Mar 13 14:10:22 CET 2017


Hi all

this is a bit similar to
http://tex.stackexchange.com/questions/328441/tex4ht-unicode-representations-of-apostrophe-in-utf-8-html-source
(please feel free to tell me to post on tex.stackexchange)


I am curious about a Unicode character entity.

Here is the situation: when I take a tex file such as the following
cat > a.tex <<EOF;
\documentclass{article}
\begin{document}
'
\end{document}
EOF


and run it through
htlatex a "xhtml" " -cunihtf -utf8"

I get "a.html" that contains:
...&#x2019;...

(where "&#x2019;" is the Unicode numeric character reference for "’")
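
The reference in the output can be checked independently of tex4ht. The
following is only a small sketch using Python's standard library (not part
of the tex4ht toolchain) to decode the entity and look up the character's
Unicode name:

```python
import html
import unicodedata

# The numeric character reference found in a.html
entity = "&#x2019;"

# html.unescape resolves numeric and named character references
char = html.unescape(entity)

# Code point and official Unicode name of the decoded character
code_point = f"U+{ord(char):04X}"
name = unicodedata.name(char)

print(code_point, name)
```

So the ASCII apostrophe typed in a.tex ends up as the typographic right
single quotation mark, written as an entity rather than as raw UTF-8.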

This is because of the file
/usr/local/texlive/2016/texmf-dist/tex4ht/ht-fonts/unicode/charset/unicode.4hf
that contains lines to keep the following in unicode representations:
&#x003C; <
&#x003E; > 
&#x0022; "
&#x2019; ’
&#x0026; &

AFAIK, ' and " cannot appear unescaped inside attribute values delimited by
the same quote, but ’ and ‘ (#x2018) should be fine there (and #x2018 is not
in the file in TeX Live 2016).
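
As a rough cross-check of that claim, here is a sketch that uses Python's
html.escape as a stand-in for "which characters are markup-significant".
This is an assumption for illustration only; it says nothing about how
tex4ht itself decides, and the helper name needs_escaping is made up here:

```python
import html

def needs_escaping(ch: str) -> bool:
    """True if html.escape (with quote=True) rewrites the character."""
    return html.escape(ch, quote=True) != ch

# The characters listed in unicode.4hf, plus the ASCII apostrophe and U+2018
candidates = ["<", ">", '"', "\u2019", "&", "'", "\u2018"]

# Map each code point to whether an HTML escaper considers it special
results = {f"U+{ord(ch):04X}": needs_escaping(ch) for ch in candidates}

for code_point, special in results.items():
    print(code_point, "escaped" if special else "left as-is")
```

Under this check only <, >, ", & and the ASCII ' are treated as special;
both curly quotes pass through unchanged.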

Does anyone know why &#x2019; ended up in unicode.4hf?

Thanks all
m


