[tex4ht] [bug #597] tex4ht + biblatex + non-ascii chars = mixed encoding in html file

Matteo Gamboz puszcza-hackers at gnu.org.ua
Sat Mar 4 15:01:23 CET 2023


URL:
  <http://puszcza.gnu.org.ua/bugs/?597>

                 Summary: tex4ht + biblatex + non-ascii chars = mixed encoding in html file
                 Project: tex4ht
            Submitted by: gamboz
            Submitted on: Sat Mar  4 14:01:23 2023
                Category: None
                Priority: 5 - Normal
                Severity: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

NB: this is the same as https://tex.stackexchange.com/q/678200/56076
(sorry for cross-posting) :)

I have this situation:
- a LaTeX file with a macro that tex4ht usually translates into a Unicode char
(e.g. `\ldots`, which becomes `…`)
- a citation with a non-ASCII char in the author's name (e.g. the `í` in
`Albarracín`)
- I would like to generate an XHTML file with htlatex

The procedure works, but the resulting file has mixed encodings: the char
produced by the LaTeX macro is encoded in UTF-8, while the non-ASCII char in
the author's name is encoded in Latin-1.
AFAICT, htlatex includes the bbl file and reads it as if it were in Latin-1.

Is there anything that I could do to fix this behavior? :)
(I'm working on `pdfTeX, Version 3.141592653-2.6-1.40.24 (TeX Live 2022/Arch
Linux)`)


Here is an MWE, followed by the commands that I run:

```latex
%% File mwe.tex
\documentclass{article}

\usepackage[backend=biber]{biblatex}

\begin{filecontents}{\jobname.bib}
@Article{Albarracin2000,
year = {2000},
volume = {1},
issue = {2},
pages = {3},
author = {Anyone Albarracín},
title = {A beautiful paper.},
journaltitle = {Some Journal}
}
\end{filecontents}

\addbibresource{\jobname.bib}

\begin{document}

I Am a Scientist\ldots\ Ask Me Anything
\parencite{Albarracin2000}

\printbibliography

\end{document}
```

```sh
htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""
biber mwe
htlatex mwe.tex "xhtml" "-cunihtf -utf8" "" ""
```
And the result:
```sh
$ file mwe.html
mwe.html: XML 1.0 document, Non-ISO extended-ASCII text
$ grep -a -e 'Anyone Albarra' -e Scientist --color mwe.html 
<!--l. 22--><p class="noindent" >I Am a Scientist… Ask Me Anything [<a 
    <!--l. 26--><p class="noindent" >Anyone Albarrac�n. “A beautiful paper.” In: <span 

```
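
To pin down exactly which bytes are off, a small check along these lines may help. It is only a sketch: the function name `find_non_utf8_bytes` is made up for illustration, and showing each stray byte as a Latin-1 char is my assumption, based on the diagnosis above.

```python
def find_non_utf8_bytes(data: bytes):
    """Return (line_number, byte_value, latin1_char) for each byte that is
    not part of a valid UTF-8 sequence.

    Hypothetical helper, not part of tex4ht or biber.
    """
    findings = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        # surrogateescape maps every undecodable byte 0xNN to U+DCNN,
        # so valid UTF-8 passes through and stray bytes stay visible.
        text = line.decode("utf-8", errors="surrogateescape")
        for ch in text:
            if 0xDC80 <= ord(ch) <= 0xDCFF:
                raw = ord(ch) - 0xDC00
                # Assumption: the stray byte was meant as Latin-1.
                findings.append((line_no, raw, bytes([raw]).decode("latin-1")))
    return findings


if __name__ == "__main__":
    import sys

    # Usage: python check_utf8.py mwe.html
    with open(sys.argv[1], "rb") as fh:
        for line_no, raw, guess in find_non_utf8_bytes(fh.read()):
            print(f"line {line_no}: byte 0x{raw:02X} (Latin-1: {guess!r})")
```

If the diagnosis is right, running this on `mwe.html` should flag the `í` byte in the author's name and leave the `…` from `\ldots` alone.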




