[tex4ht] unicode and lualatex

Ulrike Fischer news3 at nililand.de
Sat Jul 23 12:58:12 CEST 2011


On Sat, 23 Jul 2011 03:25:03 -0700, Johannes Wilm wrote:



>> An attachment named xhluatex.bat was removed from this document 

Well, now we know. Next time I will hide .bat files in .zip files.

> Could you send me the attachment off-list or paste the contents into an
> email?

Here it is. Watch out for the line breaks: each dvilualatex call is
a single line in the batch file. On second thought, the batch file
would be better named "htlualatex.bat", and if you want XHTML you
should call

htlualatex.bat unicode "xhtml,charset=utf-8" "-cunihtf -utf8"

The batch file is roughly htlatex.bat. I removed the paths to the
env file (not needed for MiKTeX), changed %3 to %~3 (plain %3 passes
the surrounding quotes along, which is simply wrong) and added a
--jobname option (I don't remember why, but probably luatex got
confused by the code on the command line and didn't set the job name
correctly).

  dvilualatex --jobname=%1 %5 \makeatletter\def\HCode{\futurelet\HCode\HChar}\def\HChar{\ifx"\HCode\def\HCode"##1"{\Link##1}\expandafter\HCode\else\expandafter\Link\fi}\def\Link#1.a.b.c.{\g@addto@macro\@documentclasshook{\RequirePackage[#1,xhtml]{tex4ht}}\let\HCode\documentstyle\def\documentstyle{\let\documentstyle\HCode\expandafter\def\csname tex4ht\endcsname{#1,html}\def\HCode####1{\documentstyle[tex4ht,}\@ifnextchar[{\HCode}{\documentstyle[tex4ht]}}}\makeatother\HCode %2.a.b.c.\input  %1

  dvilualatex --jobname=%1 %5 \makeatletter\def\HCode{\futurelet\HCode\HChar}\def\HChar{\ifx"\HCode\def\HCode"##1"{\Link##1}\expandafter\HCode\else\expandafter\Link\fi}\def\Link#1.a.b.c.{\g@addto@macro\@documentclasshook{\RequirePackage[#1,xhtml]{tex4ht}}\let\HCode\documentstyle\def\documentstyle{\let\documentstyle\HCode\expandafter\def\csname tex4ht\endcsname{#1,html}\def\HCode####1{\documentstyle[tex4ht,}\@ifnextchar[{\HCode}{\documentstyle[tex4ht]}}}\makeatother\HCode %2.a.b.c.\input  %1

  dvilualatex --jobname=%1 %5 \makeatletter\def\HCode{\futurelet\HCode\HChar}\def\HChar{\ifx"\HCode\def\HCode"##1"{\Link##1}\expandafter\HCode\else\expandafter\Link\fi}\def\Link#1.a.b.c.{\g@addto@macro\@documentclasshook{\RequirePackage[#1,xhtml]{tex4ht}}\let\HCode\documentstyle\def\documentstyle{\let\documentstyle\HCode\expandafter\def\csname tex4ht\endcsname{#1,html}\def\HCode####1{\documentstyle[tex4ht,}\@ifnextchar[{\HCode}{\documentstyle[tex4ht]}}}\makeatother\HCode %2.a.b.c.\input  %1

  tex4ht  %1  %~3

  t4ht   %1 %4


>> Your main problem has nothing to do with tex4ht. While luatex can
>> handle utf8 *input* natively it has problems to output
>> non-ascii-chars without fontspec and "unicode fonts" on the output
>> side.
>>
>> Your document is using OT1-encoded fonts (which has 128 characters)
>> and so your non-ascii-chars are ending in nothingness. With
>> \usepackage[T1]{fontenc} result will be better but quite a lot chars
>> will be wrong (e.g. the German ß)
>>
>>
> Oh, I thought I could use at least the first 256 characters. 128 is a bit
> limited for sure.

Even with 128 characters per font TeX can print thousands of symbols:
you only need to switch to other fonts. But if you use a font with
128 characters you should not try to print the non-existent char 156,
and that is what happens without inputenc.
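
To illustrate the point about font slots, here is a minimal test
file. It is only a sketch, assuming the file is saved as UTF-8 and
compiled with dvilualatex as in the batch file above; the file name
and the sample characters are made up for illustration. Since luatex
uses the Unicode code point as the slot number in the font, ü
(U+00FC) happens to land on the ü slot of a T1 font, whereas ß
(U+00DF) lands on slot 223, which in T1 holds "SS" and not ß (that
one sits in slot 255).

```latex
% slots.tex (hypothetical name), saved as UTF-8.
% Sketch only: behaviour as described in this thread for dvilualatex
% with classic 8-bit fonts, i.e. without fontspec.
\documentclass{article}
% T1 fonts have 256 slots; without this line the default OT1 fonts
% are used, which have only 128 slots.
\usepackage[T1]{fontenc}
\begin{document}
% U+00FC, U+00F6, U+00E4 coincide with the T1 slots of these letters.
ü ö ä

% U+00DF is slot 223 in the font; in T1 that slot is "SS", not ß.
ß
\end{document}
```

Commenting out the fontenc line should reproduce the original
symptom: all four characters point to slots above 127, which an OT1
font does not have.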

> 
> btw -- would it then make sense to auto-replace the characters in question
> before and after the transition? I am thinking of:
> 
> *cp unicode.tex /tmp*
> *cd /tmp*
> *rpl "ü" "ue5394" unicode.tex*
> *dvilualatex...*
> *.... *
> *rpl "ue5394" "ü" unicode.html*
> 
> in which 5394 just is a random number so that I don't catch other instances
> of "ue" when converting back. Hyphenation isn't applied, so it seems that
> this would work, right?

I don't think that is necessary. I get correct utf8-chars. 


-- 
Ulrike Fischer 


