[tex4ht] htxelatex support for unicode and multiple scripts

Michal Hoftich michal.h21 at gmail.com
Mon Feb 18 18:17:03 CET 2013


Hello,

You can also use the \DeclareUnicodeCharacter macro from the `inputenc`
package. It takes two arguments: the first is the hexadecimal Unicode code
point of the character, the second is the macro to use in its place, in this
case \entity{decimal value of the code point}.
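
For example, a few such declarations might look like the sketch below. The
characters are picked only for illustration, and the sketch assumes that
`inputenc` is loaded with the utf8 option and that \entity is available when
tex4ht is running.

```latex
% Illustrative sketch, not the contents of any actual package.
% Assumes \usepackage[utf8]{inputenc} and a definition of \entity.
% First argument: Unicode code point in hexadecimal.
% Argument of \entity: the same code point in decimal.
\DeclareUnicodeCharacter{03B1}{\entity{945}}  % U+03B1 GREEK SMALL LETTER ALPHA
\DeclareUnicodeCharacter{03C9}{\entity{969}}  % U+03C9 GREEK SMALL LETTER OMEGA
\DeclareUnicodeCharacter{0627}{\entity{1575}} % U+0627 ARABIC LETTER ALEF
\DeclareUnicodeCharacter{0101}{\entity{257}}  % U+0101 LATIN SMALL LETTER A WITH MACRON
```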

I created a simple package, `greek-arabic-4ht.sty`, which covers the full
Unicode ranges for Greek, Arabic and Latin Extended-A, but not all characters
used in your document are covered. You can simply add any undeclared
character to the package as you find it.

You can edit your preamble so that it loads the packages needed when tex4ht
is running, like this:

--------------------
\documentclass[12pt]{memoir}

\makeatletter
\@ifpackageloaded{tex4ht}{%
\newcommand{\greek}[1]{#1}
\usepackage{greek-arabic-4ht}
}{%
\usepackage{fontspec}
\usepackage{xunicode}
% Choose roman font (Mapping=tex-text enables TeX-style ligatures for quotes and dashes).
\setromanfont[Mapping=tex-text]{Palatino}
% Greek (normally, use first two lines; to make simple file for export to
% Word, use 3rd line only)
\newfontfamily{\gr}{New Athena Unicode}
\newcommand{\greek}[1]{{\gr #1}}
\newfontfamily\arabicfont[Script=Arabic,Scale=1.2,WordSpace=2]{USAMA NASKH}
\usepackage{bidi}
}
\makeatother
------------------
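
A minimal body to exercise this preamble might look like the sketch below.
The sample text is hypothetical, not taken from the original document, and
it assumes that every character in the Greek word is declared in
`greek-arabic-4ht.sty` and that the package sets up UTF-8 input for tex4ht.

```latex
% Hypothetical test body; place it after the preamble shown above.
\begin{document}
% \greek switches to the Greek font under XeLaTeX and
% simply passes its argument through when tex4ht is running.
The Greek word \greek{λόγος} should appear in both outputs.
\end{document}
```

Compiling the same file with both XeLaTeX and tex4ht is a quick way to check
that the two branches of the preamble stay in sync.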

If your target format is Word, you can convert your document with the command:

mk4ht oolatex alex "xhtml, charset=utf-8"  -utf8

This produces a file in OpenOffice format, which can easily be converted to
Word. A sample is also included in the attachment.

Regards,
Michal




2013/2/18 Radhakrishnan CV <cvr at river-valley.org>

> On Sun, Feb 17, 2013 at 4:25 AM, Alexandre Roberts <
> alexandre.roberts at gmail.com> wrote:
>
>> Dear tex4ht list members,
>>
>> I am about to begin drafting the first chapter of my dissertation in
>> Byzantine and Middle Eastern history. This is the moment when I will commit
>> to the format I will use for writing my entire dissertation. I want it to
>> be XeLaTeX/BibLaTeX, but unless I can come up with a simple workflow for
>> converting the content of my documents to Word format -- the only format
>> that publishers in my field accept -- I will have to give this up and turn
>> to Word/Endnote or Mellel/Bookends for the next three years.
>>
>
> As far as I understand, TeX4ht won't support fontspec or the XeLaTeX
> technique of using system fonts that do not have *.tfm files. In effect, by
> adopting TeX4ht, one is likely to lose the features brought in by XeTeX.
> However, here is another approach.
>
>    1. We translate all the Unicode characters in the document to
>    Unicode code points written in 7-bit ASCII, which TeX4ht handles well.
>    A simple Perl script, utf2ent.pl in the attached archive, does the job.
>    2. We run TeX4ht on the output of step 1.
>    3. We open the resulting *.html file in a browser and, I believe, get
>    what you wanted. See the attached screenshot as it appeared in Firefox
>    on my Linux box.
>
> Here is what I did with your specimen document.
>
>    1. Commented out the lines related to the fontspec package in your
>    source file, alex.tex.
>    2. Added four lines of macro code to digest the converted TeX sources.
>    3. Ran the command: perl utf2ent.pl alex.tex > alex-ent.tex
>    4. Ran the command: htlatex alex-ent "xhtml,charset=utf-8,fn-in"
>    -utf8  (the fn-in option keeps the footnotes in the same document). I
>    used a local bib file, mn.bib, as I didn't have your bib database. biber
>    was also run in the meantime to process the bibliography database.
>    5. Opened the output, alex-ent.html, in a browser. I got what you see in
>    the attached alex.png.
>
> Hope this might help you.
> Best regards
>
> --
> Radhakrishnan
> River Valley
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: greekarabix4ht.zip
Type: application/zip
Size: 10881 bytes
Desc: not available
URL: <http://tug.org/pipermail/tex4ht/attachments/20130218/c3c5c570/attachment.zip>

