[tex4ht] oolatex odt needs cleanup

Martin Weis martin.weis.newsadress at gmx.de
Sat Jun 5 19:40:04 CEST 2010


Dear tex4ht-community!

I find tex4ht a very useful tool, and I use oolatex especially often.
However, I found some unexpected markup in the resulting odt file,
which somebody might be able to explain.
My version of tex4ht is: tex4ht.c (2009-01-31-07:33 kpathsea)

Here is a minimal example to demonstrate:

\documentclass[a4paper,10pt]{article}
\begin{filecontents}{bibliography.bib}
@Article{author2010,
 author = {First Author and Second Authorsname},
 title = {Title of the cited article},
 journal = {Journal of Test},
 year = {2010},
 volume = {20},
 number = {1},
 pages = {23-42}
}
\end{filecontents}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage[english]{babel}
\usepackage[T1]{fontenc}
\usepackage{graphicx}

\usepackage{hyperref}

\date{\today}
\title{Title of the article}
\author{First Author\thanks{Affiliation of first author} \and Second
Authorsname\thanks{Affiliation of second author}}
\begin{document}
 \maketitle
\section{Introduction}
\label{sec:intro}
This is the introduction, we refer here to the conclusions in
section~\ref{sec:conclusion}. And we would like to cite \cite{author2010}.

Our most used equation~\ref{eq:pythagoras} was invented by Phytagoras.
\begin{equation}
c^{2}=a^{2}+b^{2}
\label{eq:pythagoras}
\end{equation}
which can also be written as $c=\sqrt{a^{2}+b^{2}}$.

\section{Conclusion}
\label{sec:conclusion}
Since we did not introduce anything in section~\ref{sec:intro}, there
are not many conclusions to find here.

\bibliographystyle{plain}
\bibliography{bibliography}

\end{document}

In the resulting odt there are links (for the \ref and \cite commands),
but each of them is followed by an additional space. The same applies to
inline formulas. In the example:
> This is the introduction, we refer here to the conclusions
> in section 2 . And we would like to cite [1] .
            --^                             --^
> Our most used equation 1  was invented by Phytagoras.
                        --^
> which can also be written as [formula] .
                                      --^

In the content.xml (unzip the odt with unzip -d odt_unzipped
example.odt) the following xml snippet can be found (with original
linebreaks, sorry for the long lines):

> <text:p text:style-name="First-line-indent">   Our most used equation<text:s/>1<!--tex4ht:ref: eq:pythagoras 
> --><text:span text:style-name="reference-ref"><text:reference-ref text:ref-name="x1-1001r1" text:reference-format="text"> </text:reference-ref></text:span> was invented by Phytagoras. </text:p> 
> <table:table table:style-name="equation"><table:table-column table:style-name="equ-col"/> 
> <table:table-column table:style-name="equ-num-col"/> 
> <table:table-row><table:table-cell table:style-name="equ-cell"><text:p text:style-name="equ-p"><text:reference-mark text:name="x1-1001r1"> </text:reference-mark>
> <!--l. 34
> --><draw:frame draw:name="mobj-4" draw:style-name="mml-display" draw:z-index="0" text:anchor-type="paragraph"><draw:object xlink:actuate="onLoad" xlink:href="./odtclean-m4" xlink:show="embed" xlink:type="simple"/></draw:frame> </text:p></table:table-cell> 
> <table:table-cell table:style-name="equ-num-cell"><text:p text:style-name="equ-num-p">(1)</text:p></table:table-cell></table:table-row></table:table>
> <!--l. 37
> --><text:p text:style-name="Like-Text-body">
> which can also be written as <!--l. 38
> --><draw:frame draw:style-name="mml-inline" draw:name="mobj-5" text:anchor-type="as-char" draw:z-index="0"><draw:object xlink:href="./odtclean-m5" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/></draw:frame> .
>    </text:p> 

where the spaces can be found between the "text:reference-ref" tags:
text:reference-format="text"> </text:reference-ref>
and after
</draw:frame> .

There might be some more (e.g. before 'Our' and before </text:p>), but
these seem to be handled correctly, at least by OpenOffice.org.
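
To see how often the two kinds of stray space occur in a given file, a small check along the following lines might help. This is only a sketch, assuming Python 3; the names count_stray_spaces and count_in_odt are made up for illustration and are not part of tex4ht. The second pattern only counts whitespace between a formula frame and a punctuation mark, which is an assumption about which spaces are unwanted.

```python
import re
import zipfile

# Space inside an otherwise empty reference field (from \ref and \cite).
REF_SPACE = re.compile(r'text:reference-format="text"> </text:reference-ref>')
# Whitespace between a formula frame and a following punctuation mark
# (assumption: only these spaces are considered unwanted).
FRAME_SPACE = re.compile(r'</draw:frame>\s+(?=[.,;:!?])')


def count_stray_spaces(xml):
    """Return (spaces in reference fields, spaces between frame and punctuation)."""
    return len(REF_SPACE.findall(xml)), len(FRAME_SPACE.findall(xml))


def count_in_odt(path):
    """Read content.xml directly from the odt (a zip archive) and count."""
    with zipfile.ZipFile(path) as odt:
        return count_stray_spaces(odt.read("content.xml").decode("utf-8"))
```

Calling count_in_odt("example.odt") reads content.xml without unzipping the file first.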

I use this sed script to clean up the content.xml:

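#!/bin/sed -f
# join a line that ends in </draw:frame> with the following line
# (sed strips the newline from each line, so it can only be matched after N)
/<\/draw:frame>$/{
$!N
s#</draw:frame>\n#</draw:frame>#
}
# clean up the spaces inside the reference fields
s#text:reference-format="text"> <#text:reference-format="text"><#g
# clean up the additional space after formula frames
s#</draw:frame> #</draw:frame>#g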
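
After the cleanup, content.xml has to go back into the odt. The following is a rough sketch of doing the cleanup and the repacking in one step, assuming Python 3; the function names clean_content and clean_odt are hypothetical. It applies the substitutions the sed script above is meant to perform and writes the mimetype entry first and uncompressed, as the OpenDocument package format expects.

```python
import re
import zipfile


def clean_content(xml):
    """Apply the substitutions that the sed script is meant to perform."""
    xml = xml.replace('text:reference-format="text"> <',
                      'text:reference-format="text"><')
    # Drop one space or line break directly after a formula frame.
    return re.sub(r'</draw:frame>[ \n]', '</draw:frame>', xml)


def clean_odt(src, dst):
    """Copy the odt at src to dst, replacing content.xml with a cleaned copy."""
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        # Stable sort that moves the mimetype entry to the front.
        names = sorted(zin.namelist(), key=lambda n: n != "mimetype")
        for name in names:
            data = zin.read(name)
            if name == "content.xml":
                data = clean_content(data.decode("utf-8")).encode("utf-8")
            # The mimetype entry is stored without compression.
            method = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zout.writestr(name, data, compress_type=method)
```

For example, clean_odt("example.odt", "example_clean.odt") leaves the original file untouched. Note that removing the space after every frame also joins an inline formula to a word that follows it, so restricting the pattern to spaces before punctuation may be the safer choice.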

Additionally, the footnotes with the authors' affiliations (from \thanks)
end up in the wrong position.

If anybody can explain or change this behaviour, I would be glad.

Thanks,
-- 
Martin Weis

