[texhax] more on math rendering for the web (including Microsoft Word Symbol font and TeX for web)

Brandon Kuczenski brandon at 301south.net
Mon Jun 28 06:09:07 CEST 2010


On Sun, 27 Jun 2010, Dan Doernberg wrote:

> Reinhard, Brandon
>
> I'm following up on our email from last week, where you graciously answered 
> several of our questions. We have been trying to do more research 
> ourselves so we can ask questions that are as intelligent as possible, but 
> dealing with Word's Symbol font is apparently a very tough problem, and 
> one in an area where we are not experts.
>
> A. I'd like for our software to be able to handle TeX documents. Two 
> general questions come to mind (presumably easy ones, so I haven't tried 
> to research TeX before asking):
>
>   1. Would it be trivial or difficult for our software to render TeX 
> input documents for the Web? Would LaTeX and/or other variants be the 
> same?
>

For simple documents, this should be straightforward, but TeX is very 
complicated and LaTeX has tons of specialized packages, so getting the 
details to work can be problematic.  The old standby is latex2html, but it 
seems to be unmaintained: http://www.latex2html.org/


There are plenty of programs that will render TeX fragments on demand on 
the server side.  All you need is a local TeX distribution (TeX Live is 
common on Linux systems, MiKTeX on Windows, MacTeX on the Mac) and something 
like:

http://www.mayer.dial.pipex.com/tex.htm

to render them for the web. [This looks like a port to Rails:
http://agilewebdevelopment.com/plugins/latex_render_helper ]
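
As a rough illustration of the on-demand approach, here is a minimal Python 
sketch (not the script linked above) that wraps a math fragment in a small 
LaTeX document, runs `latex`, and converts the DVI output to a tightly 
cropped PNG with `dvipng`. It assumes both programs from a TeX distribution 
are on the PATH; the template, function names, and resolution are 
illustrative choices. It does no sanitizing, so input from web users should 
be treated as untrusted.

```python
import subprocess
import tempfile
from pathlib import Path

# Illustrative wrapper document; the packages loaded here are an assumption.
TEMPLATE = r"""\documentclass{article}
\usepackage{amsmath,amssymb}
\pagestyle{empty}
\begin{document}
@BODY@
\end{document}
"""


def build_document(fragment: str, display: bool = False) -> str:
    """Wrap a TeX math fragment in a standalone LaTeX document."""
    body = (r"\[ " + fragment + r" \]") if display else ("$" + fragment + "$")
    return TEMPLATE.replace("@BODY@", body)


def render_fragment(fragment: str, out_png: Path, dpi: int = 120) -> Path:
    """Run latex and dvipng (both must be on PATH) and write a cropped PNG."""
    target = out_png.resolve()  # resolve before the tools run in a temp dir
    with tempfile.TemporaryDirectory() as tmp:
        tex = Path(tmp) / "frag.tex"
        tex.write_text(build_document(fragment), encoding="utf-8")
        # -halt-on-error stops on bad input instead of waiting for a user.
        subprocess.run(
            ["latex", "-interaction=nonstopmode", "-halt-on-error", tex.name],
            cwd=tmp, check=True, capture_output=True,
        )
        # -T tight crops to the ink; -D sets the output resolution in dpi.
        subprocess.run(
            ["dvipng", "-T", "tight", "-D", str(dpi), "-o", str(target), "frag.dvi"],
            cwd=tmp, check=True, capture_output=True,
        )
    return target
```

For example, `render_fragment(r"\frac{a}{b}", Path("frac.png"))` should 
write `frac.png` if the toolchain is installed; a real service would also 
cache the images, keyed on the fragment text.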

>   2. Does TeX have any built-in translators for dealing with legacy 
> documents from MS Word? What do other people do when confronted with 
> problematic Word documents?


I don't think TeX has any facility to deal with Word, but there are other 
programs.  wv (for "Word viewer") looks promising:

http://wvware.sourceforge.net/

Also have a look at AbiWord, which is a full word processor (and 
command-line tool) and uses wvWare for its Word conversions. I just tried 
"abiword --to=html my_file.doc" on a Word 2003 .doc file and got satisfactory 
results from the command line.  It appears to convert to Unicode, and the few 
symbols I had in my document came out correctly.  The tables looked like 
garbage, though.
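
If there are many legacy files, the AbiWord command above can be scripted. 
This is a minimal Python sketch, assuming `abiword` is on the PATH and (an 
assumption worth checking on your version) that it writes `my_file.html` 
next to `my_file.doc`; the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def abiword_command(doc: Path) -> list[str]:
    """Build the same command line used above for one input file."""
    return ["abiword", "--to=html", str(doc)]


def convert_all(folder: Path) -> list[Path]:
    """Convert every .doc in `folder` to HTML, stopping on the first failure."""
    converted = []
    for doc in sorted(folder.glob("*.doc")):
        subprocess.run(abiword_command(doc), check=True)
        # Assumption: AbiWord writes the HTML beside the input, same base name.
        converted.append(doc.with_suffix(".html"))
    return converted
```

Using `check=True` makes a failed conversion raise immediately rather than 
silently skipping a file.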

http://www.abisource.com/

I use catdoc to extract 'content' from .doc files for use in TeX. 
According to the webpage, it "doesn't even try to preserve MS-Word 
character formatting", but it can do user-defined substitution of 
special characters if they come out as Unicode.
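
catdoc's built-in substitution is configured separately (see its 
documentation). The sketch below swaps in a different technique: a 
separate Python post-processing pass that maps Unicode characters in 
catdoc's plain-text output to TeX macros. The table entries and function 
names are illustrative assumptions, and it assumes catdoc's output decodes 
correctly in the current locale (e.g. UTF-8).

```python
import subprocess

# Illustrative substitution table: extend with the symbols your documents use.
UNICODE_TO_TEX = {
    "\u03b1": r"$\alpha$",
    "\u03b2": r"$\beta$",
    "\u03b3": r"$\gamma$",
    "\u2264": r"$\leq$",
    "\u2265": r"$\geq$",
    "\u00b1": r"$\pm$",
    "\u00d7": r"$\times$",
}

# str.maketrans accepts a dict of single characters -> replacement strings.
_TABLE = str.maketrans(UNICODE_TO_TEX)


def to_tex(text: str) -> str:
    """Replace known Unicode characters with TeX equivalents; keep the rest."""
    return text.translate(_TABLE)


def doc_to_tex(path: str) -> str:
    """Extract plain text with catdoc (must be on PATH), then substitute."""
    result = subprocess.run(["catdoc", path], check=True,
                            capture_output=True, text=True)
    return to_tex(result.stdout)
```

Characters not in the table pass through unchanged, so anything unexpected 
still shows up in the TeX source for manual fixing.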

http://wagner.pp.ru/~vitus/software/catdoc/

As far as 'problematic' files go, I basically expect to have to go through 
each file and fix problems manually.  I just don't do it often enough to 
worry about.

I don't know how well any of these programs work with Office 2007/2008 or 
later files (.docx on Windows).  My sense is that everything stopped 
working with the new formats.  (Sometimes I can't even open them in Word!)
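
For what it's worth, a .docx file (Office Open XML) is a ZIP archive whose 
body text lives in `word/document.xml`, so plain text can be recovered with 
nothing but Python's standard library. The sketch below is only that: it 
takes the file's bytes, keeps paragraph text from `<w:t>` elements, and 
ignores formatting. Table cells come out as separate paragraphs, and Word 
2007 equations (stored as OMML, in a different XML namespace) are dropped 
entirely. The function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for the main document part of a .docx.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the plain text of each <w:p> paragraph in a .docx file's bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # Visible text sits in <w:t> elements inside runs (<w:r>); join them.
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs
```

A file on disk can be passed as `docx_paragraphs(open("report.docx", "rb").read())`.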

Hope this helps,
Brandon

