[texhax] extracting a plain text file of the final document
sgovindachar at yahoo.com
Sun Jan 15 06:02:21 CET 2012
On Saturday, January 14, 2012, Reinhard Kotucha wrote:
> On 2012-01-14 at 18:00:58 -0800, jtzzaa11-texhax2 at yahoo.com wrote:
>> Is there a way to obtain a plain text version of the final
>> document processed by latex? By final version I mean one where
>> all macros, labels, and bibliography entries have been
>> processed and assigned their "final" values.
> Does
>   pdftotext -layout <your PDF file>
> do what you need?
> pdftotext is part of xpdf. If you don't have it, just install
> it. TeX Live provides xpdf (and thus pdftotext) for Windows.
As per this page on ctan.org, TeX Live does _not_ provide xpdf
(and neither does MiKTeX): http://www.ctan.org/pkg/xpdf
I googled and got it from www.foolabs.com, and was very happy
to find that words with ligatures in the PDF file, like "first"
with the "fi" ligature, were properly converted to "first" in
the text file (something Adobe's "save as text file" feature
does not manage).
> For other systems I recommend consulting the system's package
> manager. xpdf is ubiquitous.
>> Of course, such a plain text version won't contain any floats.
> It depends. Of course a JPEG can't be represented as plain
> text, but if the float is a table, it will appear in the
> output.
> Furthermore, I assume that pdftotext gives the best output if
> you're in an environment that supports UTF-8.
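For the original question (text with labels, citations and macros already resolved), here is a minimal sketch of the whole workflow. It assumes a BibTeX-based document compiled with pdflatex, and pdftotext from xpdf (or poppler) on the PATH. The function name latex_to_text and the file name paper.tex are hypothetical, and running pdflatex three times is the usual rule of thumb for settling references, not something stated in this thread.

```shell
# Sketch only: compile a LaTeX document fully, then extract plain text.
# Assumes pdflatex, bibtex and pdftotext are installed and on PATH,
# and that the document actually uses BibTeX (\cite + \bibliography).
latex_to_text() (
  set -e                    # abort at the first failing step (subshell body)
  base=${1%.tex}            # accept "paper" or "paper.tex" (hypothetical name)
  pdflatex -interaction=nonstopmode "$base.tex"   # first pass writes .aux
  bibtex "$base"                                  # resolve \cite entries
  pdflatex -interaction=nonstopmode "$base.tex"   # pull in the bibliography
  pdflatex -interaction=nonstopmode "$base.tex"   # settle \ref/\pageref values
  # -layout keeps the physical page layout (helps with tables);
  # -enc UTF-8 writes the text file as UTF-8.
  pdftotext -layout -enc UTF-8 "$base.pdf" "$base.txt"
)
```

Usage: latex_to_text paper.tex should leave paper.txt next to paper.pdf. If the document has no bibliography, drop the bibtex line, since bibtex may exit with an error when it finds no \citation commands.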