[texhax] extracting math from pdf file

Benjamin Sambale bsambale at gmx.de
Mon Dec 6 08:47:48 CET 2010


On 06.12.2010 00:56, Heiko Oberdiek wrote:
> On Sun, Dec 05, 2010 at 06:38:45PM -0400, Jim Diamond wrote:
>> On Sun, Dec  5, 2010 at 22:46 (+0100), Benjamin Sambale wrote:
>>> \documentclass{minimal}
>>> \begin{document}
>>> $\ne$
>>> \end{document}
>>> I compiled this code using pdflatex (TeX Live 2010). If I try to copy
>>> the \ne-symbol in the corresponding pdf-file with the mouse cursor, I
>>> get an equality sign (=) instead. I only tried this with evince as
>>> pdf viewer, but I suspect that the behavior is similar for other
>>> viewers. I also tried to use something like
>>> \pdfglyphtounicode{notequal}{...}
>>> without success. I'm very grateful for any ideas.
>> A quick peek in plain.tex shows that, at least there, \ne is an
>> over-struck combination of two characters:
>>
>> 	\def\neq{\not=} \let\ne=\neq
>>
>> If LaTeX does the same thing, then there is no single "not equal" glyph.
> It depends on the used fonts and packages.
>
> If the font does not contain U+2260 (notequal), then
> at least the ActualText feature of the PDF format could be
> used (see PDF spec.):
>
> \documentclass{minimal}
> \pagestyle{empty}
> \usepackage{accsupp}
> \CheckCommand*{\ne}{\not=}
> \renewcommand*{\ne}{%
>    \BeginAccSupp{method=hex,unicode,ActualText=2260}%
>    \not=%
>    \EndAccSupp{}%
> }
> \begin{document}
> $\ne$
> \end{document}
>
> Yours sincerely
>    Heiko Oberdiek

Thanks to all who replied. To answer Philip Taylor's question: I do not 
have a reasonable application for this copy procedure. I discovered 
these things while converting my PhD thesis to the PDF/A format in order 
to satisfy the library specifications. I found out that the commands

\pdfglyphtounicode{multicloseright}{22CA}
\pdfgentounicode=1

make $\rtimes$ (from amssymb) copy as the corresponding Unicode
character. So I wondered whether the same is possible with $\ne$.
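
For reference, a minimal sketch of a complete test file built around
these two commands. It assumes pdflatex and the Type 1 AMS fonts, in
which the \rtimes glyph is named multicloseright; other engines or
fonts may need a different setup:

\documentclass{minimal}
\usepackage{amssymb}
% Map the glyph name used in the AMS font to U+22CA
\pdfglyphtounicode{multicloseright}{22CA}
% Ask pdfTeX to write ToUnicode tables for the embedded fonts
\pdfgentounicode=1
\begin{document}
$A \rtimes B$
\end{document}

As noted earlier in the thread, this route cannot help with \ne as
long as it is typeset as \not=, because there is then no single glyph
to which a Unicode value could be mapped.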

Heiko Oberdiek's approach works perfectly. I should add that I do not
actually need this for my thesis, since the PDF/A-1b format hopefully
suffices (instead of the stricter PDF/A-1a format).

Thank you again,
Benjamin

