[texhax] extracting math from pdf file

Heiko Oberdiek heiko.oberdiek at googlemail.com
Mon Dec 6 00:56:03 CET 2010


On Sun, Dec 05, 2010 at 06:38:45PM -0400, Jim Diamond wrote:

> On Sun, Dec  5, 2010 at 22:46 (+0100), Benjamin Sambale wrote:
> 
> > \documentclass{minimal}
> > \begin{document}
> > $\ne$
> > \end{document}
> 
> > I compiled this code using pdflatex (TeX Live 2010). If I try to copy
> > the \ne-symbol in the corresponding pdf-file with the mouse cursor, I
> > get an equality sign (=) instead. I only tried this with evince as
> > pdf viewer, but I suspect that the behavior is similar for other
> > viewers. I also tried to use something like
> 
> > \pdfglyphtounicode{notequal}{...}
> 
> > without success. I'm very grateful for any ideas.
> 
> A quick peek in plain.tex shows that, at least there, \ne is an
> over-struck combination of two characters:
> 
> 	\def\neq{\not=} \let\ne=\neq
> 
> If LaTeX does the same thing, then there is no single "not equal" glyph.

It depends on the fonts and packages used.

If the font does not contain a glyph for U+2260 (NOT EQUAL TO), then
the ActualText feature of the PDF format can at least be used to
attach that character to the composed symbol (see the PDF specification):

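\documentclass{minimal}
\pagestyle{empty}
\usepackage{accsupp}
% Warn if \ne is no longer defined as \not=
\CheckCommand*{\ne}{\not=}
\renewcommand*{\ne}{%
  % Text extraction yields U+2260 instead of the two overstruck glyphs
  \BeginAccSupp{method=hex,unicode,ActualText=2260}%
  \not=%
  \EndAccSupp{}%
}
\begin{document}
$\ne$
\end{document}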
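
The same wrapping can be put into a small helper and reused for other
overstruck symbols. This is only a sketch: the name \MathActualText is
made up here and is not part of accsupp. It also assumes that \neq is
a separate macro (\def\neq{\not=} \let\ne=\neq, as in plain.tex), so
that redefining \ne alone leaves \neq unchanged:

\documentclass{minimal}
\usepackage{accsupp}
% Hypothetical helper, not provided by accsupp:
%   #1 = Unicode code point in hex, #2 = the composed symbol
\newcommand*{\MathActualText}[2]{%
  \BeginAccSupp{method=hex,unicode,ActualText=#1}%
  #2%
  \EndAccSupp{}%
}
\renewcommand*{\ne}{\MathActualText{2260}{\not=}}
% \neq keeps its old definition unless it is updated as well
\let\neq\ne
\begin{document}
$a \ne b$, $a \neq b$
\end{document}

Whether the copied text really comes out as U+2260 still depends on
the PDF viewer honoring ActualText.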

Yours sincerely
  Heiko Oberdiek

