pdftotext removing "fi" from a recent pdf I made with latex,

Mike Marchywka marchywka at hotmail.com
Sun Nov 24 14:17:49 CET 2019


On Sun, Nov 24, 2019 at 02:50:59AM -0800, Tomas Rokicki wrote:
>    I believe this is largely a poppler problem.  I'd be happy to discuss it a bit more if you would like.
>    -tom

I did not look inside the pdf itself, but this is what pdftotext produced:
cat schumann.pdf | pdftotext - - | myod -
000000 74 65 73 74 20 61 20 77 6f 72 64 20 74 68 61 74  >test a word that<
000010 20 64 65 1c 6e 65 73 20 74 68 65 20 70 72 6f 62  > de.nes the prob<
000020 6c 65 6d 2c 20 64 20 65 20 66 20 69 20 6e 20 65  >lem, d e f i n e<
000030 20 73 0a 0a 31 0a 0a 0c                          > s..1...<
000038
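
For anyone who wants to clean up such output after the fact, here is a minimal Python sketch. It assumes the stray 0x1c byte is the "fi" slot of the T1 (Cork) font encoding, which would fit the \usepackage[T1]{fontenc} line in the test file. The names T1_LIGATURES and expand_t1_ligatures are made up for illustration and are not part of pdftotext or poppler.

```python
# Assumption: these are the T1 (Cork) encoding slots for the f-ligatures.
T1_LIGATURES = {
    "\x1b": "ff",
    "\x1c": "fi",
    "\x1d": "fl",
    "\x1e": "ffi",
    "\x1f": "ffl",
}


def expand_t1_ligatures(text: str) -> str:
    """Replace raw T1 ligature slot codes with the letters they stand for."""
    return text.translate(str.maketrans(T1_LIGATURES))


# Bytes copied from the hex dump above (offsets 0x00 to 0x16).
raw = bytes.fromhex(
    "74 65 73 74 20 61 20 77 6f 72 64 20 74 68 61 74"
    " 20 64 65 1c 6e 65 73"
)
print(expand_t1_ligatures(raw.decode("latin-1")))
# prints: test a word that defines
```

This only repairs text that has already been extracted; it does not change what is stored in the pdf.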


The dvi seems to contain this (mydviasm.txt is a modified script,
don't ask about the extension...):
mydviasm.txt schumann.dvi | more


8b0         set: 'w'
8b1         right: -0.277786pt
8b4         set: 'ord'
8b7         w0:
8b8         set: 'that'
8bc         w0:
8bd         set: 'de\x0cnes'
8c3         w0:
8c4         set: 'the'
8c7         w0:
8c8         set: 'problem,'
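
The listing above can be turned back into readable text with a short Python sketch. It rests on two assumptions that the listing itself does not confirm: that '\x0c' is the "fi" slot of the OT1 (Computer Modern) encoding, and that each "w0:" is an interword space. OT1_LIGATURES and listing_to_text are hypothetical names, not part of the dviasm script.

```python
import ast

# Assumption: these are the OT1 encoding slots for the f-ligatures.
OT1_LIGATURES = {
    "\x0b": "ff",
    "\x0c": "fi",
    "\x0d": "fl",
    "\x0e": "ffi",
    "\x0f": "ffl",
}

# The listing exactly as printed above (raw string keeps the \x0c escape as text).
LISTING = r"""
8b0         set: 'w'
8b1         right: -0.277786pt
8b4         set: 'ord'
8b7         w0:
8b8         set: 'that'
8bc         w0:
8bd         set: 'de\x0cnes'
8c3         w0:
8c4         set: 'the'
8c7         w0:
8c8         set: 'problem,'
"""


def listing_to_text(listing: str) -> str:
    """Rebuild the typeset text from 'offset opcode: argument' lines."""
    out = []
    for line in listing.splitlines():
        parts = line.split(None, 2)  # offset, opcode, optional argument
        if len(parts) < 2:
            continue  # skip blank lines
        opcode = parts[1]
        if opcode == "set:" and len(parts) == 3:
            # The argument is a quoted string literal such as 'de\x0cnes'.
            chars = ast.literal_eval(parts[2].strip())
            out.append("".join(OT1_LIGATURES.get(c, c) for c in chars))
        elif opcode == "w0:":
            out.append(" ")  # assumption: w0 is the interword space
        # 'right:' is a small kern in this listing, so it adds no character.
    return "".join(out)


print(listing_to_text(LISTING))
# prints: word that defines the problem,
```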





> 
>    On Sun, Nov 24, 2019 at 2:47 AM Mike Marchywka <marchywka at hotmail.com> wrote:
> 
>      On Sun, Nov 24, 2019 at 12:11:07AM +0000, Mike Marchywka wrote:
>      >
>      > I have never seen this before. It looks like a stupid font problem,
>      > but it is likely to be common with many pdfs now. If I just run
>      > "pdftotext" on my output, I get weird boxes where each "fi"
>      > is. If I use "-enc ASCII7", the "fi" is deleted entirely.
>      >
>      > I could probably create a minimal working example but thought someone
>      > may know offhand. Thanks.
>      Never mind, I figured it out :) I had added this stupid thing
>      \usepackage[T1]{fontenc}
>      to fix another problem. If you are finding that pdftotext output
>      is jumbled, or you want to use the pdf (and maybe dvi) format
>      to obscure information that would be in a normal text file,
>      this seems to do it:
>      \documentclass{article}
>      \usepackage[T1]{fontenc}
>      \usepackage{hyperref}
>      \hypersetup{
>        pdfinfo={
>          x-bib-author  = {A. Writer},
>          x-bib-journal = {Test},
>          x-bib-buy-url = {https://buyexpensivejunk}
>        }
>      }
>      \newcommand{\addbib}[2]
>      {
>        \hypersetup{
>         pdfinfo={ x-bib-#1  = {#2} } }
>      }
>      \addbib{author}{marchywka}
>      \addbib{title}{my title}
>      \addbib{omething}{foobar abstratct asdfasdfa }
>      \begin{document}
>      test
>      a word that defines the problem, d e f i n e s
>      \end{document}
>      Compiling to pdf and inverting gives this,
>      cat schumann.pdf | pdftotext - -
>      test a word that de nes the problem, d e f i n e s
>      1
>      >
>      > This is the version,
>      >
>      > pdftotext -v
>      > pdftotext version 0.41.0
>      > Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
>      > Copyright 1996-2011 Glyph & Cog, LLC
>      >
>      > and basic info on the pdf file,
>      > exifutil -list vitaprop.pdf
>      > ExifTool Version Number         : 11.75
>      > File Name                       : vitaprop.pdf
>      > Directory                       : .
>      > File Size                       : 287 kB
>      > File Modification Date/Time     : 2019:11:23 06:17:53-05:00
>      > File Access Date/Time           : 2019:11:23 06:17:53-05:00
>      > File Inode Change Date/Time     : 2019:11:23 06:17:53-05:00
>      > File Permissions                : rw-rw-r--
>      > File Type                       : PDF
>      > File Type Extension             : pdf
>      > MIME Type                       : application/pdf
>      > PDF Version                     : 1.5
>      > Linearized                      : No
>      > Page Count                      : 12
>      > Page Mode                       : UseOutlines
>      > Author                          :
>      > Title                           :
>      > Subject                         :
>      > Creator                         : LaTeX with hyperref package
>      > Producer                        : pdfTeX-1.40.16
>      > Create Date                     : 2019:11:23 06:17:52-05:00
>      > Modify Date                     : 2019:11:23 06:17:52-05:00
>      > Trapped                         : False
>      > PTEX Fullbanner                 : This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian) kpathsea version 6.2.1
>      >
>      >
>      > --
>      >
>      > mike marchywka
>      > 306 charles cox
>      > canton GA 30115
>      > USA, Earth
>      > marchywka at hotmail.com
>      > 404-788-1216
>      > ORCID: 0000-0001-9237-455X
>      >
> 
>    --
> 
>    --  http://cube20.org/  --  http://golly.sf.net/  --



