OT: Creating PDF with both scanned images of text and also raw text

Peter Flynn peter at silmaril.ie
Thu Jul 4 20:02:00 CEST 2019


Oh right, I see, I misunderstood.

P

On 4 July 2019 12:23:08 Aaron Gray <aaronngray.lists at gmail.com> wrote:
> On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:
>> On 03/07/2019 22:09, Aaron Gray wrote:
>>> I am scanning old papers in both image and OCR'ed form and I want to
>>> be able to combine them in a PDF document so the images are visible
>>> but the text is also in the PDF for anyone who wants to extract it.
>>>
>>> I have found camera-ready PDFs that have text in them and have been able
>>> to extract both, so I want to be able to do the same.
>>
>> The pdfimages utility will extract the images separately to PNM files,
>> which you can convert to JPEG with ImageMagick or similar.
>>
>> What are you using for the OCR? I have had excellent results with Tesseract.
>
> Sorry, no: I am after creating PDFs with image-based content and hidden text
> that is retrievable with PDF text extraction tools.
>
> Thanks,
>
> Aaron
>
> --
>
> Aaron Gray
>
> Independent Open Source Software Engineer, Computer Language Researcher, 
> Information Theorist, and amateur computer scientist.
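For the record, Tesseract can produce such a file directly: `tesseract page.png page pdf` writes page.pdf containing the scanned image with the recognized text as an invisible layer. The mechanism underneath is PDF text rendering mode 3 (`3 Tr`), which lays down text that is neither filled nor stroked, so it does not show on screen or in print but is still found by text extraction tools. The Python sketch below shows that mechanism on its own, using only the standard library. It assumes an uncompressed 8-bit grayscale scan, ASCII OCR text and the built-in Helvetica font; the names `build_searchable_pdf`, `pdf_string` and `stream_obj` are hypothetical. It also simply stacks the text lines from the top-left corner instead of placing each word over its position in the image (real OCR tools do that from word bounding boxes), so text selection will not line up with the scan.

```python
"""Minimal sketch: one-page PDF with a scanned image plus an invisible OCR text layer."""


def pdf_string(text: str) -> str:
    # Escape backslashes and parentheses for a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def stream_obj(entries: str, data: bytes) -> bytes:
    # A PDF stream object; /Length counts the bytes between "stream\n" and "\nendstream".
    head = f"<< {entries} /Length {len(data)} >>\nstream\n".encode("ascii")
    return head + data + b"\nendstream"


def build_searchable_pdf(width: int, height: int, gray_pixels: bytes,
                         lines: list[str], font_size: int = 12) -> bytes:
    """Return PDF bytes for one page of width x height points showing an
    uncompressed 8-bit grayscale image, with `lines` (assumed ASCII) drawn in
    text rendering mode 3 so they are invisible but still extractable."""
    if len(gray_pixels) != width * height:
        raise ValueError("gray_pixels must contain width*height bytes")

    ops = [
        f"q {width} 0 0 {height} 0 0 cm /Im1 Do Q",  # scale the image to fill the page
        "BT",
        "3 Tr",                                      # rendering mode 3: no fill, no stroke
        f"/F1 {font_size} Tf",
        f"{font_size * 1.2:.1f} TL",                 # leading used by T*
        f"1 0 0 1 0 {height - font_size} Tm",        # start near the top-left corner
    ]
    ops += [f"{pdf_string(line)} Tj T*" for line in lines]
    ops.append("ET")
    content = "\n".join(ops).encode("ascii")         # raises on non-ASCII text

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /XObject << /Im1 4 0 R >> /Font << /F1 5 0 R >> >> "
         "/Contents 6 0 R >>").encode("ascii"),
        stream_obj(f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
                   "/ColorSpace /DeviceGray /BitsPerComponent 8", bytes(gray_pixels)),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        stream_obj("", content),
    ]

    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                     # byte offset for the xref table
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"                   # each xref entry is exactly 20 bytes
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    w, h = 200, 100
    blank_scan = bytes([255]) * (w * h)              # stand-in for a real scanned page
    with open("searchable.pdf", "wb") as f:
        f.write(build_searchable_pdf(w, h, blank_scan, ["Hello from the OCR layer"]))
```

Opening searchable.pdf should show only the (here blank white) image, while a text extractor such as pdftotext should still return the hidden line.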



More information about the texhax mailing list