OT: Creating PDF with both scanned images of text and also raw text

Aaron Gray aaronngray.lists at gmail.com
Fri Jul 5 16:20:09 CEST 2019


No, please reread my original question :)


On Thu, 4 Jul 2019 at 19:02, Peter Flynn <peter at silmaril.ie> wrote:

> Oh right, I see, I misunderstood.
>
> P
>
> On 4 July 2019 12:23:08 Aaron Gray <aaronngray.lists at gmail.com> wrote:
>
>> On Thu, 4 Jul 2019 at 00:28, Peter Flynn <peter at silmaril.ie> wrote:
>>
>>> On 03/07/2019 22:09, Aaron Gray wrote:
>>> > I am scanning old papers in both image and OCR'ed form and I want to
>>> > be able to combine them in a PDF document so the images are visible
>>> > but the text also is in the PDF for anyone who wants to extract it.
>>> >
>>> > I have found camera-ready PDFs that have text in them and been able
>>> > to extract both so I want to be able to do the same.
>>>
>>> The pdfimages utility will extract the images separately to PNM files,
>>> which you can convert to JPEG with ImageMagick or similar.
>>>
>>> What are you using for the OCR? I have had excellent results with
>>
>> Tesseract.
>>>
>>
>> Sorry, no. I am after creating PDFs with image-based content and hidden
>> text that is retrievable with PDF text extraction tools.
>>
>> Thanks,
>>
>> Aaron
>>
>> --
>> Aaron Gray
>>
>> Independent Open Source Software Engineer, Computer Language Researcher,
>> Information Theorist, and amateur computer scientist.
>>
>
>

-- 
Aaron Gray

Independent Open Source Software Engineer, Computer Language Researcher,
Information Theorist, and amateur computer scientist.