11-03-2023, 10:44 AM   #4
retiredbiker
Quote:
Originally Posted by Slash
The OCR tools I'm using work on PDF only; I haven't found a way to run OCR on my original JPEG files in Calibre.
If you are using Linux, a cool way to do OCR on JPEG images is OCRFeeder, a front end for Tesseract. It gives you fine control and handles paragraphing and end-of-line hyphens very well. It also lets you deal with double-column layouts and other ugly things.
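
If you would rather script it than use the GUI, something like this might do as a starting point. It is only a rough sketch that calls the Tesseract command line directly (not OCRFeeder), and it assumes `tesseract` is installed and on your PATH with the English language data. The function names and the folder name in the usage note are made up for illustration.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, lang="eng"):
    """Build the Tesseract call for one image: tesseract IMAGE OUTPUTBASE -l LANG."""
    image = Path(image)
    # Tesseract adds ".txt" to the output base itself, so strip the image suffix.
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_folder(folder, lang="eng"):
    """Run Tesseract on every .jpg in a folder (hypothetical helper, in page order by name)."""
    for image in sorted(Path(folder).glob("*.jpg")):
        # check=True raises CalledProcessError if Tesseract fails on a page.
        subprocess.run(tesseract_command(image, lang), check=True)
```

Calling `ocr_folder("scans")` would leave one `.txt` file next to each JPEG. You lose the column and layout control that OCRFeeder gives you, so this is probably only good enough for simple single-column pages.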

If you have a PDF with OCR text in it, Calibre will use the pdftohtml tool to extract the text. Sometimes this does not work, for some reason, so try using the pdftotext tool (part of Poppler) outside Calibre. That will give you a text file, but you are on your own for paragraphing and formatting, as always with PDF.
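
Here is a rough sketch of how the pdftotext route could be scripted, assuming Poppler's `pdftotext` is on your PATH. The `reflow` helper is my own hypothetical cleanup pass, not something pdftotext or Calibre provides: it glues end-of-line hyphens back together and joins the lines of each blank-line-separated block into one paragraph. It is naive, so a genuine compound broken at the hyphen (like "well-" / "known") will come out as "wellknown" and need fixing by hand.

```python
import re
import subprocess


def pdftotext_command(pdf_path, txt_path, layout=False):
    """Build the call: pdftotext [options] PDF-file text-file."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout tries to keep the physical page layout (useful for columns).
        cmd.append("-layout")
    return cmd + [pdf_path, txt_path]


def reflow(text):
    """Naive paragraph repair for extracted text (hypothetical helper)."""
    # Rejoin words split by an end-of-line hyphen: "scan-\nned" -> "scanned".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat blank lines as paragraph breaks, then unwrap lines within each paragraph.
    paragraphs = re.split(r"\n\s*\n", text)
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())


def extract(pdf_path, txt_path):
    """Run pdftotext, then rewrite the text file with reflowed paragraphs."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    with open(txt_path, encoding="utf-8") as fh:
        cleaned = reflow(fh.read())
    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write(cleaned)
```

For example, `extract("book.pdf", "book.txt")` (file names made up) would give you a text file with one paragraph per block, ready for proofing.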

Anything OCR'd needs proofing and editing, and that is usually the hardest part of the project.