elf ([personal profile] elf) wrote in [community profile] ebooks 2012-10-23 06:27 pm (UTC)

I'd be willing to do OCR for a couple of books, if anyone has one or two they really really want the searchable text from.

I could convert to either searchable PDF or Word (or both). Converting to other ebook formats is possible, but would take substantially more time & effort and would need to be negotiated around my schedule. It looks like they're nice high-res scans (600dpi) and would OCR easily.

Searchable PDF (hidden, corrected text under the scanned image, copy/pastable but likely with some formatting problems) is easy and, in small batches, fun for me. Word docs with most of the formatting removed (like, getting rid of the columns; putting all the footnotes at the end; skipping the index) are likewise easy and fun. Trying to match the original formatting is less fun and takes more time; I'm available for that but not willing to offer it to the general public for free.

Whole books take time. A single chapter probably takes an hour or so (depending on how much text is involved); a single page is a matter of minutes. (Erm. Not counting download time. I don't have the mega-fast DSL.) I'm happy to help people out with conversion if they want/need access to the text.
