beatrice_otter ([personal profile] beatrice_otter) wrote in [community profile] ebooks, 2012-10-22 11:25 pm

Free non-DRM ebooks: The Metropolitan Museum of Art back catalogue

The Metropolitan Museum of Art has made PDFs of all its out-of-print works available for free online. The bad news is, they're the kind of PDFs you can't copy the text out of, so if you want to convert them to another format you'll have to OCR them first. The good news is, there are a lot of really cool books there for free.

[personal profile] pauamma 2012-10-23 01:20 pm (UTC)
I blame you (and afuna) for the lost productivity and the need for a new keyboard to replace the one drooling shorted out.

Er, I mean, thanks. (Keys next to each other, doncha know?)

[personal profile] elf 2012-10-23 06:27 pm (UTC)
I'd be willing to do OCR for a couple of books, if anyone has one or two they really really want the searchable text from.

I could convert to either searchable PDF or Word (or both); converting to other ebook formats would take more substantial time & effort. (It's possible, but would need to be negotiated around my schedule.) It looks like they're nice high-res scans (600dpi) and would OCR easily.

Searchable PDF (hidden, corrected text under the scanned image, copy/pastable but likely with some formatting problems) is easy and, in small batches, fun for me. Word docs with most of the formatting removed (like, getting rid of the columns; putting all the footnotes at the end; skipping the index) is likewise easy and fun. Trying to match the original formatting is less fun and takes more time; I'm available for that but not willing to offer it to the general public for free.

Whole books take time. A single chapter probably takes an hour or so (depending on how much text is involved); a single page is a matter of minutes. (Erm. Not counting download time. I don't have the mega-fast DSL.) I'm happy to help people out with conversion if they want/need access to the text.

[personal profile] pauamma 2012-10-23 08:21 pm (UTC)
I have some OCR software installed (OCRFeeder, Gocr, Tesseract, and Cuneiform), but never actually got around to trying out any of it. If you (or anyone else) are willing to provide advice or help as needed, I may be able to contribute some, and I'm definitely willing.

[personal profile] nagasvoice 2012-10-24 09:03 am (UTC)
Thank you for the link; I've reposted it.
Edited 2012-10-24 09:03 (UTC)