Mary Shelley, straight into the phone!

This is so neat! 1.5 million public-domain books, straight into the iPhone (or an Android phone). That is how many books you can now reach through the mobile version of Google Books.

Inside Google Book Search writes about the challenge of character recognition:

The extraction of text from page images is a difficult engineering task. Smudges on the physical books' pages, fancy fonts, old fonts, torn pages, etc. can all lead to errors in the extracted text. (...)

Imperfect OCR is only the first challenge in the ultimate goal of moving from collections of page images to extracted-text based books. Our computer algorithms also have to automatically determine the structure of the book (what are the headers and footers, where images are placed, whether text is verse or prose, and so forth). Getting this right allows us to render the book in a way that follows the format of the original book.

The technical challenges are daunting, but we'll continue to make enhancements to our OCR and book structure extraction technologies. With this launch, we believe that we've taken an important step toward more universal access to books.



My name is Erik Stattin and this is my blog. I write about digital culture, more or less. Feel free to send me tips. Contact me at erik.stattin@gmail.com. I am mymarkup on Twitter and Delicious.

