Google Captures reCAPTCHA to Boost Book Project

The headline on the Official Google Blog makes sure you know what a CAPTCHA is; “Google Acquires reCAPTCHA” is written in the now-familiar wavy, squiggly-style font that you are used to seeing when you want to log on to certain Web sites or post links on Facebook.

When confronted with a CAPTCHA, you’re asked to read the odd-looking letters and type them into a text box, something that automated systems have trouble doing. The idea is to make sure a real person is making the request and not some spambot. However, the idea for Google when it announced Wednesday that it was buying reCAPTCHA, the company that provides the Carnegie Mellon-developed technology to some 100,000 Web sites, was to enhance its digital book scanning project, which has divided the publishing community.

“We’ll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process,” wrote reCAPTCHA cofounder Luis von Ahn and Google product manager Will Cathcart on the Official Google Blog. “Improving the availability and accessibility of all the information on the Internet is really important to us, so we’re looking forward to advancing this technology with the reCAPTCHA team.”

Impact on Book Scanning

When you type in a word presented by reCAPTCHA’s version of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) technology, you’re helping computers fill in the blanks in books and other texts they’ve been asked to digitize but can’t complete because of age, damage or other scanning issues. By bringing reCAPTCHA in-house, Google gets that much more assistance in completing its book-scanning ambitions.
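A rough sketch of how typed answers could fill in those blanks: reCAPTCHA has been publicly described as pairing a word the system already knows with one its scanners could not read, then comparing answers from many people. The function names and thresholds below are hypothetical illustrations, not Google's or reCAPTCHA's actual code.

```python
from collections import Counter


def accept_response(control_answer, typed_control):
    """Treat the user as human only if the known (control) word matches."""
    return control_answer.strip().lower() == typed_control.strip().lower()


def resolve_unknown_word(guesses, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if there is no consensus yet.

    min_votes and min_share are assumed thresholds chosen for illustration.
    """
    # Normalize case and whitespace, and drop empty answers.
    normalized = [g.strip().lower() for g in guesses if g.strip()]
    if len(normalized) < min_votes:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None
```

For example, if four people who passed the control word typed "morning", "Morning", "mourning" and "morning" for the unreadable word, the sketch would settle on "morning"; with only two answers it would keep waiting.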

“Taking advantage of reCAPTCHA technology will give Google an edge in continuing and accelerating its book-scanning project,” Forrester Research analyst Sarah Rotman Epps told the E-Commerce Times. “Google is very serious about this project. They are looking at an industry of print book sales, worth (US)$25 billion in the U.S. alone, that is just beginning a process of digitization. As with other forms of media, when books are digitized, value is destroyed. The $25 billion U.S. book industry won’t be worth $25 billion in five years, but new value will be created in the process.”

The benefit to Google when that is done lies within its advertising technologies. “There are few other media that currently contain no ads — books are blue ocean for advertising,” Epps said.

The Controversy Over Scanning

Google’s book-scanning aims have been dogged by complaints and even lawsuits since their beginnings, mostly focusing on copyright and royalty issues for authors and publishers. Then there are those who simply believe, from an aesthetic/traditional point of view, that books belong on bookshelves and in readers’ hands, not within computer hard drives.

The company recently announced a settlement with publishers and authors; that agreement is being looked at by the Justice Department while also facing a legal challenge from a group of authors and publishing firms who have joined with the Electronic Frontier Foundation and the American Civil Liberties Union. That challenge focuses on data retention and privacy issues.

Meanwhile, “many publishers and authors actually do support the Google settlement, because it benefits them to have just one company managing all the disparate rights issues, like an ASCAP for the publishing industry,” Epps said.
