ECommerceTimes.com

Antispam Word Jumbles to Help Digitize Books


A Carnegie Mellon University project is using CAPTCHA -- or Completely Automated Public Turing Test to Tell Computers and Humans Apart -- tests to digitize books. Three hundred Web sites have already signed up to use the technology. About 60 million CAPTCHA tests are solved every day.



Web surfers all too familiar with the distorted-letter tests that accompany so many site registration forms today can now take heart -- the time they spend on those tests is being put to good use.

Thanks to a project at Carnegie Mellon University, a new version of those pesky CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) tests makes the technology do double duty: Not only does it continue to distinguish between legitimate human users and malevolent spam programs, it also uses the results to aid in the digitization of books for the Internet Archive.

A Carnegie Mellon team led by Luis von Ahn, an assistant professor of computer science and recipient of a MacArthur Foundation genius grant, developed the new tests, dubbed "reCAPTCHAs," which were launched on Wednesday.

Helping OCR

Optical character recognition (OCR) technology used to digitize printed text is often confounded by underlined text, scribbles and fuzzy or otherwise poorly printed letters.

ReCAPTCHA tests work by asking users to type in two words: one distorted but known word, and one that has stumped an OCR system working on a digitization project. If the user inputs the known word correctly, then the system has greater confidence that he or she has deciphered the problematic word correctly too.

Each unknown word is submitted to multiple users; if several enter the same transcription, the system assumes it is correct.

In this way, the new tests continue to distinguish between humans and machines because they use text that OCR systems have already failed to read. However, they also contribute to book digitization projects by helping OCR systems convert printed text into computer-readable letters.
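The two-word check and the agreement step described above can be illustrated with a minimal Python sketch. This is not the Carnegie Mellon team's code: the function names, the case-insensitive comparison and the agreement threshold of three users are all assumptions made for illustration, since the article says only that "several" users must agree.

```python
from collections import Counter


def check_response(known_answer, typed_known, typed_unknown, votes):
    """Check the control word; if it matches, count the user's guess
    for the unknown word. Returns True when the user passes."""
    passed = typed_known.strip().lower() == known_answer.strip().lower()
    if passed:
        # Only guesses from users who solved the known word are trusted.
        votes[typed_unknown.strip().lower()] += 1
    return passed


def accepted_reading(votes, min_agreement=3):
    """Return the most common guess once enough users agree, else None.
    The threshold of 3 is a hypothetical stand-in for "several"."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_agreement else None


votes = Counter()
check_response("morning", "morning", "upon", votes)
check_response("morning", "mornin", "open", votes)   # fails control word, guess ignored
check_response("morning", "Morning", "upon", votes)
check_response("morning", "morning", "upon", votes)
print(accepted_reading(votes))  # upon
```

In this sketch, a guess counts only when the same user also typed the known word correctly, which mirrors the confidence argument in the article.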

Wasting 150,000 Hours a Day

Von Ahn worked on the original CAPTCHA technology for Yahoo (Nasdaq: YHOO), and was astounded to later learn that 60 million of the tests are solved every day by people around the world. "When I first found this out, I was quite proud of myself and the impact my research has had," von Ahn told TechNewsWorld.

"But then I started feeling bad: Each time a CAPTCHA is solved, 10 seconds of human time are basically wasted," von Ahn explained. "If you multiply that by 60 million, you get that humanity as a whole wastes about 150,000 hours every day solving CAPTCHAs. That's a lot of time!"
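Von Ahn's estimate can be checked with a short Python calculation. It takes the two figures in the quote at face value (60 million tests per day, 10 seconds each); the variable names are illustrative only.

```python
CAPTCHAS_PER_DAY = 60_000_000   # figure quoted by von Ahn
SECONDS_PER_CAPTCHA = 10        # figure quoted by von Ahn
SECONDS_PER_HOUR = 3600

hours_per_day = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA / SECONDS_PER_HOUR
print(f"{hours_per_day:,.0f} hours per day")  # 166,667 hours per day
```

The product comes out somewhat above the quoted 150,000 hours, so the quoted figure is best read as a rough, rounded estimate.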

Inspired to come up with additional ways the technology could do something useful for humanity, von Ahn then had the idea of helping to digitize books.

By Thursday night, about 300 Web sites had signed up to use the technology and 20,000 words had been digitized, von Ahn said. One of the first books being tackled is John Dewey's Psychology, he added.

Strength in Numbers

By tapping into the collective power of thousands of computer users worldwide, reCAPTCHA technology is similar to the distributed computing SETI@home project, through which users donate their computers' spare processing time to help process the enormous volumes of radio signals from space that get recorded by radio telescopes around the globe.

With support from Intel (Nasdaq: INTC), von Ahn's team has developed a free, Web-based service that allows individual webmasters to install reCAPTCHAs to protect their sites. Individuals can also use the technology to protect their own e-mail addresses.

'The Spirit of Web 2.0'

"ReCAPTCHA is a brilliant idea and implementation," Jason Dowdell, operator of media and technology blog MarketingShift, told TechNewsWorld.

"Far too many entrepreneurs have built applications that solve only one problem," Dowdell added. "Von Ahn has built a platform that is incredibly simple at its core yet provides the opportunity to meet some very large challenges -- that's the spirit of Web 2.0."


By Katherine Noyes

