Yahoo has put its money behind a team effort to bulk up the amount of valuable content available on the Web. The search engine company will foster the Open Content Alliance, a consortium set up to digitize materials in the public domain, such as classic books, and those published under the less-restrictive Creative Commons license.
Yahoo also said it would ask authors of copyrighted materials whether they wanted their works to be part of the project. That differs from Google’s efforts, already underway, in which brief descriptions of copyrighted books can come up in search results. The Authors Guild last week sued Google over the issue, claiming copyright infringement.
UC Library Is First
The alliance’s first project will be the digitization of approximately 18,000 fiction and non-fiction titles that are not protected by copyright, all held in libraries within the University of California system. That would include books published before 1923, as their copyrights have expired.
The group comprises Adobe Systems, Hewlett-Packard Labs, Internet Archive, the National Archives of the UK, O’Reilly Media, the Prelinger Archives, the University of California and the University of Toronto.
The ultimate goal of the project, Yahoo said, is to make the materials available, searchable and downloadable for free over the Web.
Joseph Janes, associate dean for academics at the Information School of the University of Washington, said the idea was fascinating.
“It’s a substantial expansion of access to the human record,” he said. “The real benefit is for the works that have been forgotten or lost. There’s a lot of stuff that sits in print before that date that is not Jane Eyre or Moby Dick but nevertheless has value.”
Too Much Information
Much about how the search engine will work is unknown. How people use it will depend on how it looks and on whether the materials are available as PDFs or in another commonly used file format. Some people may read an entire text online, while others might prefer to print it out. Still others may read a snippet online and then go to the library to check out the hard copy.
But, Janes pointed out, the ever-growing availability of content can cause problems as well as open doors.
“As I said, this is substantially increased access, and substantially increased access is a double-edged sword,” he said. “It’s much easier to find lots of stuff, but then you have too much stuff. If you’re looking for something specific [and can hone your search properly], those technologies are very, very good; if you’re looking for something in general, they’re not that good.”
He used the example of one search for “platypus” and another for “fish.” It would be easier to sort through the smaller number of results for “platypus” than it would be to find the books you need that contain the word “fish.”