Google’s Book Search Library Project Faces Copyright Challenges

I. Introduction

Google’s stated mission is “to organize the world’s information and make it universally accessible and useful.” [1] As part of that mission, the Google Book Search Library Project is scanning and organizing printed books from dozens of libraries. By digitizing these books and making them available online, the Google Book Project will potentially benefit academic research around the world by increasing accessibility to rare and remote volumes. Google plans to make its entire digital library searchable as part of its primary search engine, reaping profits from its current advertisement structure. However, the Project has drawn opposition from some publishers, librarians and academics for a variety of reasons, including threats to copyright, scan quality and search biases.

II. Background

In late 2004, Google announced an extension of its Google Print program, which assisted publishers in “making books and other offline information searchable online.” [2] Through its Print program, Google had prepared the ground on copyright by getting publishers used to the idea of making copyrighted material available online and searchable. On December 14, 2004, Google announced its partnership with five major library systems: Harvard, Stanford, the University of Michigan, the University of Oxford, and The New York Public Library. [3] These libraries opened their enormous stacks to Google, making their collections available for scanning. Google paid the full costs of the scanning process, and even agreed to cover the costs of any litigation relating to the project.

Despite threats of legal action from various trade groups, several more libraries joined Google’s ranks in the following three years. In August 2006, the University of California system signed onto the Google Book Project. [4] On October 12, 2006, the University of Wisconsin–Madison became the eighth library to join, following Madrid’s Complutense University, the largest university library in Spain. [5] In late January 2007, the University of Texas at Austin joined, granting access to several noted collections concentrating on Latin America. [6] In March 2007, the Bavarian State Library joined the project, opening access to “more than a million public-domain books,” including “out-of-copyright works in French, Spanish, Latin, Italian and English.” [7] Google continued adding foreign libraries during 2007, from The Boekentoren Library of Ghent University to Keio University in Japan. [8] Google also continued adding major institutional libraries throughout the US, including the Cornell University Library [9] and the Committee on Institutional Cooperation, a consortium of 12 research universities. [10] Google will undoubtedly seek additional partners, especially those libraries with rare or unique collections. However, some libraries are not so eager to jump on Google’s bandwagon.

Google faces competition from several entities that have launched their own scanning services. Microsoft is the biggest name among Google’s scanning competitors, with equally deep pockets to finance the scanning process and combat litigation. [11] Although Microsoft’s project still trails Google’s in press coverage and partnerships, it is beginning to build its own stable of alliances. In the fall of 2007, Yale University agreed to allow Microsoft to scan “thousands of books from its library system.” [12] But the most notable competition comes from the Open Content Alliance (OCA), which fears the concentration of digital materials in the hands of a single company. [13]

The OCA currently boasts about 60 members, including Microsoft, Yahoo, libraries, and universities. [14] A recent addition was the Boston Library Consortium, a group of 19 research and academic libraries in New England, including the Massachusetts Institute of Technology, Brown University, the University of Connecticut and the University of Massachusetts. [15] Unlike Google, which requires authors and publishers to opt out of the program after scanning is complete, the OCA will not scan books without the permission of the author. [16] Google’s opt-out system is a major source of friction with the publishers and authors who filed suit against Google. The OCA will also allow any search engine to catalog and search its database, unlike Google and Microsoft, which prevent their books from being indexed by competing search engines. [17]

It costs the OCA “as much as $30 to scan each book,” a factor that pushed many libraries and institutions into their alliances with Google and Microsoft. [18] Google currently scans approximately 3,000 books per day, roughly 1 million books annually. [19] Current cost estimates for Google’s project are roughly $100 million, with the full cost paid by Google. Without a similar corporate or charitable benefactor, the OCA faces significant hurdles in scanning as many books as quickly as Google can. However, Google faces several challenges in spite of its advantages in funding and technology.

III. Challenges

Because of the size and scope of Google’s project, it has become the focal point for various challenges to book digitization. In May 2005, the Association of American University Presses, a 125-member nonprofit organization of scholarly publishers, sent a six-page letter to Google detailing its concerns over the Book Project. [20] In September 2005, the Authors Guild filed suit against Google for “massive” copyright infringement. [21] The Authors Guild “represents more than 8,000 authors and is the largest society of published writers in the United States.” [22] In October 2006, the French publishers union Le Syndicat National de l’Edition “joined book publisher Le Martiniere Groupe in its copyright suit against Google.” [23]

In these lawsuits, publishers and authors raised various copyright concerns. First, copyright owners are unhappy with Google’s opt-out system for the Library Project. In contrast, there are no complaints regarding Google Print, a system where publishers opt in, since participants consent to Google’s use. More significantly, Google may be violating copyright owners’ rights to make copies and control distribution, since Google makes a digital copy for itself while also providing the source libraries with their own copy. [24] The two sides disagree as to whether this constitutes fair use. [25] To address copyright concerns, Google’s search function only allows access to the full text if the volume is already out of copyright. For volumes still protected by copyright, Google’s search displays only a small selection of the full text, similar to an image thumbnail. Hence, Google relies on Kelly v. Arriba Soft Corporation, claiming that this setup constitutes fair use. Several lawsuits against Google are currently being litigated on these copyright issues.

IV. Analysis

Copyright owners claim that Google’s digitization has violated their rights to reproduce and distribute their works. Google is vulnerable to this charge because it gives the originating library a copy of all books digitized from that library’s collection, in addition to storing digital copies on its own servers. Google believes that by offering only a snippet of copyrighted works and by barring access to the full text, its use qualifies as fair use. Indeed, most search engines use similar tactics when displaying search results for texts, images and news stories. Courts have repeatedly sided with the search engines, since a contrary ruling would effectively destroy the Internet by hamstringing a search engine’s ability to function.

Google already makes copies of the entire Internet on its servers, cataloging and indexing the information for its search engine. Few, if any, have challenged Google’s right to make these copies. Like publishers, webmasters desire the widest possible audience, and recognize Google’s utility in reaching that goal. However, webmasters willingly make their content available on the Internet, granting Google an “implied license” to index their content. [26] Publishers have not made their content available online in a similar manner. Unlike most webmasters, publishers fear that Google will siphon off sales rather than increase them. In that respect, the publishers are similar to newspapers and wire services, some of which have brought suit against Google News.

Google News has drawn lawsuits from a variety of news agencies and wire services, primarily in Europe. [27] A Belgian court ruled against Google’s use of snippets and thumbnails. [28] But even as Google is mired in litigation, it is working on contracts with various news agencies to include their content in Google News. [29] Most news agencies recognize that exclusion from aggregation services such as Google’s will only harm them, since Google News will still index hundreds of other agencies eager and willing to have search traffic driven to their sites. Similarly, Google may be looking to contract with publishers, even as litigation continues.

V. Conclusion

The Google Library Project represents a potential boon to scholars and researchers around the world. A completed database will make accessible millions of rare and hard-to-find volumes held in the world’s great but remote libraries. Despite concerns that Google will effectively gain control of public domain materials, the Book Project is more likely to enhance access to these materials. Public domain materials that were easily accessed before the Book Project will likely continue to be easily accessible through the same sources. The windfall comes from previously inaccessible public domain materials, which will become widely available after scanning and indexing. The Book Project will also improve access to out-of-print editions, potentially generating sales revenue for publishers.

While litigation mounts in opposition to Google’s digitization project, the litigation likely will not have a long-term impact on Google. The successful digitization and cataloging of the world’s libraries would no doubt increase traffic to Google’s search engine. The lack of digitized books in Google’s index would only be financially damaging if a competing search engine gained access to those works through exclusive contracts with copyright owners.

[1] Google Corporate Information: Company Overview, http://www.google.com/corporate/

[2] Press Release, Google, Google Checks Out Library Books (Dec. 14, 2004), http://www.google.com/press/pressrel/print_library.html

[3] Id.

[4] University of California System Joins Google Book Project, Aug. 9, 2006, http://www.iht.com/articles/2006/08/09/business/google.php

[5] Press Release, University of Wisconsin-Madison, UW-Madison Joins Google’s Worldwide Book Digitization Project (Oct. 12, 2006), http://www.news.wisc.edu/releases/13010.html

[6] Google to Digitize More Than A Million Books From the University of Texas at Austin, Including World Renowned Latin American Collection, Jan. 20, 2007, http://press-releases.techwhack.com/6897/google-to-digitize-more-than-a-million-books/

[7] Elinor Mills, Bavarian Library Joins Google Book Search Project, Mar. 6, 2007, http://www.news.com/8301-10784_3-6164875-7.html

[8] Press Release, Ghent University Library, Google and Ghent University Library To Make Hundreds of Thousands of Dutch and French Books Available Online (May 23, 2007), http://lib1.ugent.be/cmsites/Default.aspx?alias=BO_wwwboekentorenbe_google_pressrelease

[9] Press Release, Cornell University Library, Cornell University Library Becomes Newest Partner in Google Book Search Library Project (Aug. 7, 2007), http://library.cornell.edu/communications/Google/

[10] Press Release, Committee on Institutional Cooperation, Partnership Announced Between CIC Libraries and Google (June 6, 2007), http://www.cic.uiuc.edu/programs/CenterForLibraryInitiatives/Archive/PressRelease/LibraryDigitization/index.shtml

[11] Google’s Book Scanning Faces Competition, Nov. 12, 2007, http://www.eschoolnews.com/news/top-news/index.cfm?i=50345&CFID=88625&CFTOKEN=37645010

[12] Id.

[13] Id.

[14] Id.

[15] Katie Hafner, Libraries Shun Deals to Place Books on Web, Oct. 22, 2007, http://www.nytimes.com/2007/10/22/technology/22library.html?_r=1&em&ex=1193544000&en=ce927953c53a4745&ei=5087%0A&oref=slogin

[16] Google’s Book Scanning Faces Competition, supra note 11.

[17] Id.

[18] Hafner, supra note 15.

[19] Google’s Book Scanning Faces Competition, supra note 11.

[20] Stefanie Olsen, Publishers Balk at Google Book Copy Plan, May 24, 2005, http://www.news.com/Publishers-balk-at-Google-book-copy-plan/2100-1025_3-5719156.html

[21] Elinor Mills, Authors Guild Sues Google Over Library Project, Sept. 20, 2005, http://www.news.com/Authors-Guild-sues-Google-over-library-project/2100-1030_3-5875384.html

[22] Id.

[23] Candace Lombardi, Google Sued By French Publishers, Oct. 31, 2006, http://www.news.com/8301-10784_3-6131056-7.html

[24] Au Courant, http://paulcourant.net/ (Nov. 6, 2007).

[25] Id.

[26] Nate Anderson, Judge: Google Cache Kosher When It Comes to Copyright, Jan. 26, 2006, http://arstechnica.com/news.ars/post/20060126-6063.html

[27] Thomas Wilburn, AP Sues US News Aggregator for Copyright Infringement and Trademark Abuse, Oct. 10, 2007, http://arstechnica.com/news.ars/post/20071010-associated-press-sues-news-aggregator-for-licensing-failure.html

[28] Ken Fisher, Google Defeated in Belgian Copyright Case; Everyone But Google Loses, Feb. 13, 2007, http://arstechnica.com/news.ars/post/20070213-8831.html

[29] Nate Anderson, Google Says “No Secret Deals” With UK News Organizations, May 21, 2007, http://arstechnica.com/news.ars/post/20070521-google-says-no-secret-deals-with-uk-news-organizations.html