The Google Books project has been a work in progress ever since Google was created. In 1996 the co-founders of Google, Sergey Brin and Larry Page, were working on a research project supported by the Stanford Digital Library Technologies Project. Google intends to scan every book ever published and make the full text searchable, so that people can find the relevant information they need about a book. The company wants to make books more accessible to the public and to create an easy mechanism for sorting books by content and by relevance to a subject.
In 2002 a secret “books” project was launched, and research began to identify the challenges that lay ahead. Over this period, Googlers developed a quick and harmless way to scan books and began meeting with libraries to start digitizing their collections. In December 2004 Google announced the launch of the “Google Print” Library Project, made possible by partnerships with Harvard, the University of Michigan, the New York Public Library, Oxford and Stanford. Together, these libraries are said to hold more than 15 million volumes. In 2005 Google Print was renamed Google Books, a more fitting title because it better explains its use.
With the launch of Google Books and its rapid development, many have argued over the advantages and disadvantages of the site. The project seems overly ambitious, and its system has many flaws. Scanning hundreds of millions of books is an extremely time-consuming process, so the pivotal question is: “Is Google Books doing it right?” Once Google has scanned a book, it seems unlikely that the book will be rescanned. If some books are not scanned properly, important literature and information could be obscured or lost in the process of digitization.
Geoff Nunberg (2009) published an article, Google Books: A Metadata Train Wreck, that pointed out many errors in the system. In one example, he searched for the name of an author and restricted the search to works published before the author’s year of birth. For Charles Dickens alone, 182 hits came up. The Chief Engineer for Google Books, Dan Clancy, claimed that the incorrect dates were the fault of the libraries. However, when the matter was investigated further, the first ten full-view books dated before 1812 that mention Charles Dickens turned out to be correctly dated in the catalogues they had come from.
Although one can argue that the correct date is still given on the title page, there have been other, less excusable errors too. Google Books has classified many of its books incorrectly, and once again Dan Clancy claimed that the libraries and publishers were to blame, because the classifications were drawn from the BISAC codes that are provided to booksellers. BISAC codes have only been around for about 20 years, so when a book published before that time is placed in the wrong category, the mistake is Google’s own.
Google has taken on an extremely large project, but it appears not to be doing it very well. The company is quick to push the blame onto others, and the project seems geared more towards commercial gain than towards making knowledge available to the world. Project Gutenberg, by contrast, was one of the first “digital” libraries and was created by volunteers. It seems to focus more on the importance of literature, and the quality of its books is much higher than that of the books on Google Books.
Project Gutenberg’s books are proofread by human beings, and its workers are unpaid, a clear sign that they genuinely care about making books more widely available. Google Books produces books in far greater numbers, but it should be aware that readers tend to value “quality over quantity”. Google scans these books quickly, and it seems that it rarely checks them for errors. In his essay Inheritance and Loss? A Brief Survey of Google Books, Paul Duguid (2007) tests the Google Books system hands-on, using Laurence Sterne’s The Life and Opinions of Tristram Shandy as an example.
He chose the first link that appeared in the search results and reports the following. The scanned text began with the word “wish”, because the left-hand edge of the page, which carried the word “I”, was missing. On page seventeen the left-hand side of the page was not legible because the gutter of the book obscured the first few letters. On page twenty-seven, Sterne quotes Hamlet’s phrase “Alas, poor Yorick!” and follows it with a black page of mourning. The version on Google Books, however, leaves this page out, ignoring how iconic it is to the astute reader.
Following up on Duguid’s essay, I clicked on the links he gave to the book and found that they no longer led to it. I then searched Google Books for Tristram Shandy, just as Duguid had done. I clicked on the first link, which is the same Harvard edition that Duguid referenced, and discovered that the first page now had the word “I” before “wish” and that page seventeen was now fully legible. Although some corrections had been made, the black page that should follow page twenty-seven had still not been inserted.
This is perhaps because the people scanning these books are not scholars themselves. It is easy to recognise a page with a missing word or one that is not fully legible, but many would mistake a black page for a printing error.
Another drawback of digitization is the loss of the physical book itself. There is something pleasant about flicking through a book and holding it in your hand while you read. The book is magnificent in its own physicality; depending on how old it is, it could have been passed down from generation to generation.
The book itself is a story in its own right. Throughout its lifespan a book can acquire annotations, signatures and other interesting characteristics.
There has also been a lot of conflict between the publishing industry and the digitization of books. Google aspires to provide a search engine covering every book ever published, but for books that are in copyright and cannot be viewed online, it provides the option to purchase them through sites such as Amazon or Barnes & Noble.
In January 2007, Google held a conference on the future of the publishing industry. A quotation attributed to Charles Darwin was projected on a screen: “It is not the strongest of the species that survive, nor is it the most intelligent, but the ones most responsive to change.” Toobin (2007) writes in his article Google’s Moon Shot: “As Laurence Kirshbaum, a longtime publishing executive who recently became a literary agent, told me at the conference, ‘Google is now the gatekeeper. They are reaching an audience that we as publishers and authors are not reaching. It makes perfect sense to use the specificity of a search engine as a tool for selling books.’”
There is a lot of truth in this statement: as technology has grown, the popularity of books has fallen drastically. Many people in the 21st century care more for mindless television shows and tacky magazines than for a good, well-written book. Reading challenges the mind and fuels the imagination, and combining literature with technology is a worthy attempt to revive it. Despite Google’s efforts, however, it does not appear to be doing a good job.
Many authors and publishers have claimed that Google violated their copyrights by scanning books, creating an electronic database and displaying short excerpts without permission. The Authors Guild filed a lawsuit against Google alleging copyright infringement, and after four years of discussion a settlement was finally reached in 2009. Under the agreement, Google would have been allowed to copy, display and sell millions of books that were out of print but still in copyright. However, the agreement was reviewed several times and was ultimately rejected by the court in March 2011.
Privacy is a further concern. The Electronic Privacy Information Center (EPIC) states that readers will be required to hand over personal information, which will be stored in a database and used to build detailed profiles of their reading preferences based on their purchases and browsing. Marc Rotenberg of EPIC appeared in court on 18 February 2010 and stated: “A person at any library or any university in the United States that attempted to retrieve information from Google’s digital library would be uniquely tagged and tracked. There is simply no precedent for the creation of such power.”
Google Books seems to have rushed the process of scanning such a vast amount of literature, and in doing so it seems to have forgotten about “quality over quantity.” Because of the system’s numerous flaws, the dream of a complete digital library will remain just that for the foreseeable future. While Google Books is trying to correct its many errors, it is apparent that the project was carried out carelessly and inadequately. It is evident that Google Books’ motives lean more towards commerce than towards making knowledge available to a wider audience.
•Nunberg, Geoff. “Google Books: A Metadata Train Wreck.” Language Log. 2009. Web. 11 Nov. 2012. http://languagelog.ldc.upenn.edu/nll/?p=1701
•McSherry, Corynne. “Good and Bad in Google Book Search Settlement Decision.” Electronic Frontier Foundation. N.p., 23 Mar. 2011. Web. 11 Nov. 2012.
•Rogers, T. “Google Books: Good for Knowledge, Bad for Privacy.” Information Privacy Law. N.p., 28 Mar. 2011. Web. 12 Nov. 2012. http://www.brianrowe.org/infoprivacylaw/2011/03/28/google-books-good-for-knowledge-bad-for-privacy/
•“Google Books.” Google Books. N.p., n.d. Web. 11 Nov. 2012.