February 9, 2006  ·  Lessig

Some smart folks at Google have set up a group on Google Groups to do fact checking in the Google Book Search debate. Sign up and get your regular feed about the lies mistakes feeding this debate. Here’s a snippet from the first post:

In December, novelist Susan Cheever, a member of the Authors Guild, published “Just Google ‘thou shalt not steal,’” an article suggesting that there’s some kind of official word limit, or percentage limit, to material you can copy in order for it to qualify as fair use. She writes:

“The Copyright Statute…includes a ‘fair use’ clause, so that a few lines or phrases of a writer’s work can be used as illustration by someone else. …The amount of words that constitute fair use varies according to court case. At present, it is 400 words. …Google cites ‘fair use,’ but it isn’t using 400 words; it plans to digitize whole libraries and make them available piece by piece.” (Emphasis added.)

Even this small quotation from Cheever’s article fundamentally misstates copyright law and misleads readers about Google Book Search.

First, no such 400-word rule exists. Indeed, in some cases courts have ruled that copying and republishing the entire work is fair use. (You can read about one such court decision here.)

Second, Google does not show more than two or three sentences without the author’s permission. And that’s not all. If a copyright holder chooses not to participate in Google Book Search, not a single word from the book will appear in any searches.

  • anon

    “The distinction between “fair use” and infringement may be unclear and not easily defined. There is no specific number of words, lines, or notes that may safely be taken without permission.”

    http://www.copyright.gov/fls/fl102.html

  • three blind mice

    Second, Google does not show more than two or three sentences without the author’s permission. And that’s not all. If a copyright holder chooses not to participate in Google Book Search, not a single word from the book will appear in any searches.

    irrelevant. google makes a complete (unauthorized) copy in the cache. that is, and has always been, the bone of contention. maybe authors need to start a group fact checking google’s lies, oops, mistakes. (why doesn’t the HTML strike tag work?)

    all google has to do is index printed books ONLY where they have the approval of the author and there is no longer any dispute. placing the affirmative burden on authors sets an evil precedent.

    google is a massive, hugely rich corporation who only wants to get bigger and richer! why put the burden on authors, most of whom (ourselves included) make less per hour from writing books than a walmart associate makes in a month?

    you know it’s funny, professor. normally in the battle between david and goliath, common sympathy lies with david. what makes so many people, yourself included, abandon david and embrace the philistines?

    *mice check sling, cast furtive glances for suitable stones.*

  • Kevin Wright

    1. If ‘three blind mice’ is an author, why don’t the sentences start with capitals?

    2. A Walmart associate might make around $8/hr * 40 hrs/wk * 4 weeks/month = $1280/month. It is certainly reasonable for most authors to make less than $1280 per hour. (according to the literal words of ‘three blind mice’.)

    3. Years ago there was a debate about whether ‘copying’ a program off of a disk and into a computer’s memory was ‘fair use’. The answer seems perfectly obvious now. I doubt if Google’s action of copying an entire book into memory is the real issue. The distribution of that content is more likely to be the center of litigation.

    4. The following link claims that Amazon has seen a 9% increase in sales of books that have ‘search in the book’.
    http://www.infotoday.com/online/mar04/banks.shtml
    Google is following the footsteps of Amazon. If you want to stop Google, you probably want to stop Amazon as well. But then, your income might drop to a typical wage of the Philistine era.

  • Bruce

    Larry, a simple Google search reveals that Cheever is probably referring to fair dealing under U.K. law in referring to a 400-word limit. See this page from Liverpool John Moores University. The Copyright, Designs and Patents Act 1988 does not itself appear to set forth a word limit; other sites refer to the 400-word threshold as a “generally accepted limit” or “guideline.” I do not know what status that has under U.K. law. But she’s not just making stuff up. There are also “guidelines” in U.S. copyright practice, e.g. Circular 21 from the Copyright Office on educational use, but they are broader (e.g. 1,000 words or 10% of the work, but in any event no less than 500 words) and not binding, at least on the upper end.

    I think it only inflames the irrational passions I thought you were in favor of getting rid of when you accuse people of being liars, rather than attempting to figure out what the source of their error might be. I don’t think Cheever is British, but she seems more familiar with U.K. law for some reason (e.g., the 1710 statute she refers to, which is the Statute of Anne), and not so familiar with U.S. law (it’s the Copyright Act, not the Federal Copyright Statute, and it was passed in 1976, taking effect in 1978, neither of which is the “1977” cited in the article). Perhaps she has previously made clear she actually knows all this stuff by heart and likes to mislead people, but unless you have evidence of that, why accuse her of lying? Let’s be charitable.

  • poptones

    That “cache” in the google book search example is not a “cache” at all in the computer science context, which is where you are arguing. That “cache” is, in this case, a “cache” as in “cache of weapons” or “cache of drugs” or “cache of cash” – ie it’s a massive store or collection of something. It is not merely data being “cached” in this case simply because what google wants to index does not yet exist as data – it exists only in physical form, on printed pages that are bound and collected in volumes – “volumes,” again, not being used here in the computer science context of “mount the volume” or “format the volume” but in the very real book sense.

    This is not a crawler that indexes already accessible information. Most of these books don’t exist in electronic form and essentially NONE of them, save the works that have gone into the public domain, exist online. There is no work to “cache” – google wants to create and store that data, and they will copy entire existing work in order to do so.

    The publishers have every right in the world to resist this and it is not just their interests that are well served in doing so. If the book publishers felt they had a reasonably trustworthy system of rights management then they would, most certainly, offer this stuff online, for doing so would mean more profits for them.

    But they don’t have that system. We don’t yet have that system. And they, and we, together have the incentive (wanting access to that work in this form) to foster creation of that technology. Usurping those works for google sets a precedent of diluting the protections afforded creative works in the electronic space. In every way this disincentivizes the creation of new works and even the digitization of those legacy works. Making an exception of “its ok for google” does not reward the common man – us “commoners” are the source of the next generation of creative works and without a means of enforcing our rights in this space the only ones who will benefit will be those who already control the primary channels of communications.

    If we had a system of DRM in place for written works it would not only have application to every other type of creative work, it would also give publishers a means to make their books available for “caching” and indexing in exactly the same way google already does – ie they could make their entire works available online as they see fit while allowing favored search engines (as is already the standard) to crawl their content for the purpose of “indexing.” Google would then have no need to “scan and store” these works – publishers would do that part for them while making available access to the electronic data google needs in order to provide this potentially valuable service.

    This entire debate fulminates on the lack of proper drm. Rather than remain silent or espouse the dogma of classical socialism, any organization that calls itself a defender of freedom in this marketplace should be at the fore in educating the proletariat and in initiating meaningful dialogs directed toward a fair and universally accessible digital rights management system.

  • http://sethf.com/ Seth Finkelstein

    Some publishers’ guidelines for authors seem to say “400 words”, that’s probably where she got the idea.

    http://www.intl.elsevierhealth.com/authorguide/copyright.cfm

    “You may use short, direct quotations without the need to obtain written permission from the copyright holder provided that you give proper credit to the author and sources. We define ‘fair use’ as excerpts under 400 words (or a series of excerpts totalling fewer than 800 words as long as no single excerpt is longer than 300 words) from one work.”

    http://www.blackwellpublishing.com/authors/permission.asp
    “In the US, the ‘fair-use’ convention is generally taken as allowing one to quote up to a total of 400 words from a book, or 50 words or less from an article or chapter in an anthology”

    These are of course wrong. But my guess is she read something like this from a publisher, and didn’t research further.

  • three blind mice

    1. If ‘three blind mice’ is an author, why don’t the sentences start with capitals?

    artistic license.

    2. A Walmart associate might make around $8/hr * 40 hrs/wk * 4 weeks/month = $1280/month. It is certainly reasonable for most authors to make less than $1280 per hour. (according to the literal words of ‘three blind mice’.)

    there you go. that’s what happens when we don’t pay for a proofreader. we spent one year, and 1000+ hours researching and writing our last book. it sold 500 odd copies – which was actually more than we expected. our “take” was about five large, or $5 per hour, or what is it? ten google shares. we have little sympathy for the billionaires at google.

    3. Years ago there was a debate about whether ‘copying’ a program off of a disk and into a computer’s memory was ‘fair use’. The answer seems perfectly obvious now.

    not the same thing as scanning printed matter. not even close.

    the google book search may well be a great thing to increase sales – and if so then google should have no problem obtaining the permission of authors and publishers who want to participate. the problem is google deciding for themselves to copy any book without the author’s permission.

    it’s not fair use, it’s theft. google is not a public library. what happens when google links a book to their advertising? we want some control over how the fruit of our labor is exploited by others. is this such an unreasonable position?

    poptones, we think publishing a book on paper is ALL the DRM that is needed in this case. haven’t you played rock-scissors-paper? paper rights management covers digital rights management.

  • Josh Stratton

    poptones–
    This is not a crawler that indexes already accessible information. Most of these books don’t exist in electronic form and essentially NONE of them, save the works that have gone into the public domain, exist online. There is no work to “cache” – google wants to create and store that data, and they will copy entire existing work in order to do so.

    The issue of what medium a work is on is largely irrelevant. It might be of interest as to whether there is an implied license for caching, due to the necessities of the medium, it might be of interest given particular statutory exceptions, but it’s not relevant in a fair use analysis. The work exists in some fixed form, and Google seeks to cache it for a transformative, non-competitive purpose. How would it change things if Google did so by reprinting the world’s books in a big paper volume, thus avoiding the change in medium? It wouldn’t. Come up with something better to object about.

    But they don’t have that system. We don’t yet have that system.

    And hopefully, we never will. Is there any problem in the world that you don’t think can be magically solved by some imaginary, half-baked DRM system? If it is a panacea, I’ve had some clogging in the kitchen sink lately, and I can’t imagine that totally breaking the copyright quid pro quo would be good for much else.

    educating the proletariat and in initiating meaningful dialogs

    Yes, and it’s so common to have meaningful dialogs with people that still seem to enjoy marxism.

    Three–
    we spent one year, and 1000+ hours researching and writing our last book. it sold 500 odd copies – which was actually more than we expected. our “take” was about five large, or $5 per hour

    Well, maybe you ought to switch professions. It would be nice for everyone who could engage in authorship to do so, but it’s expected that you’ll act in what you believe is your own best interest (which is not measured wholly in money, though money plays a large role), just as you should expect others to act in what they feel are their best interests. If you want to keep writing, then that’s great, but I would hardly be upset or surprised if you didn’t. In any event, it’s up to you.

    we want some control over how the fruit of our labor is exploited by others. is this such an unreasonable position?

    Maybe. I mean, if you’d write books in either event, giving you control deprives the rest of us of something valuable to us and we don’t receive some other benefit. If not, then it might be worthwhile, but options need to be weighed. Like I said, everyone’s acting in their own self-interest. It can work out well for everyone, but it’s not guaranteed to, and there is an element of majority rules.

  • poptones

    Yes, and it’s so common to have meaningful dialogs with people that still seem to enjoy marxism.

    See, this is absurd and why you are doomed to marginalization – it is YOUR “model” that sustains control of the elite over the masses, that leaves everyone with no means of ownership outside the blessing of those “in charge.”

    Duh. Socialist democracies have failed to compete in every case. That doesn’t mean government has no role in governance, but it is a damn good sign stripping away right of ownership from individuals is a not so unique path to failure.

    The issue of what medium a work is on is largely irrelevant…. in a fair use analysis

    It is pretty obvious I know more about this as a technical matter than you. It’s also seeming increasingly obvious you don’t really know that much about the law. Rather than continue phrasing this as an attack, here’s the point:

    prove it.

    I don’t think you can. I don’t think you can because the internet is not radio and it is not paper rolls and it is not a vcr. The issue is not so black and white as your dogmatic approach (were it at all credible) would have us believe.

    DRM provides choice. No DRM means no choice. Who put you in charge of the decision? Who gave you or anyone else the right?

    No one ever did. DRM will evolve and if you refuse to accept this completely inevitable fact you will have nothing left to debate. You will just be one more anonymous kook screaming about the old days when everything was right with the world because people back then couldn’t engage in private commerce on a global scale and the tyranny of geography afforded you the luxury of refusing to compete in this world.

  • ACS

    3. Years ago there was a debate about whether ‘copying’ a program off of a disk and into a computer’s memory was ‘fair use’. The answer seems perfectly obvious now. I doubt if Google’s action of copying an entire book into memory is the real issue. The distribution of that content is more likely to be the center of litigation.

    Kevin – Oh man you are so far off it is painful.

    I know the Australian case you are referring to, but what is the american case – for the file.

    In any event, the copying of copyright material and distribution of that material both constitute an infringement of copyright. On this basis we must consider both ends, if you will, with respect to fair use.

    Fair Use of course is a purposive defence such that if a person uses copyright material vested in another person for a “Fair Use” then their conduct will not be an infringement.

    Regarding the distribution of copyright materials, it is clear that communication of a small portion of a work for illustration of a point or as a factual or educational exercise will fall within the fair use provisions of the US Act. If people search a database to retrieve information regarding books or where a quote comes from then it is fair use.

    Alternatively the reproduction of books into digital form is for the purpose of creating a database that may be searched by a robot on a commercial basis or at the very least by accessing a web site that profits from users. An entire copy of the book must be made in digital format. Is that copying covered by Fair Use? The answer is probably no – because there is no present purpose for making the copy. I cannot make an infringing copy of a work now and argue that I might use it for a purpose that qualifies it for a defence of ‘fair use’ later on.

    Google is the middleman who should get caught.

    As far as I understand it – a fair use defence over one infringement of copyright material will not extend as a defence to separate rights in the copyright material that are infringed by separate acts.

    Those are just some of the basic reasons why google should stop preparing for the largest single act of infringement ever performed.

  • poptones

    poptones, we think publishing a book on paper is ALL the DRM that is needed in this case.

    It’s a good metaphor for the argument this is not a fair use – just try, for example, to get the folks at kinkos to copy an entire book for you. Even a single copy “for personal use” that you will never share with anyone… “really, I promise.”

    Paper volumes, however, carry with them an entire burden that is the point of the internet and this means of communication. To say we do not need DRM because the old way is good enough is to throw the baby out with the bathwater.

    There are already people in korea being paid real life money for their labors in the game realm: the workers are once again separated from the means of production and the field already set for exploitation and coercion – “the dictatorship of the proletariat.” To exclude this binary proletariat from the same opportunities enjoyed by those who control the channels of communication is to inflict tyranny upon a media which is, allegedly, inherently democratic.

  • ACS

    socialist

    adj 1: of or relating to or promoting or practicing socialism; “socialist theory”; “socialist realism”; “a socialist party” [syn: socialistic] 2: advocating or following the socialist principles; “socialistic government” [syn: socialistic] [ant: capitalistic] n : a political advocate of socialism (see above)

  • http://k.lenz.name/LB Karl-Friedrich Lenz

    The error is of course on your and Google’s side.

    Google is displaying all pages of all books to the totality of searchers. That they only display a couple of lines (much more meaningful than “sentences”, since sentences can be two words or two hundred words long) to individual searchers is irrelevant, since we are talking about Google’s use, not that of individual searchers.

    While this error is not new, the last sentence is interesting. I thought that opting out meant that Google kept their fingers off the work completely. In contrast, this might be read that Google still includes works opted out in the database and in search results, but just displays no snippets for them. That would explain what happens when you search for “Supreme Court Sony”.

    As I noted on my blog yesterday, one of the results on the first page (owned by Lexis) displays no pages and no snippets. I was somewhat puzzled by that, but this might be the logical explanation.

  • three blind mice

    First, no such 400-word rule exists.

    true.

    Indeed, in some cases courts have ruled that copying and republishing the entire work is fair use.

    and as reference for this, professor Lessig links his apple of truth to the orange of OPG, et al. v. Diebold.

    this seems grossly misleading professor. first, as you correctly intimate, fair use is not simply a matter of a specific number of words – or as professor Lenz suggests – a number of lines. ms cheever is indeed wrong about this and she deserves to be called on it.

    but saying that “in some cases courts have ruled that copying and republishing the entire work is fair use” in relation to the (erroneous) 400 word limit, is IOHO far more misleading. the number of words in the diebold case was hardly the issue on which the decision turned. as the court observed in its decision,

    The purpose, nature of use, and the effect of use upon the potential market for or value of the copyrighted work all indicated that at least a part of the e-mail archive is not protected by copyright.

    we repeat,

    1. purpose,
    2. nature of use,
    3. and effect of use upon the potential market.

    1. purpose: the diebold case involved “leaked internal documents revealing flaws in Diebold’s e-voting machines.” as the wikipedia entry notes,

    The GEMS software, certified by NASED via Ciber Labs employee Shawn Southworth of Huntsville, AB is at the center of an alleged Diebold Election Systems electoral fraud, 2004 that is much more serious than the previous allegations in the U.S. presidential election, 2000 and U.S. midterm election, 2002 in which Diebold also came under scrutiny.

    the accused copyright infringement was not done to further the profits of a commercial enterprise, it was to make the public aware of technical problems in voting machines. it seems to us that the issue of electoral fraud is something rather more in the “public interest” than google’s profits. as the court observed, “It is hard to imagine a discussion of which could be more in the public interest.”

    2. nature of use: as the court noted the use,

    was transformative: they used the email archive to support criticism that is in the public interest, not to develop electronic voting technology.

    3. and effect of use: the court noted that

    Diebold has identified no specific commercial purpose or interest affected by the publication of the email archive, and there is no evidence that such publication actually had or may have any effect on the putative market value.

    the court concluded that:

    No reasonable copyright holder could have believed that the portions of the e-mail archive discussing possible technical problems with Diebold’s voting machines were protected by copyright….

    adding for emphasis that the evidence,

    suggests strongly that Diebold sought to use the DMCA’s safe harbor provisions – which were designed to protect ISPs, not copyright holders – as a sword to suppress publication of embarrassing content rather than as a shield to protect its intellectual property.

    google’s claim of “fair use” for its google book swipe appears to us mice to fail (miserably) on all three of the tests: purpose, nature of use, and effect of use upon the potential market. the number of words they steal seems irrelevant.

  • http://k.lenz.name/LB Karl-Friedrich Lenz

    I don’t think I suggested that fair use turns only on the amount and substantiality of the portion used, which would be quite obviously wrong, since that factor is only one of four listed in Section 107.

    I only said that the amount in this case is _not_ only four lines.

    For anyone believing the opposite, I have one simple question.

    Can you tell me _which_ four lines of any given work Google is using?

  • three blind mice

    I don’t think I suggested that fair use turns only on the amount and substantiality of the portion used, which would be quite obviously wrong, since that factor is only one of four listed in Section 107.

    no professor Lenz, you did not. our post wasn’t entirely clear on this point and we apologize for that. you did, however, point out (correctly) that lines and words could mean substantially different things.

    Can you tell me _which_ four lines of any given work Google is using?

    great question. that’s really the issue, isn’t it? google’s use – and an individual’s use – of google book swipe are two rather different things. whilst a user might only be shown a few lines, it is clear that google’s computers use all of the lines (and words.) and as you pointed out above, for the search to be meaningful, the individual user has to have access (via google’s search engine) to the complete text.

    that an individual user’s actions might not be considered as copyright infringement (hey, it’s JUST a few lines) is not a very persuasive argument. one cannot isolate the user’s actions from google’s actions.

  • nate

    Mice:

    1) You seem to imply that the ‘purpose’ by which we should judge for fair use is synonymous with the benefit to Google as a corporation. To me, there is a clear societal benefit as well that should be considered as well. Are you saying that this societal benefit is legally irrelevant, or do you simply see no such benefit?

    2) You argue earlier that the real problem is the opt-out nature of Google’s program. You also argue that this is an undue burden on authors. I would have guessed that in most cases this would be handled by the book’s publisher. Do you feel it is also an undue burden on a professional publisher to opt-out?

    –nate

  • anon

    the easiest way to deduce Susan’s source is to ask her.. so I did ;)

    it’s “the legal department of Random House and the Author’s Guild”

  • three blind mice

    1) You seem to imply that the ‘purpose’ by which we should judge for fair use is synonymous with the benefit to Google as a corporation. To me, there is a clear societal benefit as well that should be considered as well. Are you saying that this societal benefit is legally irrelevant, or do you simply see no such benefit?

    well, nate, we think you have to look at all sides – the impact on authors, the financial benefits to google, and the societal benefit. we argue on the side of david – the authors – who are fighting for their rights against powerful, wealthy corporations and the tyranny of the majority. society does not benefit at all if there are no new books for google and the public to steal.

    2) You argue earlier that the real problem is the opt-out nature of Google’s program. You also argue that this is an undue burden on authors. I would have guessed that in most cases this would be handled by the book’s publisher. Do you feel it is also an undue burden on a professional publisher to opt-out?

    yes. the burden must be on google. entirely.

    it is naïve, at best, to imagine that google cares one bit about “societal benefit” and it is amusing to see so many people shilling for google shareholders. especially when they shill for free. it is the best. outsourcing. ever.

  • ACS

    Lessig states in the intro:-

    If a copyright holder chooses not to participate in Google Book Search, not a single word from the book will appear in any searches.

    Now this seems a little bit mischievous to me. It could be interpreted that Google will include works in book search unless the author chooses for his works not to be included.

    Please advise whether inclusion of works in the Google Book Search is contingent on obtaining the author’s consent.

  • nate

    ACS:

    I can’t tell if I’m missing something in your question, but yes, Google is proposing an opt-out system whereby works would be included in the database unless the copyright holder (often not the same as the author) actively declines. Thus the inclusion of works in the Google Book Search is _not_ contingent on obtaining the copyright holder’s consent, but rather on the absence of refusal; thus this debate.

    mice:

    Thanks for your response. Although I disagree with you almost word-for-word, it’s a good reality check for me to see why smart people hold contrary positions.

    –nate

  • three blind mice

    Thanks for your response. Although I disagree with you almost word-for-word, it’s a good reality check for me to see why smart people hold contrary positions.

    and thanks for your response nate. there is no “right” answer here which is why the sort of intelligent, civil debate we have here is so valuable. our host is providing all of us with a valuable forum.

    please, do not ever think that your opinions do not have an influence on us mice. seeing-impaired as we are, we are groping forward the best we can in the darkness and everyone’s point of view helps. although we may seem arrogantly self-assured at times, the only thing we know for certain is how much we don’t know.

  • ACS

    yes, Google is proposing an opt-out system whereby works would be included in the database unless the copyright holder (often not the same as the author) actively declines. Thus the inclusion of works in the Google Book Search is _not_ contingent on obtaining the copyright holder’s consent, but rather on the absence of refusal; thus this debate.

    If this statement is correct then it is certain that Google will copy data into its database without licence of the authors. The mere publication of information cannot be considered to be an implied licence when such publication is done for reward. That would be just about every commercial book ever written since Dickens (Well not in the US anyway).

    This narrows the issue significantly, it is now clear that if Google relies on fair use it must establish that fair use at the point of copying the book. I note the Mice’s post on Feb 10 with regard to GooglePrint and the Diebold analysis and agree with his reasons for not allowing fair use in respect of Google Print. I would hasten to add that the purpose of the copying is not educational as it is, immediately, merely for the purpose of filling the Google database.

    On the basis of this opinion, which is reasonable in the circumstances, I would advise Google Print to require authors’ consent before copying material to its database. I would also note that Google probably has an implied licence to all material provided on the internet or to the public in a digital form and by other means which Google may have the fair use right to copy.

    That seems fairly conclusive really.

  • http://dasht-brk.livejournal.com/ Thomas Lord

    Larry:

    Would it violate some copyrights if I were to go to the library and start photocopying every page of every book that interested me? In what way is Google’s program of action different from that?

    -t

  • http://www.facebook.com/ Navid

    Gee whiz, and I thought this would be hard to find out.
