
the electronic library

Check out this story about Amazon's amazing new searchable book archive, published online at Wired ahead of the print edition. Amazingly cool.


TrackBack

Listed below are links to weblogs that reference the electronic library:

» Amazon and the Authors Guild from Stephen Laniel's Unspecified Bunker
In a discussion of Amazon’s new Look Inside The Book program — which contains the full text of over 100,000 books — someone noted that the Authors Guild has protested: they claim that publishers don’t have the right to authorize th [Read More]

» Couldn’t be more different from The Lexiconomist
I’d like to provide you with three series of links, each of which should be given your full attention. First, what I consider to be the one pure sign that we’re making progress as a civilization— shotgun rules. Their only mistakes are [Read More]

Comments (22)

To call this "amazingly cool" is an understatement. I lack the words to adequately describe it. The Wired author is wrong. I, too, remember what it was like to discover the Web 10 years ago, and this is way beyond that. The thought of being able to search the contents of millions of books boggles the mind. And if they follow this through to its logical conclusion? First it's every book in print. Then the out-of-print books. Then the "orphans". That covers most of the last century, which has been the most prolific in the history of publishing. So from there it's not too big of a leap to finish the job. Imagine a full-text search of every book ever printed. Now toss in on-demand printing of anything you want to read. My mind is so far beyond boggling that it's starting to melt down.
(Though I'd rather download to a reusable "electronic paper" book than print-on-demand.)

"Full text searching of every book ever printed!" Imagine the number of "hits" you would have to sift through to find anything useful... talk about boggling the mind.

At this point, a well-built catalog/index is infinitely more useful than full-text searching on a broad scale. But, hey, maybe it's just the librarian in me.

I like where this could go eventually, but I don't like that they're combining two forms of crippling: both a) limiting you to a fraction of the book's length, and b) releasing pages as graphics files rather than raw text. Either of those would be fine, but both is kind of frustrating.

I just tried it out. If this isn't the coolest thing ever invented, it's definitely in the top 10. 1450 hits on the word "matriarchy". Amazing!

Steve: I'm not thrilled with the limits, either. Especially since it also means downloaded sales are out of the question for the near term. But I've dealt with enough messy copyright situations on the fringes of the music industry that I greatly admire the elegant end-run they've made around the rights issues. In today's copyright law climate, we are pretty much stuck with either accepting limits like these (or worse!) or getting nothing at all. So until the law catches up to the 21st century, I'll be grateful for Amazon's relatively non-draconian compromise.

And now the Authors Guild is contending that publishers don't have the right to authorize the service without authors' consent. They feel many of the works in question will be "threatened" by being exposed in full-text form: "Most reference books would be at clear risk in such a database. So would many (if not most) travel books and cookbooks." I wonder if they've ever considered that letting someone sample recipes from a cookbook might motivate that person to buy a book they never would have bought otherwise?

The "abandonware" and "orphan" books talked about in the article (old books that can't be scanned because it's too expensive to figure out all the copyright clearances...even though they are not making money for anyone) - might be an interesting subject of future Creative Commons licensing efforts. I suppose it might also take some new legislation, but there are huge amounts of valuable info in old books.

October 26, 2003 2:36 PM Joseph Pietro Riolo:

Recall that the same organization, the Authors Guild,
complained to Amazon about its selling used books.
See http://www.authorsguild.org/oldsite/pramazon1200.html.

Does anybody know where the Authors Guild stands with
respect to copyright term extension?

I am afraid that the Authors Guild is acting like the
MPAA. What will be next? That authors
have the right to decide how many times you
can read a novel?

Joseph Pietro Riolo
<riolo@voicenet.com>

Public domain notice: I put all of my expressions in this
comment in the public domain.

October 26, 2003 3:27 PM Matthew Saroff:

On a number of boards, people are thinking that Amazon will get nailed over this in the same way that MP3.com did over its service.

It seems to me to basically duplicate thumbing through a book at a book store, but given the current copyright regime, I'm afraid that they may be right.

October 26, 2003 11:32 PM Patrick McDowell:

I'm sure word of this will spread very rapidly around academia. Plagiarism (or at least getting away with it) is now extinct. This really is the nail that sealed the coffin that Google had crafted.

October 27, 2003 12:40 AM scott huminski:

[so scott huminski posted lots of stuff re Howard Dean that was not on topic. I've put it here.] Lessig

Very interesting, now all we need is a cozy chair and a cafe latte to download.

October 27, 2003 5:57 AM Matthew Saroff:

Ummmm......What the heck is with the Dean post?

Why would anyone in their right mind post that to a discussion of literary search tools?

October 27, 2003 6:06 AM Matthew Saroff:

A followup on my first comment:

This is an interesting phenomenon.

The Dean campaign has created a sort of "retail" campaign through the net, and now a number of people (this is not the only place I see this) have created "retail opposition research".

Speaking of Amazon.com, they have the pre-order page up for Professor Lessig's new book. Unfortunately, they need a better spellcheck.

-kd

As an experiment, I played around with this Search Inside function in an attempt to string together many pages in a row (searching again with the last line from the last page Amazon allowed me to see). After about 15 minutes I had copied and saved (in PowerPoint) about 1/3 of the Lydia Davis book, SAMUEL JOHNSON IS INDIGNANT. It looks like it's possible (if laborious) to accumulate a complete copy of any text using this technique. This copy would be pretty easy to share and distribute. I love this feature, but it seems like it is pretty easy to subvert.

I love this feature, but it seems like it is pretty easy to subvert.
Well sure. Corporations should realize by now that their attempts to lock down technology -- while not fruitless -- are probably going to cost them more than leaving the technology open. All technologies are going to be hacked. Yes, there are legal defenses against hacking, but I would bet that the costs of enforcement and technology development necessary to "hackproof" their technologies are rather substantial. It seems like it'd be easier for companies to acknowledge the inevitable and just open up their technologies.

Then again, Amazon would face some pretty enormous political hurdles (with the guild and its own stockholders) if it made the full text available for free.

I tried to sign up for the Search Inside features yesterday only to notice this disclaimer:

By publishers' agreement, we are pleased to offer Amazon.com customers with a valid credit card the ability to view copyrighted pages.

Your account will not be charged.

This one-time process enables you to view limited copyrighted material through our Search Inside the Book feature.

Please enter your credit card information and its billing address below.

I generally don't like to have companies storing my credit card number for me, so I didn't actually complete the sign up. But I started thinking: why would publishers care that I have a valid credit card number? My first guess is that they may be restricting use of Search Inside to US citizens only, or at least people bound by US copyright laws. It might also be intended to reduce the threat of bots. Any other guesses, or am I just being paranoid?

I don't think this is similar to the advent of MP3s, mainly because there is still no simple technology available that can translate what we currently have (books) to what we want (digital text). Amazon is in a unique position because they are able to convince the publishers to turn over the original digital texts, presumably in return for some monetary payment or sales quotas, with the guarantee that they will protect the publishers/authors' copyrights to the best of their ability. If the publishers were ever to get uncomfortable with the setup, I doubt they would hesitate at all before deciding to pull the plug.

What bothers me is that public libraries have always been essential to free culture and democracy, but in today's world the libraries are increasingly privately controlled: Lexis/Nexis and now Amazon Search Inside. I hope that changes, but I'm not sure that it is likely.

Dana:
It might also be intended to reduce the threat of bots. Any other guesses, or am I just being paranoid?


The Wired article mentioned that you are limited in the number of pages you can view per month, and the percentage of a book you can view. My theory is that the CC# is part of an attempt to keep unethical people from bypassing those limits with multiple accounts.

Dana:
Amazon is in a unique position because they are able to convince the publishers to turn over the original digital texts, presumably in return for some monetary payment or sales quotas, with the guarantee that they will protect the publishers/authors’ copyrights to the best of their ability.

Actually, many (most?) of the books were scanned and OCR'ed, according to the Wired article. (I had noticed this for myself, before reading the article; I did a search for JHDL, and found mostly books on VHDL, where it was obvious that VHDL was mis-OCR'ed as JHDL.)

You should also check out Project Gutenberg: http://promo.net/pg/. They also have quite an impressive collection available for free.
