July 21, 2005  ·  Cass Sunstein

On information aggregation, I haven’t yet said anything about open source software (though some comments refer to it). But to an outsider, OSS does exceptionally well in incorporating the ideas of numerous people. It’s analogous to the most optimistic understanding of Wikipedia (yes?). Here are the ridiculously ignorant outsider’s queries, with apologies for the ignorance: Does OSS do as well as it seems in aggregating dispersed information (and dispersed creativity)? If so, why? If not, why not? It’s hard to have an adequate understanding of how information aggregation can go well, or badly, without having some answers. (I’ve read and learned a ton from Eric Raymond, Larry L., and several others, but the questions are not entirely answered.)

  • Corey

    I used to be a computer engineer, now I am a law student…

    One obvious difference between software and political or economic theories on a blog, or entries in a wiki, is that software has to work and be correct.

    If software is wrong, it crashes or slows down the computer in an obvious way, giving immediate feedback. There is, as Perl programmers might say, always “more than one way to do it”, but there is a defined, measurable end result in each case (add a feature, make a feature run with 50% fewer CPU cycles, etc.).

    In contrast, many political discussions can go around for decades while a few brave souls attempt to inject empirical data that is immediately deconstructed (perhaps in order for the discussion to perpetuate itself).

    You could perhaps draw a useful analogy to bug-tracker forums (e.g. Bugzilla) where people report and discuss problems. But again, the right answer (the system works as planned) is much easier to evaluate in the software realm.

    The big thing OSS proves is that people will often do large amounts of very difficult engineering work for no other compensation beyond self-satisfaction and the thanks of their peers. I think it says more about the need for IP or massive executive compensation as a supposed “incentive” towards innovation than it does about information or talent aggregation.

  • http://www.airs.com/ian/ Ian Lance Taylor

    Free software is a good mechanism for aggregating the work of many different people. Non-free software exists in a world of artificial boundaries, which prevent people who are otherwise capable and interested from improving it. Free software removes those boundaries.

    But a working software program performs a relatively well defined task in a relatively well defined manner, and the choice of task and implementation strategy come from a single programmer, or at most a small group. While it is reasonably common for a free software program to improve steadily, it is extremely rare for it to change radically what it does or, in broad terms, how it does it.

    It’s true that fresh reimplementations of free software projects happen frequently enough, but that happens with non-free software as well. A respectable amount of free software consists simply of reimplementations of proprietary programs.

    So is free software a good mechanism for gathering together the creativity and intelligence of a wide range of people? Within a very specific area, it is. But free software by itself is not a good mechanism for distributed creativity–for breaking the mold and inventing something new. Once somebody has invented something new, free software is a good way to distribute it. But free software by itself doesn’t help with the invention.

    (I’ve been writing free software for long enough that I still call it “free software,” not “open source.”)

  • Tim Wu

    Cass,

    This and your other postings have led me to want to try and figure out if there’s a difference between a classic Hayekian mechanism, like prices, and other modern means of aggregating information, such as blogs, eBay, Wikis and free software projects.

    In Hayek’s model, a large group of people have two things: (1) private information (the highest price they’ll pay) and (2) a reason to want to reveal some of that information (they need the good).

    So, as you already pointed out, much centers on (2), the incentive to transmit the information in question. And this takes us to very familiar ground for IP scholars, who have spent the last decade trying to understand the incentives that drive participation in OSS, wiki, and other projects.

    But what people don’t understand, and what I think we need to understand, is the relationship between (1) the nature of the incentive structure and (2) the effectiveness of the system in question as a tool of information aggregation. I’m sure this comment isn’t quite the place to answer these questions, but at least we can see what the problem is.

    Let me take an easy example that the literature already understands. When I buy a melon to eat, I am just paying based on how much I need it, or think I’ll enjoy it, and no one has better access to that information than me. The result of that system of information aggregation should be pretty good, because I know how much I value the melon. But if I buy a comic as an investment, I’m buying it because I want to make money and think that other people will find it valuable. The incentive to reveal the information is different, and my information is much less accurate. Hence problems like inflated, fake markets for comics.

    So let’s see how this helps us understand coding, blogging, etc.

    Blogging may be the worst internet system for information aggregation out there. Bloggers’ access to information isn’t particularly good: bloggers have opinions for sure, but usually less private information than reporters. And their main reason to reveal the information is some hope of fame, glory and great masses of readers, which admittedly does make bloggers want to distribute information, but not necessarily accurate information.

    That doesn’t mean blogs might not be interesting for analysis, or other purposes, but as a tool for aggregating accurate information, they seem quite weak.

    Wiki and coding are a little different. If I decide to help code OpenOffice, on the one hand, my information may be very good, as presumably I choose something I can fix. But I am revealing information for a variety of complex reasons that people are studying: maybe I hate Microsoft, maybe it’s some hope of glory, maybe I’m bored. The problem, as Taylor’s post suggested, is that it’s not clear that these incentives will lead people to reveal as much information as other mechanisms.

    Well, there is too much to say here. I want to note finally that the best info aggregation system of the internet age is surely eBay. In fact, eBay has made it very hard to get a deal any more. In general, we should expect every one of these systems of information aggregation to be seriously flawed (not that that makes them worthless).

  • http://soy.dyndns.org/~peter Peter Boothe

    Computer Science as a discipline is still figuring out the best way to do code sharing. We have some okay techniques for spreading ideas around (websites, tutorials, academia, conferences, …), but building on another person’s system to improve it and/or incorporating aspects of their system into yours can be very, very difficult. Certainly it’s not nearly as easy as modifying and extending a Wikipedia entry. So we get into a situation where little problems get fixed relatively simply and quickly, but big changes take much longer, often far out of proportion to the actual difficulty of the problem, simply because the time investment required to make that change or incorporate that system crosses some sort of internal “I don’t have time for THAT” threshold. Ideas that would take more than a day or two for an experienced software engineer or more than a week or two for a fledgling student are simply removed from consideration unless the need is very great.

    For smaller projects, and web based projects in particular, OSS is often a huge win. When projects are small, ideas are easily exchanged and incorporated. For larger projects, and especially projects with a very large user-interface component, OSS will often lag behind commercial software. The reasons for this are myriad, subtle, and extremely contentious, but I’m pretty sure that a large chunk of it is that making a good GUI is extremely difficult and needs some nutball designer with a dream and a unified vision pushing a stable of developers to make difficult changes in response to lots and lots of user tests. (i.e. how Apple does it)

    The shortest answer I can give is that the systems that developers use every day are the ones that are usually best in OSS, and the more that a system is for “end users” and not “developers”, the more it will tend to stagnate in OSS-land.

  • http://www.airs.com/ian/ Ian Lance Taylor

    I agree with Peter Boothe that free software projects with a large UI component tend to lag. I also agree that the kind of changes that get made are the ones which don’t take an inordinate amount of time. I just want to point out that the former is really just a consequence of the latter, in that in current technology UI changes are difficult. If our development tools were such that UI changes were easy to implement, then in fact free software programs would have the best UI, because it would be easy for anybody to polish them.

    I think that is a good general point about free software. It does not serve as a particularly good mechanism for aggregating information because, at least at the present time, programming is hard.

  • poptones

    But you are overlooking all those “conventional” motivations that also drive OSS now. For example, the GNOME desktop: it’s an OSS project that has a significant number of corporate participants, and it has one of those “nutjobs with a vision” at least claiming to have done UI tests and shaping the desktop design based on those elements of test and vision.

    There are a lot of people doing OSS work for a paycheck. And some of the people doing OSS development who aren’t doing it for a paycheck are trying to be good team members who will hopefully become team leaders who will hopefully become paid team leaders.

    OSS in this regard provides everyone with the incentive and the power to act on their own behalf while providing a quantifiable and reasonably objective means for others to evaluate their performance. Rather than searching for an internship, make your own. “Code talks, bullshit walks.”

  • http://en.wikipedia.org/wiki/User:Jamesday James Day

    Open source software (software where the source code is available) can help because more people can look at it. It depends greatly on the interest of a community and the nature of the relationship between the primary supplier and the end users.

    If you have a primary supplier who is doing traditional development, the product being open source is useful only to the extent that customers have a need or desire to read what they have purchased, because any suggestions or changes they make may not be incorporated.

    If instead it is community developed software, perhaps either free or copyleft, then it seems to depend on the management of the project. If those doing the managing are open to new ideas and the contributions of others, that project is likely to be very successful. If instead they are working together as a team and don’t accept much input from others, then the opposite would be true.

    So, I’d say that the most prominent open source, free and copyleft projects have been successful more because of the attitudes and methods of their managers than because of any fundamental attribute of a licensing model. Those attitudes then help to choose the licensing model, which is quite likely to be open source, free or copyleft because such managers are likely to want to encourage the participation and shared ownership of others.

    The Wikipedia encyclopedia has been an example where, particularly at the lowest levels, very broad participation is accepted and encouraged.

    That broad participation can have drawbacks. Some time ago in the English language encyclopedia someone added a “disclaimer” to the multimedia template for the GFDL license. The GFDL says that if a disclaimer clause is present it must be retained. Since then, all GFDL contributors who have followed instructions have been requiring that all users of their GFDL work, including the Wikipedia encyclopedia’s primary host at wikipedia.org, incorporate their disclaimer with all uses of their work, somewhat undermining the desire to have the project use a plain GFDL license. I assume they just didn’t know what they were doing, but it’ll take substantial work to undo without breaching the licenses which have been granted.

  • Peter

    I definitely think OSS is a decent information aggregator. The key is that all software comes from source code, which is just a bunch of text files that are human (er, geek) readable. The contents of that source code direct absolutely everything about how the software works. In this sense, OSS is similar to the price system, in that there is one final end result of the pooled knowledge. Since anyone is allowed to suggest changes, you can in theory have a lot of programming experience contributed to one software project – more so than in the closed source world.

    Nonetheless, there are still barriers. As another commenter pointed out, you do need the software to actually work. That means every last semicolon needs to be in the right place, which prevents the original author from just taking anyone’s submissions for “improvement” without review.

    Moreover, since review of submissions is required, you subject the OSS “system” to human error and fallibility. For example, you might have an arrogant developer who doesn’t want to do things another way, even if he probably should. Linux (the kernel) relies on a “benevolent dictatorship” – Linus will accept good patches and ideas, with relatively little ego involved. But Linus is an exceptional person in that regard.

  • http://www.freedom-to-tinker.com Ed Felten

    In response to Tim Wu’s comment:

    Blogs do very well at one kind of information aggregation: finding needles in a haystack. If an issue seems important, and there is a super-obscure piece of information that will shed light on it, blogs do a pretty good job of finding and disseminating that information. Blogs are good at this kind of thing because there are so many bloggers, and information percolates rapidly from blog to blog.

    Consider how blogs debunked CBS’s bogus memos about the president’s military service. The debunking relied on detailed knowledge about the fonts on 1960s typewriters, clerical practices in the National Guard, and so on. That kind of information is hard for conventional news reporters to track down. But somewhere there was a blogger who knew; and the information flooded quickly through the blogosphere.

  • Paul G

    I think the idea of invention is a good way of distinguishing Wikipedia from OSS.

    Wikipedia is simply an experiment in information aggregation. It is very successful at drawing together, defining and presenting dispersed pieces of information. But in terms of the information it contends with, it is not inventive. It’s not creating the future; it’s simply reporting on the past.

    Whilst I note Taylor’s comment that many OSS programmers are rehashing existing ideas, there is also a substantial degree of invention taking place: new ways of doing things. And without wishing to dive head first into the realm of romantic authorship, I would suggest that collective invention or dispersed invention (whatever you want to call it) is far harder to coordinate than collective information aggregation. One could even posit that the degree of invention taking place in a given project is inversely proportional to the number of participants in the project. But that’s just a hunch!

    The clearest similarities between Wikipedia and OSS are in the motivations of participants. I’ll leave this one for now but, for those who haven’t read it, I would wholeheartedly recommend Weber’s ‘The Success of Open Source’.

  • Corey

    “Blogger’s access to information isn’t particularly good — bloggers have opinions for sure, but usually less private information than reporters. And their main reason to reveal the information is some hope of fame, glory and great masses of readers”

    All you are doing there is expressing a preference for traditional reporters. The only reason a reporter might have more information is that they work for a company with the resources to sponsor their research or send them on location.

    But many bloggers are experts in the fields they write on, and many live in the locations where events are occurring. In that sense, they are likely to know more than a visiting reporter with less experience of the subject. Blogs cast their nets wide: rather than depending on what a few reporters might dig up, they seek to attract knowledge to themselves and filter for truth.

    There may be an incentive to spread rumors on blogs in order to get fame and glory, but as Stephen Glass and others have recently proven, traditional reporters will sometimes make up stories as well. Also, because reporters are financed by media conglomerates, they are subject to story approval and “editing” in the interests of marketability. Lessig’s “Free Culture” book mentions an instance during the Iraq war where the New York office rewrote a field reporter’s story that was “not optimistic enough”.

    The prime issue with blogs is verification. The norm is not to cite sources, and many readers don’t take the time to use Google or a wiki to check controversial assertions. However, reporters increasingly cite confidential sources, and viewers are no more inclined to check their work.

    Note how with OSS, verification is built in. Wrong answers break the code or slow it down.

  • Corey

    “Linux (the kernel) relies on a ‘benevolent dictatorship’”

    It may be true that all successful OSS projects rely on a benevolent tyrant (usually the original developer). But the same thing can be said of wikis and comment-enabled blogs. In each case there is a person who potentially holds veto rights over new submissions. With wikis that power is dispersed, but there is still one person who can change the rules or take the server down if they decide to.

    I think that because all of these “information aggregators” rely on non-compensated labor for survival, there is a natural incentive for the appointed tyrant to be “benevolent”. Arrogant, selfish, or abusive people do not attract donations of time or information. You usually have to pay for the privilege of exploiting people.

    The exception would be a strong charismatic personality that could attract free “workers” with low self-esteem on the cult model. I haven’t been around OSS communities enough to know how prevalent coder cults are.

  • Dan Margolis

    In browsing the comments (briefly, admittedly), I don’t see many people more than touching on what I believe is an important factor in the success of open source projects, especially larger ones–how to deal with identifying talent. Since most projects rely on self-identification for tasks–volunteering because “Hey, I’m pretty good at this, can I help?”–they face the natural problem of weeding out those who are bad at identifying their own strengths (an extremely common situation).

    Solutions range from formalized entrance quizzes before one can contribute to the project (Gentoo, for instance, has one of these, simply to try to weed out the hopelessly incompetent), to requiring that trusted contributors earn that trust through a long period of useful smaller contributions, to simply trusting individuals not to screw things up. But I think there’s a deficiency in many of the current large projects in that they perhaps understand and try to compensate for technical deficiencies, but they don’t understand or address organizational or managerial deficiencies in the way that a commercial organization typically does.

    In other words, open source projects, many of which have dozens or hundreds of active developers, are rarely very good at managing their resources, plotting their goals or setting priorities, which, of course, makes a lot of sense, since everyone is just a volunteer (usually).

    There isn’t any obvious solution that I see, and if there were, I doubt it would fit in this space, but I think it is at least interesting to note that the entire model is not just superficially more anarchic, but is sort of endemically so.