May 7, 2005  ·  Lessig

So imagine you’ve got a PDF document. The document will go through different versions. You want to make sure the reader is reading the current version. Is there a way to make a PDF version-aware, so that if you open it on a machine connected to the net, it can either update itself or warn you that the version you’re reading is out of date?

  • http://www.imperialviolet.org Adam Langley

    Yes, there is. PDF (with Acrobat reader) supports Javascript and that Javascript can phone home (see http://lwn.net/Articles/128950/).

    However, no open-source readers (e.g. Ghostscript- or xpdf-based ones) will support it, and even in Acrobat it can be disabled by moving the EScript plugin out of the way.

    But I’m afraid that I’ve never played with Javascript and PDF and the best I can suggest is to have a look at the embedded code in the PDFs related to the URL above and try to hack something.

    AGL

  • Mr. Kahn

    Consider not relying on anything intrinsic to PDF. You don’t know if your document will stay in PDF; it might become a Word file, or RTF, or HTML, or be printed out, put on microfilm, etc.

    Clearly include the date of the revision, the revision number, and a URL where the latest version can be found. If pages may be taken out of order, consider even including this information in every page header or footer, in a well-designed fashion.

  • http://orcmid.com/blog/ orcmid

    Well, it’s actually not terribly difficult depending on how simple you’re willing to keep it.

    Let’s forget about JavaScript for the moment and consider all that one need do is carry a clickable web link in the document (which almost any current PDF viewer should support, yes?).

    A SOLUTION SKETCH

    Caveat number 1. Anything we do here that is super-automatic is going to put us on the edge of all that cool stuff that got Microsoft Word Macros in so much trouble as a security exploit exposure. So we’ll be thoughtful here and keep the user in control.

    Step #1: The source of the PDF includes a check-version button or a simple clickable web link, maybe at the bottom of the title page, or somewhere else in the front matter.

    The web link is to a URL that reaches a site that is an authoritative source on the status of the document. For good measure, let’s assume the site uses TLS and HTTPS and the usual confirmation of the site’s certificate occurs. The URL references a web page in such a way that (1) the document is identified and (2) the version is identified. This is all designed into the URL.

    Step #2, the user clicks the link, wanting to check for a later version.

    Step #3, the web site receives the URL and finds the page that corresponds to information about the version of the document the link was embedded in. If the latest version is making the request, the server sends back a tiny little window (standard HTTP) that says the version of the document is the current one. It might have other information (links to errata, some advertising or tips maybe), but the idea is that the response be small and clean. The response will cause a browser window to open, though.

    If there is a newer version for the document, a different page is served up. This announces the newest version and indicates what needs to be done to obtain that one. The web server can’t perform any action that “replaces” the old one because we didn’t tell it where the current one is and web security arrangements work against things like that anyhow. But the user can download it and say where to stash it. The user might not want to overlay the old copy, anyhow (and it is still open and in use in this scenario).
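    The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site address, the query parameter names (`doc`, `v`), the document identifier, and the `LATEST` table are all hypothetical and do not come from the discussion.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical authoritative site and catalogue (assumptions for illustration).
BASE_URL = "https://docs.example.org/check"
LATEST = {"example-paper": "3"}


def build_check_url(doc_id: str, version: str) -> str:
    """Return the link to embed in the document's front matter (Step #1)."""
    return BASE_URL + "?" + urlencode({"doc": doc_id, "v": version})


def respond(url: str) -> str:
    """Pick the page the server would send back for an embedded link (Step #3)."""
    query = parse_qs(urlparse(url).query)
    doc_id = query["doc"][0]
    version = query["v"][0]
    latest = LATEST.get(doc_id)
    if latest is None:
        return "Unknown document."
    if version == latest:
        return "This version is the current one."
    return f"A newer version ({latest}) is available; download it from the site."
```

    Because both the document and its version are carried in the URL itself, the same link still works when it is copied by hand into a browser's address bar.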

    IMPROVEMENTS

    Now, this could be streamlined using Javascript. In particular, the request to the server could be done in a nearly-silent way that doesn’t involve opening up a browser. In the case of no new version, the Javascript provides a “this version is current” response based on a coded server message that is returned (and there’s an existing HTTP response that could possibly be appropriated for this purpose). If there is a new version, it could pass the URL received in the server message (probably what’s called a redirect) to the user’s browser and let it take over.
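    As a rough sketch of how such a coded server message might be interpreted, here is a small Python function. The choice of status codes is an assumption made for illustration (204 for "current", a 3xx redirect with a Location header for "newer version available"); the comment above does not say which HTTP response it has in mind, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Tuple

REDIRECTS = (301, 302, 303, 307, 308)


def interpret_check(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Map a server reply to (state, url_of_newer_version_or_None)."""
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 204:
        # Assumed convention: an empty success reply means "you are current".
        return ("current", None)
    if status in REDIRECTS and "location" in lowered:
        # Assumed convention: a redirect points at the newer version.
        return ("outdated", lowered["location"])
    return ("unknown", None)
```

    In the "outdated" case the returned URL would simply be handed to the user's browser, so the user still decides whether and where to download the new copy.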

    STAYING SIMPLE FIRST

    The nice thing about the lean-and-mean version is that it is easy to try out, there are only normal web pages out there, there’s nothing special in the PDF, and we could see what kind of bumps there are. I love traveling light in working out a concept [;<).

  • Joe Buck

    Wouldn’t that be great? The PDF could document that Oceania has always been at war with Eastasia, and no one could show otherwise.

  • http://ansuz.sooke.bc.ca/ Matthew Skala

    If there were a way to do that (and there probably is) it would be morally questionable to use it. Documents that phone home equals teh bad.

  • http://www.livejournal.com/users/cgranade/ Chris Granade

    In response to Joe Buck:

    Wouldn’t that be great? The PDF could document that Oceania has always been at war with Eastasia, and no one could show otherwise.

    No, it could be documented easily. All you have to do is say “no, don’t update the document,” and it wouldn’t be updated. Plus, with a strong cryptographic signature, it could be easily shown…

    Aside from that, I think that creating a PDF-like format controlled by a group like the W3C is essential, and that features like that should be included. While I don’t hate Adobe like I do MS, I don’t like entrusting what has essentially become a web standard to a corporation with no direct interest in openness. That is, Adobe is making PDF as open as it is not out of altruism but to integrate the PDF (I’d call it the PDF format, but remember what the F in PDF stands for?) into the flow of information. This strategy has been largely successful, showing that we do need something like PDF. Thus, I would love to see a truly open portable format developed.

  • http://www.noded.com/noded jr

    Simple but effective way. Put version number plus link to website in the document. I’m guessing that most PDF readers know how to hyperlink.

    “Version Number 6a. See: http://www.updatedversions.com for latest version.”

    The underlying link would pass the document and version number. The server could then be able to serve a page with the latest version.

    The assumption that it all has to be done automatically for the user is way overblown. Give users some credit.

  • http://lemi4.blogdrive.com Lemi4

    …showing that we do need something like PDF.

    Perhaps something like DocBook?

  • http://orcmid.com/blog/ orcmid

    Yes, the same solution works in any document format and viewer where a hyperlink is clickable. Even in documents that aren’t clickable, the ability to copy and paste the URL into an address bar will get you the same process (in the non-Javascript case) and advice about a later version. I say do that much. That sets a baseline case. Sexier stuff can come later. (And this level has almost nothing to do with favoring a proprietary or undocumented format.)

  • Serge Wroclawski

    I agree with most of the others, but let me spin it a little.

    First, you may want to put something on the document (maybe in the summary) saying “Get the most recent version of this here” and put some URL there. PDFs do support hyperlinks (though you should spell it out anyway).

    Then always have the current version there.

    Regarding “version numbers” in documents, I suggest a simple date will do, unless you’re making major revisions.

    But there’s no way to enforce that everyone get the newest copy of a document just as there’s no way to enforce that with paper.

  • http://d.hatena.ne.jp/seiseiy/ Seisei Yamaguchi

    Ex. : Firefox , Plugin update checker .

    ps.
    Mac OS X , PDF native rendering .

  • three blind mice

    While I don’t hate Adobe like I do MS, I don’t like entrusting what has essentially become a web standard to a corporation with no direct interest in openness.

    very charitable chris granade. the issue here is not “openness” but accuracy and the professor’s question is a good one.

    the desire for “openness” – and the opposition to any reasonable content control – has created a web which is largely a source of useless, unreliable, misleading, and inaccurate information – but lots of pornography.

    that is the reality of the “commons.”

    why do you believe that an organization like the W3C will be ANY BETTER than a private for-profit corporation towards improving this dreadful status quo? at least a corporation has a bottom line to encourage them to be accurate.

    the good intentions of non-government organizations like the WCCCP are subject to the corrosive influence of politics and the personal agendas of its directors.

    we will take greedy profit-motivated companies over political groups any day. at least we will still have the democratic choice of how to spend our money.

    by denying us the opportunity to have a web where end-to-end control and commercial interests can co-exist, all you are doing is imposing a (completely discredited) leftist political agenda on us.

  • http://study.nomolog.nagoya-u.ac.jp Frank Bennett

    Linking back to an “official” archive is surely the way to do it. Versioning is not necessarily linear: there may be branches, or the document may be broken up, with chunks finding their way into other projects. A CMS is the obvious environment to set this up (see, e.g., CMFEditions for Plone). The link would take you back to the doc’s location in the CMS, and in that context it could know about its own currency. The doc in its original form will probably be some form of XML, rather than (shudder) PDF.

  • lessig

    Thanks for the help. The question was raised by some good souls who are encouraging people to release drafts of their academic articles on the web. The one concern many academics have is that the earlier (and hence bad) version remains the version posted everywhere, and referenced everywhere. The increasing standard for publishing such papers is PDF, and it is odd to me that Acrobat is keen to do a bunch of much more complicated control techniques, but not this simple (and it would seem, obvious) function. I too am not anti-Adobe — their business model has been neutrality, and it has facilitated multi-platform growth. And I’m encouraged by 3bm’s kind words about e2e.

  • http://www.zoltai.com John Zoltai

    Coptech’s SmartPDF was specifically designed to handle this problem, but a cruise across the web shows it’s not compatible with Acro 6 and above, and there appear to be no updates available. Perhaps this has something to do with Adobe’s release of Version Cue as a version management system for workgroups. Besides that, it was a hosted service with significant fees.

    Either way, it still leaves the problem hanging out there, and it’s not all that hard to fix, provided you’re willing to stay within the Acro Reader model. The problem is, you need a place to act as the host.

    The model: Set a document ID, current version number, and download URL on the host. A JavaScript in the document goes to the host and queries based on the document ID, which is stored in the document metadata. If the version number is different, the user is warned and provided with the download URL.

    This is a tiny amount of data, so any host would be able to handle gazillions of requests. The trick: Don’t store the documents on the host, but provide a link to wherever the documents actually reside. This makes for a very scalable model and eliminates the concerns an author would have about being able to control access to the document. Each author would post their documents to servers under their control. The only additional requirement would be to educate the author on creating the appropriate metadata and JavaScript in their document.
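    A minimal sketch of that host-side lookup, written in Python rather than Acrobat JavaScript so it can be run anywhere. The record layout, the document ID, the URL, and the function name are assumptions for illustration; the host holds only metadata and a pointer to wherever the author keeps the file.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    current_version: str
    download_url: str


# Hypothetical host data: metadata only, never the documents themselves.
REGISTRY: Dict[str, Record] = {
    "doc-0001": Record("4", "https://author.example.edu/papers/draft-v4.pdf"),
}


def check(doc_id: str, version_in_hand: str) -> Optional[str]:
    """Return a warning with the download URL, or None if the copy is
    current or the document is unknown to the host."""
    record = REGISTRY.get(doc_id)
    if record is None or record.current_version == version_in_hand:
        return None
    return (f"This copy is version {version_in_hand}; version "
            f"{record.current_version} is at {record.download_url}")
```

    Each record is a few dozen bytes, which is why one small host could plausibly answer queries for a very large number of documents.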

    So, the first thing that’s needed, especially for academics, is someone to sponsor and build a host. Sounds like a nice little open source project for a graduate CS department. I’d be willing to do the documentation (req’ts, design, test, etc.) gratis if someone is willing to take on the coding and hosting part.

    -Z-

  • http://www.mac-kenzie.net/blog/ Matt MacKenzie

    Adobe offers a product called “Adobe LiveCycle Policy Server” which allows document owners to revoke/replace a version of a document after it has been distributed. Worth a look…

    http://www.adobe.com/products/server/policy/main.html
