Last April, Steve Bailey was asked to present a paper at the 8th European Conference on Digital Archiving in which he addressed the question “in whose hands does the future of digital preservation lie?” Steve’s short answer was, basically: in Google’s hands. The answer was both literal, given Google’s significant presence in “the cloud,” and metaphorical, as an “encapsulation of all cloud service providers.”
Marieke Guy wrote a nice summary of Steve’s talk in which she notes:
“Bailey points out that we now find ourselves in a world where the responsibility for archiving much of our office 2.0 documents lays at the feet of 3rd parties. Documents are stored according to format and regardless of their communality of content, text documents are now stored on Google docs, videos on YouTube, photos on Flickr and so on. Although cloud services have brought us much flexibility they have left us with a Pandora’s Box, ‘no regard for preservation’ is one of the evils that has flown out. They are externally hosted services with very different agendas from ours, they may notify us if they are going to delete all our content but they don’t necessarily have to so.”
This is a challenging situation for archivists. Steve is not the first to question the commitment third-party service providers have toward digital preservation, nor is he the first to point out the “decline in our future professional role” that third-party digital storage is likely to bring about. But rather than lament the situation or look inward for answers, Steve suggests we open a dialogue with Google and other cloud companies:
“Perhaps we should actually stop to ask Google and their peers whether they are indeed aware of the fact that the future of digital preservation lies in their hands and the responsibilities which comes with it and whether this is a role they are happy to fulfill. For perhaps just as we are in danger of sleepwalking our way into a situation where we have let this responsibility slip through our fingers, so they might be equally guilty of unwittingly finding it has landed in theirs.
If so, might this provide the opportunity for dialogue between the archival professions and cloud based service providers and in doing so, the opportunity for us to influence (and perhaps even still directly manage) the preservation of digital archives long into the future”.
Starting a dialogue between archivists and cloud service providers is excellent advice. But I question whether we have actually found ourselves in a situation where the future of digital preservation lies in the hands of these businesses, or even in the hands of the technology “the cloud” represents. Cloud technology is already playing a role in digital preservation, but we are far from seeing private cloud service providers assume responsibility for preserving electronic records in the public interest.
I think, rather, that we are in a far more precarious situation, one where the future of digital preservation lies in the hands of a group of poorly understood concepts: public-private partnerships, open source software, and effective, non-proprietary standards. And “the cloud” will continue to give us a false sense of security until archivists assert themselves in the digital world.
The Elusive Cloud
You hear about it everywhere these days. The cloud market is predicted to explode in the next few years. Widespread adoption of cloud storage is imminent. Google is experimenting with a new cloud storage service. But what does it all mean?
For one thing, it’s not what you see in Microsoft’s new “To the Cloud” ad campaign. The couple in the airport using some kind of remote desktop tool is not cloud computing. Storing your documents in Office 365 or Google Docs is cloud computing. But really, it all depends on whom you ask. Philip Delves Broughton offered a reasonable explanation in a recent Financial Times article:
The idea is that we can now use computer services as if they were a utility, like electricity, drawing on software and hardware when we need them, rather than each of us owning our own generators and distribution networks.
For digital natives, the cloud is as natural to computing as the keyboard. The cloud is Facebook, Zynga and Gmail. To an older generation, the cloud is WikiLeaks and data breaches. For managers trying to weigh up whether this is a fad or here to stay, where you stand may just be a function of your age.
Cloud technology clearly has a lot to offer businesses, organizations, and individuals. But it also carries a significant amount of risk. Companies like Google and Facebook, and even ordinary web hosts, assume no responsibility beyond what their business absolutely requires in order to operate. The internet is inherently fluid: content routinely appears, disappears, and changes domains. The cloud simply defies traditional concepts of what a record is, how it is created, and where it should be stored.
The big risk in relinquishing control over your records is exactly that: you no longer control them. They are left in the hands of whatever business owns the servers they reside on and, more importantly, at the mercy of whatever dynamics affect the way that business operates.
Take WikiLeaks. After being hit by denial-of-service attacks, WikiLeaks moved its website to Amazon Web Services, only for Amazon to come under pressure to remove the content. When the company obliged, PC World asked “in an idyllic future where we make heavy use of the cloud, what happens if a cloud service provider removes content it deems inappropriate, or just doesn’t like?”
Needless to say, businesses are still wary of using the cloud for their critical data. Yet the forecasts about cloud technology are proving accurate: it is playing an increasingly central role in how we work. For many businesses, this will mean a hybrid approach that uses cloud technology for some content and traditional networks for the rest. All of this means archivists must understand the intricacies of cloud technology if they want to be leaders in digital preservation. (Since we don’t yet, Steve Bailey is absolutely correct that cloud service providers hold digital preservation in their hands, at least for now.)
Public Interest in the Private Sector
Steve began his paper by imagining what a collection of personal papers might look like if it were held by the various businesses its creator interacted with. He used the example of Samuel Pepys, the 17th-century diarist. Would Pepys’ collection still exist had he entrusted it to the various businesses he bought communication media from (the tannery, the stationer, the cartographer)? In pieces, perhaps. In its entirety? Doubtful.
Taking the metaphor “to the cloud,” what would a collection look like if an archivist tried to compile it from the various service providers someone interacted with? Would their entire collection exist in 50 years? 100 years?
In these terms, it becomes clear that leaving any responsibility for digital preservation in the hands of the private sector would place those records at great risk. For one thing, we have to assume most of these private companies don’t want that responsibility. We are far more likely to encounter terms of service containing a waiver of liability than a clause about long-term preservation. That’s because well-managed businesses are generally wary of taking on long-term obligations unless those obligations bring profit or are necessary to operate. Obligations carry risks, and risks increase liability. Why would a business providing cloud services agree to be responsible for the preservation of content unless preservation was what it was selling?
Suppose cloud companies like Google did want to assume this responsibility. What would that look like? Would it take the form of bilateral agreements between a business and an archival institution? Would archival institutions have to negotiate these agreements each time they acquire a collection that involves a new third party? Or would businesses simply affirm, publicly, their commitment to beefing up their digital preservation practices? What would they do beyond storing records on their servers? Would they seek advice from archivists and conservators, from computer scientists and developers, or (hopefully) from both?
I can’t think of any desirable situation where third-party businesses would be left responsible for the long-term storage of electronic records, let alone their preservation. I know many archives have agreements with records storage companies, but that is a very different situation from considering someone’s home videos preserved when the only copy is on YouTube. As democratic or open as the internet may be, the legal framework surrounding it (and electronic media in general) is far too fragile for archivists to simply accept that private companies will preserve the records they are entrusted with unless preservation is explicitly accounted for.
Public-Private Partnerships in Heritage Institutions
This is not to say that private businesses have no role in archiving. The private sector plays a significant one, from the production of communication and storage media all the way to archival consulting and conservation. As the public purse tightens, it’s inevitable that the heritage community will turn to the private sector for support with core functions like preservation.
And there will be demonstrable results. Google Books and Ancestry.com are just two examples of agreements between public institutions and private businesses that have yielded great dividends for the public, even if they have threatened the role of archivists and librarians in the organization and provision of information. The Library of Congress’s decision to archive Twitter is another.
Archivists should recognize this type of agreement for what it is: a public-private partnership. The business contributes content, storage or related services, or both. All parties assume a degree of risk, and a public service is provided. These partnerships are controversial, but their growing number suggests that key decision makers in the heritage community are being won over in their favour.
I’m not advocating for public-private partnerships, and I recognize that there is much to criticize about them. But one way or another, contractual agreements between private businesses that operate in the cloud and archival institutions with an interest in acquiring content from the cloud will be an important component of the future of digital preservation, even for personal archives.
The growing presence of these relationships is already leading to new organizational models, which will inevitably lead to new tools and procedures. And as you might expect from the introduction of private interests, restricted access provisions are even being used as a carrot for long-term preservation.
One example of this is CLOCKSS, a non-profit joint venture between scholarly publishers and research libraries that makes use of LOCKSS technology. The program was set up “to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community.” Content in CLOCKSS becomes accessible only once it is no longer available from the publisher, for example when the publisher goes out of business or stops offering a title. CLOCKSS calls these “trigger events,” and when the CLOCKSS Board detects one, it initiates a process that migrates the content to the newest format and transfers it from a secure server to a publicly accessible one.
In his discussion of the cloud’s role in digital preservation, Steve Bailey proposed the creation of a similarly structured, publicly funded meta-repository for online content:
“Maybe the interconnection of content creation and use and its long term preservation need not be as indivisible within the cloud as it might first appear. Yes Google’s appetite for content might appear insatiable, but that does not necessarily mean that they wish to hold it all themselves – after all, their core business of search does not require them to hold themselves every web page they index, merely to have the means to crawl it and to return the results to the user. Might we be able to persuade them that the same logic should also apply to the contents of Google Apps, Blogger, YouTube and the like? If so, might the door be open for us, the archival community through the publicly funded purse to create and maintain our own meta-repository within which online content can be transferred, or just copied, for controlled, managed long term storage whilst continuing to provide access to it to the services and companies from which it originated?
That way they get to continue to accrue the benefit of allowing their users to access and manipulate digital content in ways which benefit their bottom line, the user continues to enjoy the services they have grown accustomed to and the archival community can sleep soundly, safe in the knowledge that whilst service providers are free to do what they want with live content, its long term preservation and safety continues to lie in our own experienced and trusted hands”.
It really is an excellent idea. It’s likely that partnerships and consortia like CLOCKSS will continue to shape the digital preservation landscape for the foreseeable future. So why not create a global network of repositories to transfer or copy electronic content to? If businesses can have a cloud, why can’t archives? Why can’t there be a heritage cloud, mirrored in multiple locations to ensure its longevity?
Without a strong commitment from government, heritage organizations are left with a choice between failing to deliver on their responsibilities and delivering on them with the assistance of the private sector. This means more public-private partnerships. If Google Books and Ancestry.com are any indication, these agreements have a better chance of success in the heritage sector than they might in built infrastructure or education. Who knows, maybe Google will step up and incorporate digital preservation into its philanthropic activities. But archival institutions should tread carefully, stick to their core principles when forging these relationships, and make sure they keep up with the technology every step of the way.
(In Part Two of this post, I’ll finish with some thoughts on how the future of digital preservation also lies in open source software and standards.)