creightonbarrett.com

Institutional Repository Bibliography, Version 4

Posted on June 16, 2011 by Creighton

I really appreciate this online bibliography of articles about institutional repositories by Charles W. Bailey, Jr. at Digital Scholarship. It’s very thorough, and it links to sources whenever possible.

Here is the table of contents:

1 General (Last update: 6/15/11)
2 Country and Regional Surveys (Last update: 6/15/11)
3 Multiple-Institution Repositories (Last update: 11/15/10)
4 Specific Institutional Repositories (Last update: 6/15/11)
    4.1 eScholarship
    4.2 MIT
    4.3 OSU Knowledge Bank
    4.4 Other
5 Digital Preservation (Last update: 6/15/11)
6 Library Issues (Last update: 6/15/11)
7 Metadata (Last update: 6/15/11)
8 Institutional Open Access Mandates and Policies (Last update: 6/15/11)
9 R&D Projects (Last update: 6/15/11)
10 Research Studies (Last update: 6/15/11)
11 Software (Last update: 6/15/11)
    11.1 General
    11.2 DSpace
    11.3 EPrints
    11.4 Fedora
    11.5 Other
12 Electronic Theses and Dissertations (Last update: 6/15/11)
Appendix A. Related Bibliographies (Last update: 6/15/11)
Appendix B. About the Author (Last update: 6/15/11)

Thanks Charles!

ACA and the Toronto Skyline

Posted on June 13, 2011 by Creighton

So, I just finished a busy few months that involved teaching my first class (a course on archives at the Dalhousie School of Information Management) and presenting a paper at the Association of Canadian Archivists conference in Toronto. Needless to say, I didn’t have much time to write here, but I’m hoping that changes.

I’ll try to write more about the conference soon, but for now, I just thought I’d share a panoramic photo of the Toronto skyline as seen from one of the balconies in the Delta Chelsea.

Toronto Skyline on June 4, 2011

Was a pretty nice night out there!

Open Folklore receives the 2011 Outstanding Collaboration Citation from the Association for Library Collections and Technical Services

It’s been a while since I posted. I’ve really been meaning to put up Part Two of the Future of Digital Preservation post, but I’ve been busy with so many other things. I am also working on a post about a recent discussion of ethnographic archives on the Society for Ethnomusicology email list, and another about some recent projects at work. For now, though, I just wanted to share the news that the Open Folklore project is the recipient of the Association for Library Collections and Technical Services’ 2011 Outstanding Collaboration Citation.

The Outstanding Collaboration Citation:

recognizes and encourages collaborative problem-solving efforts in the areas of acquisition, access, management, preservation, or archiving of library materials. It recognizes a demonstrated benefit from actions, services, or products that improve and benefit providing and managing library collections.

You can read the Indiana University press release about the award here. Congratulations all around; the award is much deserved!


The Future of Digital Preservation, Part One: Public-Private Partnerships

Last April, Steve Bailey was asked to present a paper at the 8th European Conference on Digital Archiving, in which he addressed the question “in whose hands does the future of digital preservation lie?” Steve’s short answer was, basically, in Google’s hands. The answer was both literal (given Google’s significant presence in “the cloud”) and metaphorical, as an “encapsulation of all cloud service providers.”

Marieke Guy wrote a nice summary of Steve’s talk in which she notes:

“Bailey points out that we now find ourselves in a world where the responsibility for archiving much of our office 2.0 documents lays at the feet of 3rd parties. Documents are stored according to format and regardless of their communality of content, text documents are now stored on Google docs, videos on YouTube, photos on Flickr and so on. Although cloud services have brought us much flexibility they have left us with a Pandora’s Box, ‘no regard for preservation’ is one of the evils that has flown out. They are externally hosted services with very different agendas from ours, they may notify us if they are going to delete all our content but they don’t necessarily have to so.”

This is a challenging situation for archivists.   Steve is not the first to question the commitment third party service providers have toward digital preservation, nor is he the first to point out the “decline in our future professional role” third party digital storage is likely to bring about.  But rather than lament the situation or look inward for answers, Steve suggests we open a dialog with Google and other cloud companies:

“Perhaps we should actually stop to ask Google and their peers whether they are indeed aware of the fact that the future of digital preservation lies in their hands and the responsibilities which comes with it and whether this is a role they are happy to fulfill. For perhaps just as we are in danger of sleepwalking our way into a situation where we have let this responsibility slip through our fingers, so they might be equally guilty of unwittingly finding it has landed in theirs.

If so, might this provide the opportunity for dialogue between the archival professions and cloud based service providers and in doing so, the opportunity for us to influence (and perhaps even still directly manage) the preservation of digital archives long into the future”.

Starting a dialogue between archivists and cloud service providers is excellent advice.  But I question whether we have actually found ourselves in a situation where the future of digital preservation lies in the hands of these businesses, or even in the hands of the technology “the cloud” represents.  Cloud technology is already playing a role in digital preservation, but we are far from seeing private cloud service providers assume responsibility for preserving electronic records in the public interest.

I think, rather, that we are in a far more precarious situation where the future of digital preservation lies in the hands of a group of poorly understood concepts — public-private partnerships, open source software, and effective, non-proprietary standards.  And “the cloud” will continue to provide us with a false sense of security until archivists assert themselves in the digital world.

The Elusive Cloud

You hear about it everywhere these days.  The cloud market is predicted to explode in the next few years. Widespread adoption of cloud storage is imminent.  Google is experimenting with a new cloud storage service.  But what does it all mean?

For one thing, it’s not what you see in Microsoft’s new “To the Cloud” ad campaign.   The couple in the airport using some kind of remote desktop tool is not cloud computing.  Storing your documents using Office 365 or Google Docs is cloud computing.  But really, it all depends on who you ask.  Philip Delves Broughton offered a reasonable explanation in a recent Financial Times article:

The idea is that we can now use computer services as if they were a utility, like electricity, drawing on software and hardware when we need them, rather than each of us owning our own generators and distribution networks.

For digital natives, the cloud is as natural to computing as the keyboard. The cloud is Facebook, Zynga and Gmail. To an older generation, the cloud is WikiLeaks and data breaches. For managers trying to weigh up whether this is a fad or here to stay, where you stand may just be a function of your age.

Cloud technology clearly has a lot to offer businesses, organizations, and individuals.  But it also carries a significant amount of risk.  Companies like Google, Facebook, and even web hosts assume no responsibility they don’t absolutely need for their business to operate.   The internet is inherently spontaneous and content routinely appears, disappears, changes domains, etc.   The cloud just defies traditional concepts of what a record is, how it is created, and where it should be stored.

If you're looking for a host for your potentially controversial content, don't trust Amazon with it

Basically, the big risk with relinquishing control over your records is that you have no control over your records.  They are left in the hands of whatever business owns the servers they reside on, and more importantly, whatever dynamics affect the way that business operates.

Take WikiLeaks. After its site came under attack, WikiLeaks moved to Amazon Web Services, only for Amazon to come under pressure to remove the content. When the company obliged, PC World asked “in an idyllic future where we make heavy use of the cloud, what happens if a cloud service provider removes content it deems inappropriate, or just doesn’t like?”

Needless to say, businesses are still wary of using the cloud for their critical data. But all the forecasts about cloud technology remain true: it is playing an increasingly central role in how we work. For many businesses, this will mean a hybrid approach that uses cloud technology for some content and traditional networks for other content. All of this means archivists must understand the intricacies of cloud technology if they want to be leaders in digital preservation. (Since we don’t, Steve Bailey is absolutely correct that cloud service providers have digital preservation in their hands, for now.)

Public Interest in the Private Sector

Steve began his paper with a reminder of what a collection of personal papers might look like if it were held by the various businesses someone interacted with. He used the example of Samuel Pepys, the 17th-century diarist. Would Pepys’ collection still exist if he had entrusted it to the various businesses he bought communication media from (the tannery, the stationer, the cartographer)? In chunks? Perhaps. In its entirety? Doubtful.

Taking the metaphor “to the cloud,” what would a collection look like if an archivist tried to compile it from the various service providers someone interacted with?  Would their entire collection exist in 50 years?  100 years?

In these terms, it becomes clear that leaving any responsibility for digital preservation in the hands of the private sector would place those records at great risk. For one thing, we have to assume most of these private companies don’t want that responsibility. We are far more likely to agree to terms of service with a waiver of liability than to terms with a clause about long-term preservation. That’s usually because well-managed businesses are wary of taking on long-term obligations unless those obligations bring profit or are necessary to operate. Obligations carry risks, and risks increase liability. Why would a business providing cloud services agree to be responsible for the preservation of content unless that’s what it was selling?

Suppose cloud companies like Google did want to assume this responsibility. What would this look like? Would it take the form of bilateral agreements between a business and an archival institution? Would archival institutions have to negotiate these agreements each time they acquire a collection that involves a new third party? Or would businesses just publicly affirm their commitment to beefing up their digital preservation practices? What would they do beyond simply storing records on their servers? Would they seek advice from archivists and conservators or from computer scientists and developers, or (hopefully) both?

I can’t think of any desirable situation where third party businesses would be left responsible for the long-term storage of electronic records, let alone their preservation. I know there are many archives that have agreements with records storage companies, but that is a very different situation from considering someone’s home videos preserved when the only copy is on YouTube. As democratic or open as the internet may be, the legal framework surrounding it (and electronic media in general) is far too fragile for archivists to simply accept that private companies will preserve the records they are entrusted with unless preservation is explicitly accounted for.

Public-Private Partnerships in Heritage Institutions

This is not to say that private businesses don’t share a role in archiving.  The private sector has a significant role – from the production of communication and storage media all the way to archival consulting and conservation.  As the public purse tightens, it’s inevitable that the heritage community will turn to the private sector for support for core functions like preservation.

And there will be demonstrable results.  Google Books and Ancestry.com are just two examples of agreements between public institutions and private businesses that have yielded great dividends for the public, even if they have threatened the role of archivists and librarians in the organization and provision of information.  The Library of Congress’s decision to archive Twitter is another.

Archivists should really consider this type of agreement to be a public-private partnership.  The business is either contributing content, contributing storage and/or related services, or both.  There is a degree of risk assumed by all parties, and a public service is being provided.  These partnerships are controversial, but it’s clear that key decision makers in the heritage community are being swayed in their favour, because they are increasingly common.

I’m not advocating for public-private partnerships, and I recognize that there is much to criticize about them. But one way or another, contractual agreements between private businesses that operate in the cloud and archival institutions with an interest in acquiring content from the cloud will be an important component of the future of digital preservation, even for personal archives.

CLOCKSS is a non-profit joint venture that brings together scholarly publishers and research libraries

The growing presence of these relationships is already leading to new organizational models, which will invariably lead to new tools and procedures.  And as you might expect from the introduction of private interests, restricted access provisions are even being used as a carrot for long-term preservation.

One example of this is CLOCKSS, a non-profit joint venture between scholarly publishers and research libraries that makes use of LOCKSS technology. The program was set up “to ensure the long-term survival of Web-based scholarly publications for the benefit of the greater global research community.” Content in CLOCKSS is only accessible when it is no longer available from the publisher, for example when the publisher goes out of business or when a title is no longer offered. CLOCKSS calls these “trigger events,” and when the CLOCKSS Board detects one, it initiates a process that migrates the content to the newest format and transfers it from a secure server to a publicly accessible server.

In his discussion of the cloud’s role in digital preservation, Steve Bailey proposed the creation of a similarly structured, publicly funded meta-repository for online content:

“Maybe the interconnection of content creation and use and its long term preservation need not be as indivisible within the cloud as it might first appear. Yes Google’s appetite for content might appear insatiable, but that does not necessarily mean that they wish to hold it all themselves – after all, their core business of search does not require them to hold themselves every web page they index, merely to have the means to crawl it and to return the results to the user. Might we be able to persuade them that the same logic should also apply to the contents of Google Apps, Blogger, YouTube and the like? If so, might the door be open for us, the archival community through the publicly funded purse to create and maintain our own meta-repository within which online content can be transferred, or just copied, for controlled, managed long term storage whilst continuing to provide access to it to the services and companies from which it originated?

That way they get to continue to accrue the benefit of allowing their users to access and manipulate digital content in ways which benefit their bottom line, the user continues to enjoy the services they have grown accustomed to and the archival community can sleep soundly, safe in the knowledge that whilst service providers are free to do what they want with live content, its long term preservation and safety continues to lie in our own experienced and trusted hands”.

It really is an excellent idea.  It’s likely that partnerships and consortia like CLOCKSS will continue to shape the digital preservation landscape for the foreseeable future.  So why not create a global network of repositories to transfer or copy electronic content to?  If businesses can have a cloud, why can’t archives?  Why can’t there be a heritage cloud, mirrored in multiple locations to ensure its longevity?

Without a strong commitment from government, heritage organizations are left with a choice between failing to deliver on their responsibilities or delivering on their responsibilities with the assistance of the private sector.  This means more public-private partnerships.  If Google Books and Ancestry.com are any indication, these agreements have a better chance of success in the heritage sector than they might in built infrastructure or education. Who knows, maybe Google will step up and incorporate digital preservation into its philanthropic activities.  But archival institutions should tread carefully and stick to their core principles when forging these relationships.  And make sure that they are keeping up with the technology every step of the way.

(In Part Two of this post, I’ll finish with some thoughts on how the future of digital preservation also lies in open source software and standards.)


Piedmont Folk Legacies Seeks to Build Banjo Knowledge Management System

If you’re a fan of the banjo, you may want to take a look at a project being carried out by Piedmont Folk Legacies. In 2009, the non-profit organization based in Eden, North Carolina, received a Level I Digital Humanities Start-Up Grant from the National Endowment for the Humanities. The grant was for “planning activities for the creation of a proof-of-concept knowledge management system to allow researchers to study the development and performance history of musical instruments, using the banjo as a test case.” The project is called the “Banjo Sightings Database Project: Vernacular Music Material Culture in Space and Time,” and Greg Adams is Project Director.

This sounds really interesting. I’m intrigued by what it could do for banjo researchers, and by what kind of model it could offer for other knowledge management activities in the social sciences and the humanities.

The project description offers this abstract:

Few musical instruments are more closely tied or hold greater significance to American history than the banjo. From its West African roots, to its birth in the seventeenth century Caribbean, and through its meteoric rise in nineteenth century American popular culture, the banjo is an iconic instrument whose impact is woven into the cultural fabric of the American experience. As scholars, researchers, and enthusiasts continue to discover new information about the early banjo, there is no collective location to maintain, interact with, and collectively analyze this important data. The proposed Banjo Sightings Database Project (BSD) will combine rare and widely-dispersed primary source material (circa 1650-1870) with appropriate and innovative technological applications, resulting in a system that not only catalogs information about the early banjo, but also establishes an interactive, peer-reviewed knowledge management system, allowing users to explore the early banjo.

I can see this being a really useful research and discovery tool.  Greg is soliciting two kinds of feedback right now: (1) beyond the banjo, how might the broader implications of this project relate to your work or the work of colleagues? (2) questions or comments regarding the actual project or the white paper.

Collaboration

At the Association of Canadian Archivists conference last June, I presented a paper on the Helen Creighton collection, and one of the things I touched on was what a global registry of traditional music “instances” might look like. In many ways, what I was discussing is what Greg is building for the banjo. I would like to see a system that can hold information about traditional music and compile everything known about a particular song: archival materials, museum materials, published sources, gray literature, recordings, etc. He uses the term “sightings”; I was using the term “instances”; but the principle is basically the same.

So a lot of what I have to say has to do with the fact that what Piedmont Folk Legacies is building for the banjo is very similar to what I am experimenting with for traditional music, specifically the Helen Creighton collection.

I was pleased to see that in Section III of the report, Greg identified the need to establish collaborative partnerships as an area that required immediate attention.  Collaborative partnerships are increasingly becoming a critical component of projects like this.   For the Sightings Database to be successful, the project will require the sustained participation of key institutions and organizations.  It’s good to see that the project has identified the need to form these partnerships at an early stage, because when it comes to unifying information about collections held in different heritage institutions, Piedmont Folk Legacies is wading into uncharted waters.

The paper I presented on Helen Creighton was part of a panel on archival collaboration. It was fitting because the collection is distributed across four different archives. The lack of collaboration has resulted in a difficult situation for researchers and in uncoordinated preservation activities by the institutions that hold the collections. I have found it especially difficult to compile information about the collections because the institutions do not have any kind of collaborative partnership to jointly manage the collection as a whole. And that is just for one collection. Even if the project does focus on the banjo as a “test case,” it will only be a viable discovery tool if it is able to pull information about archival and museum holdings from many institutions. This in itself would be a major feat. If Piedmont Folk Legacies is able to create a viable prototype, it will also have succeeded in creating a collaborative model for heritage institutions.

The “Bigger Picture”

It’s not surprising to hear that many people who participated in the planning period of the project wanted to see the next phase focus on more than just the banjo.  The report notes that:

While all respondents of the planning period outreach exercise found value in focusing on the banjo as a “test case,” most generally preferred to see that the Project focus on the “bigger picture.”  As Project Director, the most pressing issue for Adams is to maintain a practical balance between these different communities of interest. First, he must answer to the knowledge-bearers and other stakeholders within the banjo community who desire that the “test case,” the Banjo Sightings Database Project, be fully realized. On the other hand, as the outreach efforts have clearly shown, listserv respondents and representatives of institutions within the DC-metro region, who represent broader communities and possess much greater infrastructure, desire solutions to knowledge management as part of the “bigger picture” and not necessarily based on the banjo as the “test case.”

The report seems to send contradictory signals about what the advisory board actually said. Section II notes that the advisory board “regularly revisited the importance of thinking about the Project’s broader applicability to music instruments in general and largely agreed that the banjo was an excellent test case because of its multidisciplinary implications.” But Section III says most generally preferred to see the project focus on the “bigger picture.” This is confusing. Did the advisory board want the project to focus on the bigger picture, or did it like the idea of using the banjo as a test case?

Adams appears to have concluded that he will focus on the former group, because he ends the report by saying that “ultimately, this ‘test case’ will serve as a model for how researchers collaboratively study the development, migration, transformation, and dissemination of any music instrument.”

I question the wisdom behind this decision. The advisory board had representatives for the “knowledge-bearers and other stakeholders within the banjo community,” so I’m not sure there are really two communities of interest. And while I completely understand the rationale behind Piedmont Folk Legacies’ interest in a banjo database (the non-profit is best known for organizing a music festival for Charlie Poole, an old-time banjo player from Eden, North Carolina), I think the advice to focus on knowledge management solutions for music instruments in general is good.

Focusing on a system that can handle all instruments obviously increases the scope and variability of the information it needs to handle.  The schema for the database would need to be overhauled.  But this would ensure that the system is robust enough to suit the needs of all researchers.

There is also the issue of backwards compatibility.  A knowledge management system built for the banjo may not necessarily work as a model for other instruments, but a knowledge management system built for instruments would be capable of organizing information about the banjo.
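To make that point a little more concrete, here is a minimal Python sketch. It assumes a hypothetical record type called `Instance` and a hypothetical helper called `find`; neither comes from the Banjo Sightings Database or its white paper, and the source categories are only loosely based on the kinds of material mentioned in this post. The idea it illustrates is that when the instrument is stored as a field rather than built into the schema, the banjo becomes one query among many rather than the shape of the whole system.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical source categories, loosely based on the kinds of material
# discussed in this post (not taken from the project's actual schema).
SOURCE_TYPES = {"archival", "museum", "published", "gray literature", "recording"}


@dataclass
class Instance:
    """One documented occurrence of an instrument (a 'sighting' or 'instance')."""
    instrument: str                           # e.g. "banjo" or "fiddle": data, not schema
    source_type: str                          # must be one of SOURCE_TYPES
    description: str
    year: Optional[int] = None                # approximate date, if known
    holding_institution: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject categories the schema does not know about.
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type}")


def find(instances: List[Instance], instrument: str,
         start: Optional[int] = None, end: Optional[int] = None) -> List[Instance]:
    """Return instances for one instrument, optionally limited to a year range.

    Undated records are excluded whenever a start or end year is given.
    """
    results = []
    for inst in instances:
        if inst.instrument != instrument:
            continue
        if start is not None and (inst.year is None or inst.year < start):
            continue
        if end is not None and (inst.year is None or inst.year > end):
            continue
        results.append(inst)
    return results
```

With placeholder records, `find(records, "banjo", start=1650, end=1870)` would return only the dated banjo entries in that range, and the same call with `"fiddle"` needs no change to the schema. A banjo-only design would instead bake banjo-specific fields into the record itself, which is exactly what makes it harder to reuse for other instruments.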

Recommendations

Either way, I look forward to seeing where the project goes.  I have a few recommendations about how to proceed:

  1. Abandon “Sighting” terminology. Referring to a text-based reference to a banjo as a “sighting” is a little confusing.  My experience is that researchers prefer terms that are familiar and unambiguous.
    I’ve been using “instances,” but I’m not sure there even needs to be a term to unify everything.  It would probably be fine to refer to a banjo as a banjo and a recording as a recording.
  2. Incorporate descriptive and structural metadata standards. The current schema is very good, but I think the final product would benefit greatly from incorporating established standards. This could be achieved by broadly envisioning the database as several interconnected sets of data described using appropriate content standards and encoded using appropriate structural standards. These might include:
    1. Descriptive Standards
    2. Structural Standards

    Standards will help ensure that whatever is built is scalable and capable of interaction with other databases and information systems.  This will be especially important if the database hopes to harvest information from other standards-compliant information systems.

  3. Add references to banjo recordings. The one type of “sighting” I felt was missing was the auditory kind.  It would be nice to include references to recordings of banjo music.
  4. Crowdsource. This would be a perfect project for investigating the feasibility of crowdsourcing description, or even harvesting data. Especially if the database is supposed to inform the “bigger picture,” it would be great to incorporate some more efficient ways of populating it.
  5. Focus on the “Bigger Picture.” Again, I understand the rationale behind narrowing the focus, but I do think the final prototype will be more useful if it is capable of handling more instruments. It would open up the database to many more researchers and reframe what the “bigger picture” actually is.

If you’re interested in seeing where Piedmont Folk Legacies takes this project, Greg Adams has a Vernacular Music Material Culture blog where he will be posting updates about what happens.  It looks like Piedmont Folk Legacies plans on pursuing Level II funding.  Hopefully the next phase of the project is able to build on the current prototype database and address some of the broader knowledge management needs facing the heritage community.