RUSQ

What We Talk About When We Talk About Repositories

Repositories, Tools, and the Lifecycle of Digital Information

Both DSpace and Fedora, two major open-source repository tools developed over the past decade, attempt to cover many of the needs for effective data management and access in a repository service. Though these tools are sometimes referred to as repositories in themselves, they are not the only option: other repository platforms have been deployed that are based on commercial, community-based, and homegrown applications. Each of these has limitations and requires its own tradeoffs of convenience and functionality for users and for system managers.

DSpace, first released in 2001 through a development partnership between Hewlett Packard and MIT, has been promoted since its inception as an application meant to develop an institutional collection of research. The earliest instance of the dspace.org website found in the Wayback Machine, dated April 28, 2001, greets the reader: “Welcome to DSpace, a newly developed digital archive created to capture and distribute the intellectual output of MIT.”9 DSpace provides a set of integrated tools, services, and functions designed to make repository start-up simple, which has led to broad adoption: over the last decade DSpace has been put into production at more than five hundred institutions around the world.

Fedora is an acronym for Flexible Extensible Digital Object Repository Architecture, which sums up the project's philosophy of enabling maximum flexibility and adaptability in the design, implementation, and use of its software. Fedora was initially conceived by researchers at Cornell University's Computer Science Department and later developed in partnership with the University of Virginia Library and with funding from the Andrew W. Mellon Foundation. From the start, Fedora has been marketed as the foundation for a wider variety of digital collection management needs, not as an integrated IR solution.

Both DSpace and Fedora recognize the needs of diverse disciplines and researchers. DSpace's interfaces define collections relevant to communities with different needs and expectations for distributing digital content online. Administrators can apply different controls on input and access, and can support different formats, genres, and metadata structures to describe and document those materials. Fedora's flexibility allows each instance to serve unique purposes designed for the case at hand; no two installations look alike or serve the same purpose.10 Until recently, these two tools were developed and managed independently. However, the DSpace Foundation and Fedora Commons announced on May 12, 2009, that they would merge and form a new organization called DuraSpace. DuraSpace will continue to support and develop DSpace and Fedora and also develop new services to work with both platforms. More details about this change can be found at http://duraspace.org/pressrelease.html.

These are not the only two repository tools in use. Some libraries offer IR services using commercially developed software. The University of Utah has deployed an IR service using ContentDM, a product offered by OCLC and originally created to help organizations manage digital library collections of images and other reformatted materials.11 Digital Commons, a product of BePress, is a hosted solution for IR programs, providing libraries with an opportunity to offer programs with limited technology investment. The California Digital Library was an early adopter of Digital Commons and uses it for its eScholarship platform, which serves all of the University of California campuses.12 Organizations with unique missions and more resources may develop comprehensive archival systems through a variety of applications and technologies, many of them specifically designed and tailored for the particular mission. The National Archives and Records Administration is now developing the Electronic Records Archive, a comprehensive electronic archives management system to handle and preserve the electronic records of the federal government.13 The HathiTrust, a shared repository service, is developing its own infrastructure for managing the collective digital content of more than twenty different libraries, projected to include five hundred thousand newly digitized volumes each month.14

These repository tools and their implementations are, or should be, part of larger systems and strategies for building and caring for collections. Some models of archival and scholarly practice can help to elucidate this. The Open Archival Information System (OAIS) Reference Model, an ISO standard, provides a high-level conceptual overview of the organizational and technological functions necessary for the effective archival management of digital data. The OAIS model identifies four core activities in a repository system: ingest (methods to define, describe, document, and authorize the transfer of digital files); data management (the capture, storage, and analysis of metadata); archival storage (infrastructure to protect the integrity of the files at the byte level); and access (provided to the user through queries, retrieval, and viewing, or to other applications or archival systems). Governing these activities are preservation planning (to develop strategies to mitigate risk and monitor technological change) and administration (such as the policy decisions that define the goals of the archive or collections and provide the support through financial and other resources).15 The OAIS framework describes not software or tools, but principles of practice that should support their use.
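OAIS is a conceptual model rather than software, but a toy sketch can make the division of labor among the four core activities more concrete. The Python below is only an illustration under stated assumptions: the class name ToyArchive, its method names, and the metadata fields are hypothetical and are not drawn from the OAIS standard, DSpace, or Fedora. It uses a SHA-256 checksum as one simple stand-in for protecting integrity "at the byte level."

```python
import hashlib


class ToyArchive:
    """Hypothetical, in-memory illustration of four OAIS-style activities."""

    def __init__(self):
        self._storage = {}   # archival storage: object id -> raw bytes
        self._metadata = {}  # data management: object id -> descriptive record

    def ingest(self, object_id, content, description):
        """Ingest: accept a file, describe it, and record a checksum."""
        if object_id in self._storage:
            raise ValueError(f"{object_id} has already been ingested")
        self._storage[object_id] = content
        self._metadata[object_id] = {
            "description": description,
            "size": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        }

    def verify(self, object_id):
        """Archival storage: check that stored bytes still match the checksum."""
        current = hashlib.sha256(self._storage[object_id]).hexdigest()
        return current == self._metadata[object_id]["sha256"]

    def search(self, term):
        """Data management: find object ids whose description mentions a term."""
        term = term.lower()
        return sorted(
            object_id
            for object_id, record in self._metadata.items()
            if term in record["description"].lower()
        )

    def access(self, object_id):
        """Access: return content and metadata, refusing corrupted files."""
        if not self.verify(object_id):
            raise IOError(f"fixity check failed for {object_id}")
        return self._storage[object_id], dict(self._metadata[object_id])


archive = ToyArchive()
archive.ingest("report-2009", b"annual report text", "Library annual report")
content, record = archive.access("report-2009")
```

Preservation planning and administration are deliberately absent from the sketch: in the model they are policy and planning functions that govern a system like this one rather than operations it performs on individual files.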

Another model more explicitly addresses our interactions with the data we collect and manage. The Digital Curation Centre at the University of Edinburgh has developed a Curation Lifecycle Model, which vividly captures what we do with information resources in libraries and in our colleges and universities. The lifecycle is a continuous flow of activity where information, represented in digital objects, is selected for acquisition, made accessible, discovered, used, transformed, reacquired and distributed, discovered anew, continuously appraised, and sometimes disposed of.16 This model reminds me of the “rip-mix-burn” credo associated with the Free Culture movement, which can also be crudely applied to the processes of research and scholarship: a remixing of information to create new knowledge. Our faculty and students inquire, discover, and sort information resources, then analyze and synthesize them into new work, which is written, published, and distributed for the next scholars.

Curation is an active process, one in which our users can and should participate. For a simple example, consider that Fedora has been designed with the assumption that digital objects have multifaceted and overlapping relationships with one another, and that identifying and making these relationships explicit is a part of scholarship and archival work. A digital object may thus belong to many different networks of content, rather than one parent grouping, and would be accessible through all of those organizing contexts in the repository. A library might deposit a set of images as a defined collection (perhaps all coming from a single source or supplier). Users may wish to re-present those same images as constituent parts of many other collections or sets (such as “images of Italy” or “images for the Western Art Survey”) and to ensure that those relationships and representations are defined, recorded, persistent, and discoverable for other users.
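The image example above describes a many-to-many relationship between objects and collections. A minimal Python sketch of that idea follows. It is not Fedora's actual data model or API: the class RelationshipIndex, its methods, and the identifiers such as "img-001" are all hypothetical, chosen only to show how one object can be reached through several organizing contexts at once.

```python
from collections import defaultdict


class RelationshipIndex:
    """Hypothetical many-to-many index between objects and collections."""

    def __init__(self):
        self._collections_of = defaultdict(set)  # object id -> collection names
        self._members_of = defaultdict(set)      # collection name -> object ids

    def add(self, object_id, collection):
        """Record that an object belongs to a collection (in both directions)."""
        self._collections_of[object_id].add(collection)
        self._members_of[collection].add(object_id)

    def collections_of(self, object_id):
        """List every collection through which the object can be discovered."""
        return sorted(self._collections_of.get(object_id, set()))

    def members_of(self, collection):
        """List every object gathered under a collection."""
        return sorted(self._members_of.get(collection, set()))


index = RelationshipIndex()

# The library deposits three images as one collection from a single supplier.
for image_id in ("img-001", "img-002", "img-003"):
    index.add(image_id, "Supplier deposit")

# Users later re-present some of the same images in their own sets.
index.add("img-001", "Images of Italy")
index.add("img-002", "Images of Italy")
index.add("img-002", "Images for the Western Art Survey")

print(index.collections_of("img-002"))
print(index.members_of("Images of Italy"))
```

Because membership is recorded in both directions, no single collection "owns" an image: the original deposit and the user-defined sets are equally valid routes to the same object, which is the point of the example in the text.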


3 Comments

  1. Nicely done! I was very glad to read this, and will be assigning it to the collection-development class I am teaching next semester.

  2. [...] What We Talk About When We Talk About Repositories (source: RUSQ, vol. 49, n 1, nov. [...]

  3. (1) There are 64 EPrints IRs in the US, and 355 worldwide: http://bit.ly/4CokNZ

    (2) For a critique of the 2002 paper by Raym Crow see:

    Self-Archiving, Self-Vetting, “Overlay Journals” and “Disaggregated Models”: Comments on the SPARC Position Paper on Institutional Repositories
    http://openaccess.eprints.org/index.php?/archives/671-guid.html

    (3) For a critique of Cliff Lynch on OA repositories see:
    http://openaccess.eprints.org/index.php?/archives/195-guid.html
