Libraries are moving from a business model based primarily on managing the products and the output of research and scholarship to a model based on facilitating the processes of scholarship, teaching, and research that result in those products. No matter what technology is used, the lifecycle model for digital data curation suggests that repositories and digital data management are not distinct backroom technology operations but activities that should be functionally integrated into the mission and services of the library. Repository-enabled services will be critical to the future of scholarship in general, regardless of who offers them. Commercial agents, such as Google, can outperform existing library systems on speed and breadth of basic searches, but the preservation and scholarly use of digital assets are still fertile ground for libraries, technologists, and library users.
What Do We Mean By Services?
For many of us, the answer to this question is obvious: Services are the activities we perform to support the researchers, students, teachers, and members of the public who use our libraries. But the term “service” also has a specific use, referring to technical functions conducted through interoperable, machine-to-machine interfaces. For instance, many digital repository collections use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to automatically share their metadata with other collections with the aim of improving discovery. Such services have little direct human intervention and provide additional functionality and data to applications, including repository systems.
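To make the machine-to-machine sense of "service" concrete, the following is a minimal sketch of the harvesting side of an OAI-PMH exchange. It assumes a hypothetical repository endpoint at repository.example.edu and a hand-written sample response; the function names are illustrative only. The `ListRecords` verb, the `oai_dc` metadata prefix, and the two XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. A real harvester would also need to fetch the URL over HTTP and follow resumption tokens, which this sketch omits.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


# Hand-written sample response (hypothetical identifier and title).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1234</identifier>
        <datestamp>2009-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The endpoint below is a placeholder, not a real repository.
    print(build_list_records_url("https://repository.example.edu/oai"))
    for identifier, title in extract_titles(SAMPLE_RESPONSE):
        print(identifier, "|", title)
```

The point of the sketch is that nothing in the exchange requires a person: one system asks for records in an agreed format, and another system parses the reply and folds the metadata into its own index.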
Some IR programs may have developed as services looking for a need, but nothing creates a need for their services like an institutional or legal mandate to use them. Since the spring of 2008, several elite universities or their colleges (including Harvard's Faculty of Arts and Sciences and John F. Kennedy School of Government, Stanford's School of Education, and all of MIT) have adopted policies requiring that all publications of their faculty also be made available in an open-access online service managed by the school. When the National Institutes of Health (NIH) began to require deposit in PubMed Central of all publications based on NIH funding, several universities took advantage of the change to offer enhanced services to their communities. For example, the University of Michigan performs NIH deposit on behalf of its own researchers, using its existing IR deposit methods as a way of gathering the articles for submission to PubMed Central.17 Other mandates will prove more complicated to meet, such as archival electronic records management. In this arena, a variety of public policy and regulatory matters have driven technology companies such as Sun, Hewlett-Packard, and EMC to develop new storage technologies that can better ensure the integrity of digital data making up the records of businesses, financial institutions, and government. Higher education institutions must meet similar requirements for auditing and disclosure, and this emerging need will challenge organizations traditionally tasked with maintaining the records and archives. The diversity of formats, record types, and relationships among these records creates a challenging environment in which to establish basic policies and practices to ensure compliance with legal requirements.
Mandates are not a reliable growth model for services. However, aggregating content is our bread and butter, and repository systems can enable large-scale aggregation to offer improved access. At the University of Virginia Library, the Fedora repository platform was developed concurrently with an integrated digital collections system that uses Fedora to deliver electronic texts, images, and special collections finding aids. The metaphor here would be the stacks, rather than the archive. Among the texts are books digitized by the Library alongside full-text databases published by ProQuest that are also included in their Literature OnLine (LION) product. The image collections include both purchased and licensed sets from Archivision, as well as images digitized from Virginia's Special Collections and contributed by faculty at the school. Bringing these disparate collections into an integrated collection management environment enables searches across the collections and makes it possible to create additional applications that give users greater functionality. The Collectus tool, for example, gives users a way to save sets of images and texts to use and share with their classes or colleagues.18
Some institutional repository services and their infrastructure serve as the basis for publication activities. Campus-based publishing has become an increasingly visible (though still very experimental) service at many research libraries and smaller ones as well. These publishing programs share core assumptions with broader IR programs: Libraries, working with faculty and often with publishers such as university presses, can provide cost-effective technology to support the open distribution of research literature from within the university. Compared with IRs, these programs require an even greater degree of faculty engagement while offering a more specific service focused on distributing complete titles or collections. In her 2008 study for the Association of Research Libraries (ARL), Research Library Publishing Services, Karla Hahn reported that 44 percent of eighty responding ARL libraries offered some form of publishing service for journals, monographs, or conference proceedings.19 Open-source publishing tools such as Open Journal Systems and DPubS are frequently used for these purposes and sometimes are used to provide support for editors and authors in their review and submission processes.20 A large number of institutions reported using DSpace or Digital Commons as their publishing platform.21 Although DSpace does not offer native, out-of-the-box workflow tools for editors of publications, Digital Commons offers users editorial workflow tools designed for BePress's own journals. Given the experimental nature of these efforts, it appears that many institutions are limiting costs by first taking advantage of their existing technology investments before investigating more specialized service offerings.
Publishing, in the limited sense of distribution, can be integrated into the curriculum via repository programs as well. Like many others, Penn State (my own university) now requires that all theses and dissertations be submitted to the Graduate School in electronic format through a system managed by the libraries. One simple way to extend this model would be to provide an electronic deposit service to undergraduate programs that require a formal thesis or paper for graduation. The Ethnography of the University Initiative (EUI) at the University of Illinois Urbana–Champaign offers another model. Through classes associated with EUI, undergraduates in different fields of study engage in original research about their campus, using their familiar home environment to explore the concepts they are learning. EUI provides these students with experience publicly distributing their work by selecting research reports for inclusion in IDEALS, the University Library's repository service. The EUI collection, numbering more than 350 works, also serves as a research collection for students engaged in the program.22
Less formal, direct-to-reader publishing via media such as blogs has become an increasingly important part of daily discourse and scholarly communication. Blog software is readily available, and some institutions have created a centralized service specifically for their students and faculty. The ephemeral nature of most blogging is reflected in the software: these are first and foremost authoring and distribution tools, and they do not provide all of the capabilities for data management, preservation, and discovery that we expect to see in an archival service. The National Science Digital Library (NSDL), an initiative funded by the National Science Foundation, has launched a blog service known as Expert Voices, which features postings by multiple experts to promote online collaboration on science topics across different communities, such as K–12, researchers, and librarians. NSDL has designed Expert Voices so that it can easily interoperate with other resources in the library and so that the discussions and new content may be directly captured and managed in a repository environment based on Fedora. Rather than sitting off to the side, the data created in the blogs can easily become a part of the managed digital library.23
[...] What We Talk About When We Talk About Repositories (source: RUSQ, vol. 49, n 1, nov. [...]