
The Digital Reference Electronic Warehouse Project: Creating the Infrastructure for Digital Reference Research through a Multidisciplinary Knowledge Base

Returning to the various approaches presented by Zeng and Chan, there are several possibilities. The first is called a satellite thesaurus, which starts with a superstructure thesaurus that would be appropriate for a general reference service. Then, where specialized thesauri are available, they are attached to a node of the general superstructure. This allows the maintenance of the individual specialized subject lists while maintaining some relationship between them.

Another approach is direct mapping, where terms from different vocabularies are mapped to each other. This is then built into the system, and whenever a search is performed on one term, it is mapped to the other terms. This does require more time to plan, but would make it easier for similar services with different subject lists to come together into one knowledge base. The danger with this effort comes with the general services, as it would prove challenging to map all general reference service thesauri to each other.

A third approach is switching, where all individual subject lists are mapped to an intermediary subject list. This is similar to direct mapping, except that everything is mapped to one list instead of mapping every list to every other list. This is currently the approach used by several large multidisciplinary knowledge-base projects, such as the High-Level Thesaurus Project (HILT) and the National Library of Medicine’s Metathesaurus.16
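The difference between direct mapping and switching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names, the local terms, and the shared terms are all hypothetical and are not taken from HILT or from any participating service.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical data: (service, local term) -> term in the intermediary list.
TO_SHARED: Dict[Tuple[str, str], str] = {
    ("service_a", "Astronomy & Space"): "Astronomy",
    ("service_b", "Stars and Planets"): "Astronomy",
    ("service_b", "Cooking"): "Food and drink",
    ("service_c", "Recipes"): "Food and drink",
}


def to_shared(service: str, term: str) -> Optional[str]:
    """Switch a local term to the intermediary subject list."""
    return TO_SHARED.get((service, term))


def equivalent_terms(service: str, term: str) -> Set[Tuple[str, str]]:
    """Find terms in other lists that switch to the same shared term."""
    shared = to_shared(service, term)
    if shared is None:
        return set()
    return {
        key
        for key, value in TO_SHARED.items()
        if value == shared and key != (service, term)
    }


def mappings_needed(n_lists: int) -> Dict[str, int]:
    """Count the mapping tables needed to connect n subject lists."""
    return {
        "direct": n_lists * (n_lists - 1) // 2,  # one table per pair of lists
        "switching": n_lists,  # one table per list, to the shared list
    }
```

With ten subject lists, direct mapping needs a table for each of the 45 pairs of lists, while switching needs only ten. This is why switching scales better as general services join.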

The HILT project is an intriguing one for the DREW project. During the last few years, researchers funded by the Joint Information Systems Committee in the United Kingdom have been creating a thesaurus to link resources from different information systems. They have based their work on the Dewey system, and this thesaurus is available at this Web site. If the DREW project uses this thesaurus as the base for the switching approach, where other services map to this general thesaurus, it will serve several purposes. First, the thesaurus will be the result of research and testing on multiple systems, so it will be stable and accepted. Second, it will raise the possibility of interoperability between DREW and other information services using the HILT subject list. Therefore, we are investigating the feasibility of using the HILT thesaurus as the DREW master subject list.

The implementation would involve individual services working with DREW to develop an appropriate mapping to the HILT subject list. In addition, the original subject terms would be captured in the data warehouse. As the project grows, there may be the need to create secondary, more specific, metathesauri to allow the mapping between different services focusing on the same topic area.

Eventually, this mapping will take place in one of two ways: either as part of the data cleaning process, through mapping algorithms developed between DREW and each institution, or at the host institution, which would map its subject headings to the shared thesaurus before submitting transactions to DREW. It is expected that the number of services participating in the warehouse will be small enough that a mapping program could be created, with the aid of the reference service, when results from a new service are first integrated. An important consideration is that when a service changes its subject list, the mapping in the warehouse must be updated as well. This should not prove a challenge, however, as an automated notification can be issued whenever DREW receives a new, unmapped subject.
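A minimal sketch of this cleaning step follows, assuming each transaction arrives as a dictionary with "service" and "subject" fields. The field names, the mapping table, and the notification queue are hypothetical and do not describe the DREW implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from one service's local subjects to the shared list.
SERVICE_A_MAPPING: Dict[str, str] = {
    "Stars and Planets": "Astronomy",
    "Cooking": "Food and drink",
}


def map_subject(
    record: Dict[str, str],
    mapping: Dict[str, str],
    unmapped: List[Tuple[str, str]],
) -> Dict[str, Optional[str]]:
    """Return a cleaned copy of the record with the shared subject attached."""
    local_term = record["subject"]
    shared_term = mapping.get(local_term)
    if shared_term is None:
        # A new, unmapped subject: queue it so someone can be notified.
        unmapped.append((record["service"], local_term))
    cleaned: Dict[str, Optional[str]] = dict(record)
    cleaned["original_subject"] = local_term  # the original term is retained
    cleaned["subject"] = shared_term
    return cleaned


unmapped_queue: List[Tuple[str, str]] = []
known = map_subject(
    {"service": "service_a", "subject": "Cooking"},
    SERVICE_A_MAPPING,
    unmapped_queue,
)
new = map_subject(
    {"service": "service_a", "subject": "Genealogy"},
    SERVICE_A_MAPPING,
    unmapped_queue,
)
```

After these two calls, `unmapped_queue` holds the single unmapped term from `service_a`, which could then trigger a notification to whoever maintains that service's mapping.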

Privacy

One of the constant concerns about library data is that of patron privacy. The library has traditionally been a safe place for users to gather information. Such legislation as the USA PATRIOT Act threatens the privacy of patron histories, as it gives government bodies the right to access patron records through a roving wiretap without the patron knowing they are being watched.17 In response to this, some libraries are actively deleting and shredding records.18 As digital reference services typically collect an e-mail address for a patron, it is possible that they also could be targets for a roving wiretap. If the archives of the service contain personally identifiable information about a patron, then the service would be required to turn over transactions if requested by the appropriate authorities.

In this case, the archival schema for DREW provides a method of protecting the personally identifiable information about a patron while still maintaining the useful information included in the transaction. In addition, the information needed to make administrative decisions is maintained. Therefore, the data warehouse balances the need to protect the patron and the need to maintain a data-based history of the service’s activities.

This type of data warehouse is typically used in bibliomining (data mining for libraries) to support decision-making across the library. However, there are some challenges in digital reference transactions that do not occur in other types of library transactions. Since patrons ask a free-text question or have a flowing discussion, it is possible that patrons might include personal information within the text of their question. There are currently no automated solutions to strip out the personally identifiable information from a reference transaction.

This is similar to the problems of de-identification of medical records where personal information is removed while the useful information from the records is maintained.19 An active research area in natural language processing is the automated identification and replacement of this personal information in medical records. As this research agenda is advanced and solutions are created, these medical informatics tools will be adapted for use in reference transactions.
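To make the difficulty concrete, the sketch below removes only identifiers that follow a predictable pattern, here e-mail addresses and one common telephone number format. It is an illustration rather than a de-identification tool, and the patterns and placeholder labels are assumptions. Names, street addresses, and other details that patrons mention in free text are not caught, and these are the cases that call for the natural language processing research described above.

```python
import re

# Assumed patterns: a simple e-mail shape and a 3-3-4 digit phone number.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact_patterns(text: str) -> str:
    """Replace pattern-shaped identifiers with placeholder labels."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


question = "Reach me at jane.doe@example.org or 555-123-4567. I am Jane from Elm St."
redacted = redact_patterns(question)
```

Here the e-mail address and phone number are replaced, but "Jane from Elm St." passes through untouched.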

Safe Harbor Policy Compliance

One of the goals of DREW is to involve other countries; therefore, there are certain international privacy guidelines to which DREW will adhere. These guidelines were originally created by the European Union, and have been adopted by the United States. This policy, known as the Safe Harbor Privacy Principles, is made up of seven areas that ensure that those individuals whose data are in the data warehouse are properly protected. These areas form the basis of the DREW privacy policy:

  • Notice–Each service participating in the DREW project will add to its existing privacy policy a statement about DREW, the subset of transaction information transferred to DREW, what the data are used for, who is using the data, and how they can opt out of the project.
  • Choice–Users of digital reference services, including both patrons and experts, have the ability to request that their information be removed from the data warehouse. Due to the anonymous nature of DREW, this request will be initiated at the service where the question was asked, and the service will pass along the record ID to be removed from the warehouse.
  • Onward Transfer–To comply with this area of Safe Harbor, the digital reference services participating in the DREW warehouse must comply with the Notice and Choice clauses of this policy. This means that each service will notify its users about the DREW project and allow them to remove their information from the warehouse. Any additional participants, such as library science researchers, will also have to verify that they offer a level of protection equivalent to that offered through this policy.
  • Access–The users and experts involved can request to see their DREW records through the service that submitted the question to DREW. After seeing these records, they can request to have them adjusted or removed from the archive.
  • Security–Access to records in the DREW warehouse will be controlled through password-protection and firewalls. Researchers working on topics related to the reference process may request data from DREW. Participating libraries will be able to receive their own transactions, as well as reports generated using the data from their transactions. As the DREW project grows, participating institutions will be made aware of the change in advance and be allowed to remove their transactions at any time.
  • Data Integrity–There is no personal information kept in DREW. If mistakes were made in transmittal, the submitting service can correct the DREW records. In addition, if the information in a transaction is incorrect, DREW participants can submit annotations to be added to a transaction.
  • Enforcement–The DREW advisory board will serve as an external body to ensure that DREW is complying with the Safe Harbor Policy. If needed, the DREW advisory board may contact an external group from another organization such as the American Library Association to investigate privacy concerns.
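The Choice principle above implies a two-step removal: the originating service looks up which record IDs belong to the requester, and the warehouse deletes by record ID alone. The sketch below illustrates that flow under stated assumptions. The in-memory dictionaries, the key format, and the function name are hypothetical stand-ins for whatever storage the service and the warehouse would actually use.

```python
from typing import Dict, List

# Held only at the service: links a patron to the record IDs sent to the warehouse.
service_index: Dict[str, List[str]] = {
    "patron-17": ["rec-001", "rec-002"],
    "patron-42": ["rec-003"],
}

# Held at the warehouse: anonymized transactions keyed by record ID only.
warehouse: Dict[str, Dict[str, str]] = {
    "rec-001": {"subject": "Astronomy"},
    "rec-002": {"subject": "Food and drink"},
    "rec-003": {"subject": "Astronomy"},
}


def handle_removal_request(
    patron_key: str,
    index: Dict[str, List[str]],
    store: Dict[str, Dict[str, str]],
) -> List[str]:
    """Remove a patron's records from the warehouse by record ID."""
    record_ids = index.pop(patron_key, [])
    removed = []
    for record_id in record_ids:
        # The warehouse sees only the record ID, never the patron's identity.
        if store.pop(record_id, None) is not None:
            removed.append(record_id)
    return removed


removed_ids = handle_removal_request("patron-17", service_index, warehouse)
```

Keeping the patron-to-record link at the service is what allows the warehouse to honor removal requests without storing any personal information itself.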

The Usefulness of DREW

This warehouse of digital reference transactions will allow a level of understanding about library services previously unavailable to researchers and educators. In addition, administrators of participating services will gain access to customized reporting and management information tools as they are developed.

Support of Current Teaching and Research

There are a number of lines of human intermediation research that would be advanced through the availability of DREW records. One of the challenges for digital reference researchers is getting access to large amounts of cleaned data; DREW will provide a robust source of transactions for these researchers. Those seeking to understand information seeking behavior or how experts use resources in answering questions would be able to rapidly improve the generalizability of their models through access to data on this scale.

Another line of research that would benefit from this data warehouse is the measurement and evaluation of digital library services. Tools such as bibliomining require large amounts of cleaned data.20 DREW is an ideal place for bibliomining research, and the results will allow the development of new measurement tools for digital reference services and the discovery of novel and actionable patterns existing in the transactions. One goal of this line of research is to create a management information system that can be applied to the entire database for research purposes and that participating libraries can access to learn more about their own services.

Informing Service Management and Decision Making

One of the challenges facing individual services is the need for informed management decisions. This call is embodied in evidence-based librarianship, which implores librarians to use the best available evidence when making decisions for their library. In addition, librarians are asked to justify their services on a regular basis; many are too busy running their service to step back and create the tools needed to analyze their services appropriately.

As researchers develop methods of measuring and evaluating digital reference, these tools and models can be integrated into DREW. As these tools are created, managers of individual services can request that any of the reports created for the entire warehouse be run on just the data from their own system. This creates a significant reason for services to participate in the DREW project, as they will then have access to a strong management information system associated with DREW.

Digital reference consortia will also benefit from this relationship, as they can get the same reports and information about their entire consortium. This type of information was previously challenging to collect and present, but is essential to strong decision making. As consortia make decisions that can have long-range impact and that may not be easily changed, it is important that these decisions be powered by the best evidence available.
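The idea of running one report at three scopes (the whole warehouse, a single service, or a consortium) can be sketched as a single function with an optional filter. The record fields, service names, and the report itself (a count of transactions per subject) are hypothetical examples and not part of DREW.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical cleaned transactions in the warehouse.
transactions: List[Dict[str, str]] = [
    {"service": "service_a", "subject": "Astronomy"},
    {"service": "service_a", "subject": "Food and drink"},
    {"service": "service_b", "subject": "Astronomy"},
    {"service": "service_c", "subject": "Astronomy"},
]


def subject_report(
    rows: List[Dict[str, str]],
    services: Optional[Iterable[str]] = None,
) -> Counter:
    """Count transactions per subject, optionally limited to some services."""
    wanted = None if services is None else set(services)
    return Counter(
        row["subject"]
        for row in rows
        if wanted is None or row["service"] in wanted
    )


whole_warehouse = subject_report(transactions)
one_service = subject_report(transactions, ["service_a"])
consortium = subject_report(transactions, ["service_a", "service_b"])
```

Because the report logic is written once and only the filter changes, any report developed for research on the full warehouse becomes available to individual services and consortia at no extra cost.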
