The Digital Reference Electronic Warehouse Project: Creating the Infrastructure for Digital Reference Research through a Multidisciplinary Knowledge Base

This is a simple and incomplete list. Issues of quality have not even been mentioned. These facts alone have stymied knowledge-base builders, and this in an environment where true scale has not even come into play. How will any team of humans be expected to maintain a collection of questions and answers in an environment of millions of possible records? This is arguably a more difficult problem than maintaining a collection of any other type of documents for the simple fact that a knowledge base is not conceptualized as a set of documents with provenance and date, but as a collection of the more nebulous “knowledge.”

While full-text approaches such as vector-based information retrieval may mitigate some of these problems, they do not solve the core difficulty of shifting facts, nor do they take into account the dynamic nature of the information presented. As the knowledge base grows, the relationships among pieces of information may change as well. This situation is further complicated when archives from different services are combined.

The authors argue that attempting to devise, scale, and equip a deductive approach to knowledge bases is ultimately unworkable. The authors further argue it is time to try a radically different, inductive approach. Simply put: Let the knowledge base, or more specifically, the agents representing digital reference output, organize themselves.

Complex Adaptive Systems

The inductive approach proposed in this prospective is grounded in complexity theory and, more specifically, the concept of complex adaptive systems as conceptualized by Holland. The authors will not explain the whole of complexity theory or delve any further than an operational explanation of complex adaptive systems in this document. For a deeper understanding of complexity theory see Waldrop; for complex adaptive systems see Holland; and for the application of complexity to digital reference see Lankes.8

Put simply, complex adaptive systems are grounded in the creation of autonomous agents that self-organize based on relatively simple rules. This organization is emergent, in that it is not the product of some predetermined course, but a result of the interactions of the agents themselves. The most common analogy is that of flocking birds. The flocking behavior of birds can be simulated effectively by creating independent agents in a virtual space with a set of very simple rules, such as “you must move forward; get as close as you can to those agents near you; do not hit anything.” Such simulations show that simple rules produce complex results: swarms of birds on a screen flock together and avoid obstacles, even though they were never explicitly programmed to swarm or to avoid obstacles.
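The three flocking rules quoted above can be sketched in a few lines of Python. This is only an illustrative sketch, not a simulation described by the authors: the function name `step`, the parameters `neighbor_radius`, `min_gap`, and `speed`, and the steering weight of 0.5 are all assumptions chosen for the example.

```python
import math

def step(agents, neighbor_radius=5.0, min_gap=1.0, speed=0.5):
    """Advance every agent one tick using three local rules.

    Each agent is a tuple (x, y, heading_in_radians). All parameter
    values are hypothetical and chosen only for illustration.
    """
    updated = []
    for i, (x, y, heading) in enumerate(agents):
        # An agent only "sees" other agents within neighbor_radius.
        near = [(ox, oy) for j, (ox, oy, _) in enumerate(agents)
                if j != i and math.hypot(ox - x, oy - y) < neighbor_radius]

        # Rule 1: you must move forward (start from the current heading).
        dx, dy = math.cos(heading), math.sin(heading)

        if near:
            # Rule 2: get as close as you can to those agents near you
            # (steer toward the centroid of the visible neighbors).
            cx = sum(p[0] for p in near) / len(near)
            cy = sum(p[1] for p in near) / len(near)
            dx += 0.5 * (cx - x)
            dy += 0.5 * (cy - y)

            # Rule 3: do not hit anything (steer away from any neighbor
            # that is closer than min_gap).
            for ox, oy in near:
                d = math.hypot(ox - x, oy - y)
                if 0 < d < min_gap:
                    dx -= (ox - x) / d
                    dy -= (oy - y) / d

        new_heading = math.atan2(dy, dx)
        updated.append((x + speed * math.cos(new_heading),
                        y + speed * math.sin(new_heading),
                        new_heading))
    return updated
```

No rule mentions a flock. Any grouping that appears when `step` is applied repeatedly to a set of scattered agents comes from the local interactions alone, which is the sense of "emergent" used here.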

Models built on these principles have also been used to simulate financial markets, traffic flows, and populations. The point is that complex adaptive systems, which consist of the interactions of autonomous agents, have been used to build systems that would be impossible to create in a deductive manner, where thousands of rules and lines of code would be needed to anticipate every possible contingency. Already, artificial intelligence systems have moved away from these so-called frame-based and expert system approaches toward neural nets and inductive simulations.

Complex adaptive systems are also dynamic, in that the agents constantly adapt to a changing environment. They constantly seek an optimal state in changing conditions. So the virtual birds will avoid obstacles in new ways as new obstacles are added. In simulations of biological systems, agents will adapt to changes in weather or food supply. It is this dynamism that makes an inductive approach particularly suitable to digital reference knowledge bases.

To examine the contents of DREW and develop new, inductive approaches to knowledge-base analysis and construction, the research team must first define the autonomous agents in the complex knowledge-base environment. These agents, according to Holland, must have three mechanisms:

  • Tags: Mechanisms that agents utilize for aggregation and flows of information
  • Internal Models: A representation of the environment used by an agent to anticipate and adapt to the environment
  • Building Blocks: Components of internal models combined to build, test, and rebuild internal models.9

The internal models and building blocks will be the result of future research. Tagging, or the mechanisms used for information flow and identification, however, is central to the present study. These tags can be thought of as fields or metadata elements. By identifying common elements in digital reference transactions (knowledge-base agents) these agents can be compared, clustered, and examined. To take the first step in building a digital reference knowledge base as a complex adaptive system, the researchers turned to existing standards for representing digital reference transactions.
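To make the idea of tags as comparable metadata concrete, the following sketch treats each digital reference transaction as a set of (field, value) tags and compares transactions by tag overlap. It is an illustration under stated assumptions, not the DREW schema or the authors' method: the field names (`subject`, `medium`, `audience`), the Jaccard similarity measure, the greedy grouping, and the 0.5 threshold are all hypothetical choices.

```python
def tag_set(transaction):
    """Turn a transaction's metadata into a set of (field, value) tags."""
    return set(transaction.items())

def similarity(a, b):
    """Jaccard overlap of two transactions' tags (0.0 = disjoint, 1.0 = identical)."""
    ta, tb = tag_set(a), tag_set(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def cluster(transactions, threshold=0.5):
    """Greedily group transactions whose tags overlap with a group's first member."""
    clusters = []
    for t in transactions:
        for group in clusters:
            if similarity(t, group[0]) >= threshold:
                group.append(t)
                break
        else:
            # No existing group was similar enough, so start a new one.
            clusters.append([t])
    return clusters

# Hypothetical transactions; the field names are not taken from the survey.
q1 = {"subject": "astronomy", "medium": "chat", "audience": "student"}
q2 = {"subject": "astronomy", "medium": "chat", "audience": "adult"}
q3 = {"subject": "genealogy", "medium": "email", "audience": "adult"}

groups = cluster([q1, q2, q3])
```

Here `q1` and `q2` share two of four distinct tags and land in the same group, while `q3` starts its own. This grouping is imposed from outside by a single pass over the data; it shows only the comparison that shared tags make possible, not the self-organization the authors propose.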

Standards for Exchange

The National Information Standards Organization (NISO) has developed a protocol for the exchange of questions between services, called Networked Reference Services (NetRef or NISO AZ).10 While this standard is appropriate for questions during the time period in which they are being answered, it is not appropriate for the long-term archiving of the exchange. One goal of the DREW project, therefore, is to create a schema for the archiving of digital reference transactions once the question-answering process is complete. It is important that this archival schema be compatible with the NISO standard; perhaps it can eventually become part of that standard. Theoretically, it should be easier for systems implementing the NetRef protocol to work with the DREW archival schema.

As these questions are answered, individual reference services create archives of question and answer pairs. These are the artifacts of human intermediation, and they represent valuable information that previously was lost in traditional reference. Sometimes these archives are searchable by the public, and other times they are kept as referral tools for the librarians and experts to use in answering questions. This distributed knowledge base of digital reference archives contains the expertise and knowledge of many minds; however, there is currently no way to merge these separate archives into a single knowledge base. If these reference transactions from different services could be collected, cleaned, and privatized into a single data warehouse, the amount of expertise available to users and researchers would be staggering. However, the challenges involved in creating this type of warehouse are just as staggering. The goal of this paper is to present the preliminary research in determining the fields that could make up such an archival schema, and to present current and future plans of the DREW project.

Determining the Fields

The first step in creating a data warehouse is to determine the fields that will be collected. As there are many different digital reference services, any schema for capturing information from these different services will result in compromises. To better understand what fields would be appropriate to capture, a survey was taken of digital reference service representatives.

To develop the fields needed for the archiving of digital reference transactions, it is necessary to start by exploring what is currently captured and then work toward implementation in an iterative manner. The first stage is a survey of digital reference services with the goal of learning the following, with respect to each of four categories–Patron, Question, Answer, and Expert:

  • what fields services are currently collecting;
  • what fields services are not currently collecting, but are willing to collect; and
  • what fields services are not willing to collect.
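The three collection statuses and four categories suggest a simple tally structure. The sketch below is one assumed way to record and count such responses; the status labels, the function name `tally`, and the example fields are hypothetical and are not taken from the survey instrument.

```python
from collections import Counter, defaultdict

# The three answers a service can give for each field (labels are assumed).
STATUSES = ("collecting", "willing", "unwilling")

def tally(responses):
    """Count, for each (category, field), how many services gave each status.

    Each response is a dict mapping (category, field) to one of STATUSES,
    where category is Patron, Question, Answer, or Expert.
    """
    counts = defaultdict(Counter)
    for response in responses:
        for key, status in response.items():
            if status not in STATUSES:
                raise ValueError(f"unknown status: {status!r}")
            counts[key][status] += 1
    return counts

# Two hypothetical service responses with made-up field names.
responses = [
    {("Patron", "email"): "collecting", ("Question", "subject"): "willing"},
    {("Patron", "email"): "unwilling", ("Question", "subject"): "willing"},
]
counts = tally(responses)
```

A field that most services already collect, or are willing to collect, would be a natural candidate for a shared archival schema; a field many services refuse to collect would not.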

First, field lists were created from Janes’s work and from a small group of digital reference services; these lists were in turn used to develop a survey instrument.11 This instrument was tested with a set of volunteer librarians from those services, who contributed additional fields to the instrument. The instrument was then delivered at the 2003 Virtual Reference Desk conference and through a Web-based survey. The online survey was promoted through the DIG_REF discussion list as well as through direct contact with services doing digital reference research. If an institution had different types of reference services (such as live chat and Web form-based asynchronous), representatives were asked to fill out the instrument once for each type of service.

The survey gathered demographic information: the communication methods used for question acceptance and question resolution, the number of questions received per month, the platform used, consortia information, and the like. The survey continued with a series of questions about the collection status of the fields listed in table 1. Open-ended questions asked about some of the fields, such as the location of subject lists, about other fields collected but not listed in each category, and for other comments.

Demographics of Respondents

There were fifty-three responses to the survey, which represented forty-nine different organizations. Respondents who had different reference services (such as chat and e-mail) and who kept different archives in the same organization were asked to fill out a survey for each service. There was little duplication by members of the same consortial group in the survey responses.

Of those services that could be affiliated with an institution, slightly more than half (53 percent) were from academic libraries. The remaining services were fairly evenly split among public libraries (15 percent), special and other libraries (17 percent), and AskA services without a specific library affiliation (14 percent).

About half (47 percent) of the responses were from chat-based services, 38 percent were from Web-based asynchronous services, and the remaining 15 percent used e-mail or another communication platform for reference. Combining the communication type variable with the service affiliation showed some differences, as can be seen in table 2. For example, chat was more commonly used in academic libraries, while asynchronous Web-based forms were the most common method in public libraries and independent services. This finding would be worth exploring on a larger scale, both to see whether it is generalizable and to shed light on the reasons behind the differences.

Another question addressed the average number of transactions per month. Answers ranged from ten to thirty thousand (for Tutor.com’s Online Classroom). This range of answers is represented in the data in table 3. In each case, the standard deviation is greater than the mean, which indicates that the data are heavily skewed by a few very large values. The median was therefore calculated to give a less biased measure of the central point of the data. The median number of Web form-based questions was eighty per month, and the median number of chat questions was 120 per month. The nonnormal nature of these data makes a trustworthy generalization difficult to produce.
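A small worked example shows why the median is the safer summary here. The monthly counts below are hypothetical: only the endpoints (ten and thirty thousand) come from the reported range, and the values in between are invented for illustration. The sample standard deviation is assumed, since the source does not say which form was used.

```python
import statistics

# Hypothetical monthly transaction counts. Only the minimum (10) and the
# maximum (30000) match the range reported in the survey; the rest are made up.
monthly_counts = [10, 40, 80, 120, 300, 30000]

def summarize(counts):
    """Return the mean, sample standard deviation, and median of the counts."""
    return {
        "mean": statistics.mean(counts),
        "stdev": statistics.stdev(counts),
        "median": statistics.median(counts),
    }

summary = summarize(monthly_counts)
print(summary)
```

With these made-up numbers, the single very large service pulls the mean above five thousand and pushes the standard deviation higher still, while the median stays at 100, close to what a typical service in the list handles.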
