Another demographic collected was the platform used by the reference service. The results after cleaning the data are in table 4. The entries for e-mail, Web form and e-mail, and in-house tool may all refer to the same type of service: some type of system using existing e-mail and Web servers. If these are combined, then there are three clearly popular choices: QuestionPoint, Tutor.com, and some type of in-house use of existing resources.
Exploration of Communication Forms
Much of the upcoming analysis is split by the communication form used, as the types of fields collected in chat may be different from the fields collected via a Web form and those collected via e-mail. The eventual goal is to create one schema that will serve all of these communication platforms.
A series of questions on the survey sought information about the communication practices of different service types. For example, all surveyed e-mail and Web form-based services e-mailed a copy of the answer or transaction to the patron; however, only 72 percent of the chat-based services regularly sent a copy of the transaction to the user.
A similar set of questions addressed the format through which questions are eventually resolved. The results, reproduced in table 5, show that there is not much crossover between formats. Chat reference is resolved in chat about 80 percent of the time, and Web form questions are resolved via Web forms or e-mail most of the time. The high percentage of chat reference questions that are resolved through other forms is probably because a synchronous connection has already been made, which makes it convenient to complete the transaction by phone.
Fields Collected by Services
To understand what information is being collected by services, the following analysis is presented in two parts. First, the fields currently collected by services are presented. Following that, the discussion turns to the data that inform the rest of this schema: What fields are services either currently collecting or willing to collect?
Table 6 lists the fields that services currently collect during the reference process, sorted by category and overall usage. Looking at the overall results, the most common set of fields currently collected about a reference transaction are: patron e-mail and name; question text, date, and time; and the response text, date, and time. This aggregate set of fields disguises patterns that appear when the results are sorted by communication method used.
Because the two most common communication methods are Web form and chat, they will be examined individually. Chat services tend to be more free-form, and therefore may not explicitly collect many fields. Some services ask the user to set up an account before the chat session; this will result in more information about the patron, but not more information about the specific information needed behind a reference transaction. Even though chat services tended to collect less information than average, many still collect the patron name and e-mail; question text, date, time, and referral and routing information; and the response text, date, and time. One field of note here is the above-average collection of referral and routing information. Many chat services reported capturing fields such as IP address, which was the most common information put into the “Other” open-ended survey questions. In addition, as seen earlier, chat sessions end in a different communication channel 20 percent of the time; they therefore have a stronger need to capture this type of transferal information.
The group of Web form reference services captured more information on average than other types of services; this is not surprising, as the process of asking a question via a Web-based form is more structured than asking the same question via e-mail or chat. The most common fields currently collected via Web form-based asynchronous reference are: patron e-mail, name, country, and state; question text, date, and time; response text, date, time; and responses collected. Because the information is collected in small fielded pieces, it is then easier to preserve those pieces when the information is moved to a data warehouse. It is because of this that DREW will start by aggregating Web form-based services, and then move to more free-form services as the warehouse develops.
One interesting pattern observed in the survey results is the lack of information collected about the person answering the question during the process. There are two types of individuals who answer questions: those who are trained to do research and answer a question from existing resources (such as librarians) and those who are able to answer questions in a specific topic area because they are trained experts in that area. Librarians are trained to provide citation information and document the authoritativeness of an answer through the support of external works. Experts, on the other hand, provide the authority for their answer based upon their credentials. If services do not keep information about the person who answered the question, then the authority behind an expert-answered question disappears. Because of this, it is important to encourage experts who are answering questions to supply references to works that would contain the answer to the question, even when they know the answer without looking anything up. As these experts may not have been trained as librarians, the administrator of the system needs to ensure that training is available in the basics of creating a response whose authority is supported even when the identity of the answerer is not retained.
Fields That Services Are Willing to Collect
Another way of looking at the data is to explore which fields services either collect now or are willing to collect in the future. The data were recalculated using this new model, and the results are in table 7. This is important in aiding the development of the DREW schema. While services may not be currently collecting information, they may be more willing to collect the information if they perceive that the data will be useful in improving their service and the understanding of the field.
Looking at the Overall column, one can see that services are willing to collect much more information than they currently collect. One obstacle is the fact that patrons are less likely to ask a question if they have to fill out more fields. The patron and expert information need be collected only once and then matched to each question through a logon process. The question and response information would need to be gathered every time.
To develop the proposed DREW schema, each area of the survey will be explored and discussed as to the usefulness of the fields to research needs. There are two types of research needs that are important: the needs of administrators in understanding their own digital reference system, and the needs of researchers in looking at the larger-scale picture.
Transaction Information
One of the challenges of DREW is that it will hold different forms of intermediation. The goal is to collect questions from all types of digital reference services–chat, e-mail, form-based, and so on. Therefore, at the center of the DREW record will be the information from the transaction. For a chat transaction, the body of the chat will be included. In an e-mail transaction where there was little restriction on the information in the e-mail, the e-mail text will be included. If a Web form was used to collect fielded information, then the question and response will be divided and included. There will also be a field to identify the type of transactional data in the record.
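The record structure described above can be illustrated with a minimal sketch. The class and field names below (`TransactionRecord`, `transaction_type`, `transcript`, `question`, `response`) are assumptions made for illustration only and are not part of any published DREW schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransactionType(Enum):
    """Hypothetical values for the field identifying the transactional data."""
    CHAT = "chat"
    EMAIL = "e-mail"
    WEB_FORM = "web form"


@dataclass
class TransactionRecord:
    """One possible shape for the transaction at the center of a record."""
    transaction_type: TransactionType
    transcript: Optional[str] = None  # full chat body or unrestricted e-mail text
    question: Optional[str] = None    # fielded question text (Web form)
    response: Optional[str] = None    # fielded response text (Web form)

    def is_divided(self) -> bool:
        """True when the question and response are stored as separate fields."""
        return self.question is not None and self.response is not None


chat = TransactionRecord(TransactionType.CHAT,
                         transcript="Patron: Hello\nLibrarian: How can I help?")
form = TransactionRecord(TransactionType.WEB_FORM,
                         question="Where can I find census data?",
                         response="Try the national statistics office.")
```

In this sketch, only a record that passes `is_divided()` exposes the question and response separately; chat and e-mail records carry a single undivided transcript.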
Using this structure will make it difficult for some researchers to explore relationships between questions and responses. A priority for researchers is to develop algorithms that will divide the large textual chat and e-mail transcripts into separate questions and answers.
Patron Information
Even though services are willing to collect considerable patron information, little of this is actually needed in understanding the question-answering process. In fact, it is important to mask personally identifiable information about the patrons. Therefore, most of the patron information will not be part of the DREW schema. There are a few useful fields about the patron that more than half of the services would be willing to collect. Information about the location of the patron (such as country) is important, especially as different countries have different laws about intellectual property. QuestionPoint has faced many of these problems, and it is expected that as DREW grows, international intellectual property issues may arise.12 One of the common fields that was a write-in was zip code; this field combines city and state information and can be used to map DREW to a demographic database but does not intrude upon the personally identifiable information about the patron.
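As a rough sketch of this masking step, a service could strip a patron record down to the location fields discussed above before submitting it. The field names (`country`, `zip_code`) and the function `mask_patron` are hypothetical, and the allowed set would need to grow to match whatever patron fields the final schema retains.

```python
# Hypothetical allowlist: only the location fields discussed above are kept.
ALLOWED_PATRON_FIELDS = {"country", "zip_code"}


def mask_patron(patron: dict) -> dict:
    """Return a copy of the patron record containing only allowed fields."""
    return {key: value for key, value in patron.items()
            if key in ALLOWED_PATRON_FIELDS}


local_record = {
    "name": "Example Patron",
    "email": "patron@example.org",
    "country": "US",
    "zip_code": "13244",
}
masked = mask_patron(local_record)
```

An allowlist is used here instead of a blocklist so that any field a service adds later is dropped by default unless it is explicitly approved.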
Another area of interest is the patron’s organizational membership or educational level. As different services cater to different age and educational levels, it would be useful to have some basic knowledge about the patron. An important distinction for DREW is the intended age level attached to a question, which might be different from the level of the patron asking the question. For example, a question asked by someone else on behalf of a child would need to be identified as a child-level question. For this field, services will have to map their own data collected about their questions to an Educational Level field, which would have the broad choices of:
- Child (elementary school, primary school)
- Pre-Teen (middle school, junior high)
- Teen (high school)
- College (undergraduate)
- Adult
- Unknown
Individual services will have to use their best judgment in mapping their own fields to these choices.
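A minimal sketch of such a mapping follows. The six levels come from the list above; the local labels in `LOCAL_TO_DREW` and the function name `map_educational_level` are invented for illustration, and each service would substitute its own categories.

```python
from enum import Enum
from typing import Optional


class EducationalLevel(Enum):
    CHILD = "Child"
    PRE_TEEN = "Pre-Teen"
    TEEN = "Teen"
    COLLEGE = "College"
    ADULT = "Adult"
    UNKNOWN = "Unknown"


# Hypothetical local labels used by one service.
LOCAL_TO_DREW = {
    "grades k-5": EducationalLevel.CHILD,
    "grades 6-8": EducationalLevel.PRE_TEEN,
    "grades 9-12": EducationalLevel.TEEN,
    "undergraduate": EducationalLevel.COLLEGE,
    "general public": EducationalLevel.ADULT,
}


def map_educational_level(local_label: Optional[str]) -> EducationalLevel:
    """Map a local label to a broad level, falling back to Unknown."""
    if local_label is None:
        return EducationalLevel.UNKNOWN
    key = local_label.strip().lower()
    return LOCAL_TO_DREW.get(key, EducationalLevel.UNKNOWN)
```

Falling back to Unknown keeps an unmapped or missing local label from being silently assigned to the wrong level.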
One of the products of DREW will be customized reports for each service type. To aid in this process, there will also be a Custom Patron Type field, which will allow a service to enter a different classification with local meaning for their service.
Question Information
It is more important to collect information about the question than information about the patron, as seen with the Educational Level field. Fields such as Date, Time, and Previously Consulted Sources are all potentially useful. Some type of Free-Text Subject and Category information is also useful, and one of the areas of research is to attempt to automatically map this to a common list. Services are willing to share Referral Information; the key information for DREW concerns whether the question was:
- Internal (answered in the same service where it was asked)
- External–Sent (sent out to a different service to be answered)
- External–Received (a question received from a different service)
In addition, there will be a Referral Service field, where the original service can indicate the name of the service involved in the referral. These data will be useful in understanding patterns of referral between services.
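To illustrate how these two fields could support an analysis of referral patterns, the sketch below counts directed referrals between services. All names (`QuestionReferral`, `reporting_service`, `referral_counts`) are assumptions for illustration. The sketch also assumes that each referral is reported by only one of the two services involved; if both report it, the same referral would be counted twice.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class ReferralType(Enum):
    INTERNAL = "Internal"
    EXTERNAL_SENT = "External–Sent"
    EXTERNAL_RECEIVED = "External–Received"


@dataclass
class QuestionReferral:
    reporting_service: str                  # service that submitted the record
    referral_type: ReferralType
    referral_service: Optional[str] = None  # other service in the referral


def referral_counts(records: Iterable[QuestionReferral]) -> Counter:
    """Count referrals as (from_service, to_service) pairs."""
    counts = Counter()
    for rec in records:
        if rec.referral_service is None:
            continue  # internal, or the other service was not recorded
        if rec.referral_type is ReferralType.EXTERNAL_SENT:
            counts[(rec.reporting_service, rec.referral_service)] += 1
        elif rec.referral_type is ReferralType.EXTERNAL_RECEIVED:
            counts[(rec.referral_service, rec.reporting_service)] += 1
    return counts


sample = [
    QuestionReferral("Service A", ReferralType.INTERNAL),
    QuestionReferral("Service A", ReferralType.EXTERNAL_SENT, "Service B"),
    QuestionReferral("Service C", ReferralType.EXTERNAL_RECEIVED, "Service A"),
    QuestionReferral("Service A", ReferralType.EXTERNAL_SENT, "Service B"),
]
counts = referral_counts(sample)
```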