Competency E

Design, query, and evaluate information retrieval systems.

Competency Definition

The work of information professionals always involves some level of interaction with information retrieval (IR) systems, whether traditional commercial research databases, internal organization intranet systems, or even web search engines like Google. Information professionals are trained to understand a database’s possibilities based on its backend structure, user interface (UI), and intended subject-matter purpose, all of which influence the degree of precision in search results and the results themselves (Bell, 2015). Searching through IR systems entails the realization that there is no “one-stop” solution to research problems, as each approach carries trade-offs; yet with prior knowledge of those trade-offs, a searcher can work around each system’s weaknesses to achieve the widest possible coverage of relevant literature across IR systems (Mann, 2015). Information professionals must therefore position themselves as one of the stops along the way in assisting users with their search queries. They must be able to leverage their knowledge of IR system design and search tactics to help users evaluate the usefulness of various IR systems and develop effective search strategies, so that users can search on their own with confidence.

Knowledge of the design and structure of IR systems is essential to understanding how to access and retrieve documents from those systems. Design here refers to the entire process of conceptualization, implementation, evaluation, and change management of subsequent revisions involved in building a system, as well as the communication, at each stage, of the rationale and necessity behind each step (Weedman, 2016). Users retrieve information through an IR system’s various access points, which are shaped by the nature of the system’s data, the processes used to index documents, the search fields and associated subfields, the retrieval mechanics, and the UI. [For further discussion of design in regard to other aspects and kinds of information retrieval, see Competency G.] The goal of any IR system is the retrieval of “all and only” the documents relevant to a user’s query. Relevance is dependent on context, at the levels of IR system design and evaluation (involving structure, content, intended users, etc.) and of the actions of users, along with their information needs, information-seeking habits, and affective and motivational states, among other factors (Saracevic, 1996). The objectivity and subjectivity of relevance are always held in tension between system and user, as the user navigates an IR system influenced by her/his internal states and external factors, and by the various vocabularies used in those contexts. Relevance, then, is an arguably fluid relationship between query and product, user and retrieved information, that varies across IR systems (Schamber & Eisenberg, 1991).

Though the relevance of documents may vary, the mechanics involved in generating search strings to retrieve documents remain relatively constant. The use of the basic Boolean operators AND, OR, and NOT with basic search fields (e.g., keyword, subject, title, author) is similar across commercial research databases, allowing searchers to tune for wider recall or greater precision in search results; the concepts of proximity (NEAR or n/), truncation (some common symbols are *, !, and ?), phrase searching (putting quotes around a string of words to retrieve documents with that exact content), and nested statements using parentheses also apply in many databases (Bell, 2015; Mann, 2015).
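To make the recall/precision trade-off concrete, here is a minimal sketch of Boolean retrieval over a toy in-memory inverted index; the four-document collection is invented, and no particular vendor’s implementation is assumed.

```python
# A minimal sketch of how Boolean operators trade recall for precision,
# using a toy in-memory inverted index (not any vendor's actual engine).

documents = {
    1: "information retrieval system design",
    2: "online searching and retrieval tactics",
    3: "database design for libraries",
    4: "web search engines and information literacy",
}

# Build an inverted index: term -> set of document IDs containing it.
index = {}
for doc_id, text in documents.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)

def lookup(term):
    """Return the set of documents containing a term (empty set if none)."""
    return index.get(term, set())

# OR widens recall: documents matching either term.
recall_set = lookup("retrieval") | lookup("design")
# AND narrows toward precision: documents matching both terms.
precision_set = lookup("retrieval") & lookup("design")
# NOT excludes: retrieval documents that do not mention design.
not_set = lookup("retrieval") - lookup("design")

print(recall_set)     # {1, 2, 3}
print(precision_set)  # {1}
print(not_set)        # {2}
```

OR widens the result set (recall) while AND narrows it (precision); a common strategy is to OR terms within a concept and AND across concepts.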

Other useful mechanics involve knowledge of a database’s controlled vocabularies, which allow for searches of like concepts and synonymous terms, and limiters used to refine searches, such as “scholarly/peer reviewed,” “full text,” date, source type, document type, and language (Bell, 2015). It is also important to see what kinds of indexing terms a database uses in retrieving results, in order to achieve the greatest possible coverage of articles across databases: each database uses its own terminology in its indexing processes and can retrieve relevant items that “one-stop” searches cannot (Mann, 2015).
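To illustrate how a controlled vocabulary broadens a search, this minimal sketch ORs a user’s term together with its thesaurus equivalents before ANDing the concept blocks; the thesaurus entries here are invented for the example.

```python
# A minimal sketch of controlled-vocabulary expansion: user terms are
# mapped to a hypothetical thesaurus so synonymous concepts are OR-ed
# together before the query reaches the database.

thesaurus = {
    "teens": ["adolescents", "young adults", "teenagers"],
    "movies": ["motion pictures", "films"],
}

def expand(term):
    """OR a user's term with its controlled-vocabulary equivalents."""
    synonyms = thesaurus.get(term.lower(), [])
    variants = [term] + synonyms
    return "(" + " OR ".join(f'"{v}"' for v in variants) + ")"

# AND the expanded concept blocks together for the final search string.
query = " AND ".join(expand(t) for t in ["teens", "movies"])
print(query)
# ("teens" OR "adolescents" OR "young adults" OR "teenagers")
#   AND ("movies" OR "motion pictures" OR "films")
```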

Another important search tactic, alongside the basic tactics described above, involves conducting searches based on a citation of a work relevant to a searcher’s needs, essentially working backward to the sources that preceded that starting-point source (Mann, 2015). This is known as citation chaining, a process of following chains of citations or other forms of referential connection between materials, starting with one or a few core sources and moving through their reference lists (Ellis, 2005; Mann, 2015). With the rise of born-digital academic work, websites, and other materials posted on the Internet, this process can also involve following trails of hyperlinked text to see where and how an author cited previous work (Ge, 2010; Bell, 2015). From these developments, citation chaining can now also move forward in time, to subsequent sources published after the starting-point source. A strong rationale for citation searching can be found in the observation that “the act of citing is an expression of the importance of the material” (Garfield, 1983). Conceptually related material tends to be cited together, as later works that cite earlier ones usually (but not always) discuss and conduct similar work on the same or similar subjects (Mann, 2015, p. 140). Some databases take advantage of this notion by offering recommendation features such as suggested articles based on others’ searches (akin to online shopping product recommendations), while other specialized databases are structured entirely around citation chaining, namely Web of Science (as discussed in one of the evidentiary items below). Citation chaining is useful in that it circumvents the typical work of generating search strings; however, it presupposes that one’s core article is a quality, relevant source at the start of the chaining process (Mann, 2015).
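Citation chaining in both directions can be pictured as traversal of a citation graph. The following minimal sketch uses an invented four-paper graph: backward chaining follows reference lists from a starting-point source, while forward chaining finds later works that cite it. Real citation indexes such as Web of Science maintain these links at scale.

```python
from collections import deque

# A toy citation graph: paper -> list of papers it cites.
cites = {
    "D": ["B", "C"],   # D cites B and C
    "B": ["A"],        # B cites A
    "C": ["A"],        # C cites A
    "A": [],           # A cites nothing in this toy graph
}

def backward_chain(start, depth=2):
    """Follow reference lists back in time from a starting-point source."""
    found, queue = set(), deque([(start, 0)])
    while queue:
        paper, d = queue.popleft()
        if d == depth:
            continue
        for ref in cites.get(paper, []):
            if ref not in found:
                found.add(ref)
                queue.append((ref, d + 1))
    return found

def forward_chain(start):
    """Find later work that cites the starting-point source."""
    return {paper for paper, refs in cites.items() if start in refs}

print(backward_chain("D"))  # {'A', 'B', 'C'}
print(forward_chain("A"))   # {'B', 'C'}
```

Note how the quality of the starting node determines everything reachable from it, which is exactly the presupposition about one’s core article noted above.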

Search is essentially a gradual, iterative process, and the work of information professionals will always involve some form of engagement with searching through databases and other IR systems. In my coursework here at the iSchool, I engaged in the design, querying, and evaluation of IR systems through the courses INFO 202: Information Retrieval System Design, INFO 244: Online Searching, and INFO 254: Information Literacy and Learning. In an INFO 202 group project, I helped design and build a functional, basic database in WebDataPro. In INFO 244 and INFO 254, I discussed database navigation and search strategies for a particular database through information session-style screencast recordings, and demonstrated basic and advanced search tactics in short video segments. From these experiences, I see that the design, querying, and evaluation of IR systems involve knowledge of the design and structure of IR systems, recognition of the purpose of a particular IR system and its available search parameters, and the ability to teach these search concepts to a particular user group as part of an overall information literacy instruction (ILI) strategy.

Discussion of Competency Supporting Evidence

Evidence 1: INFO 202 WebDataPro Database Group Project

Database Export File | Beta Build Report

Te_Group7betadesignInfo202.docx

In the iSchool core course INFO 202: Information Retrieval System Design, students learn about information organization and retrieval, along with the structural and design principles involved in organizing documents in a given IR system in relation to a specific user group’s information needs. For one particular group project, I worked with four other group members to develop a basic WebDataPro database of records for non-traditional objects (i.e., objects other than text-based documents like books and research articles). The project itself entailed numerous parts: designing a data structure for these non-traditional items in WebDataPro; creating and manipulating a small sample of records representing these items; developing a statement of purpose for the database with a specific user group in mind; developing a set of indexing rules for other indexers to create records in the database; beta testing and evaluating another group’s database to inform design decisions about one’s own; and achieving all this in a group working collaboratively online, asynchronously and synchronously.

In designing this database, we had to determine our target users and the purpose of the database, which informed how we determined the attributes of our database’s collection that would matter to target users, what search fields and criteria would best represent those attributes, and how those fields would be structured. These three considerations would shape how users could search and retrieve records. Per the project instructions’ suggestion that one group member manage WebDataPro access for the group, I was the designated person managing the backend tasks of building the database; I then let the rest of the group decide on the database’s content. We decided to create a database for people who wanted to adopt dogs of various types, sizes, and breeds. The purpose of this “Adopt-A-Dog Database” was to provide people with the information they need to choose and adopt a dog that aligns with their living circumstances (e.g., house space, allergies, presence of other dogs, cats, or other pets at home), toward giving dog and new owner the most beneficial and compatible owner-pet relationship. The search fields themselves would also provide a basic overview of canine qualities that a curious beginning adopter might find helpful.

As seen in the final report of our group’s beta database build, we came up with 11 fields to index items in our database: Breed, Temperament/Personality, Size, Exercise Demands, Age, Sex, Fur Color, Training, Medical Background, Allergy-Friendliness (Hypoallergenic), and Spayed/Neutered; we also added the optional field Dog Backstory to provide a short qualitative element with more description of the dog. We defined specific indexing rules for each of these fields, as seen in the following rules for size/weight, a mandatory field, and temperament/personality, a non-mandatory field (a sketch of how such rules might be encoded programmatically follows the two examples):

Element: Size/Weight

Definition: General size and range of weight that the particular dog is categorized as.

Type: List

Obligation: Mandatory

Occurrence: Non Repeatable

Input Guidelines: Available choices: Toy/10 lbs. & under, Small/11-25 lbs., Medium/26-50 lbs., Large/51-100 lbs., Extra Large/101+ lbs. Round weight up to nearest whole number.


Element: Temperament/Personality

Definition: Drop-down list containing information about how the personality of the specified animal will interact with other living creatures near it, and what type of home environment it should be placed in.

Type: List

Obligation: Non Mandatory

Occurrence: Non Repeatable

Input Guidelines: Brief terms identifying any personality traits that should alert owners as to what type of household the pet should live in. Available choices: Child friendly, Other dog friendly, Cat friendly, Other pet friendly, Only pet. Enter only one value for each record.
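To suggest how indexing rules like the two above could be made machine-checkable, here is a minimal sketch that encodes them as a schema and validates records against it. The field names and choices follow our report, but the encoding and validation logic are illustrative, not WebDataPro’s.

```python
# A minimal sketch encoding the indexing rules above as a schema;
# the validation logic is illustrative, not WebDataPro's.

SCHEMA = {
    "Size/Weight": {
        "mandatory": True,
        "choices": ["Toy/10 lbs. & under", "Small/11-25 lbs.",
                    "Medium/26-50 lbs.", "Large/51-100 lbs.",
                    "Extra Large/101+ lbs."],
    },
    "Temperament/Personality": {
        "mandatory": False,
        "choices": ["Child friendly", "Other dog friendly",
                    "Cat friendly", "Other pet friendly", "Only pet"],
    },
}

def validate(record):
    """Report rule violations for a single record against the schema."""
    errors = []
    for field, rules in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if rules["mandatory"]:
                errors.append(f"{field}: mandatory field missing")
        elif value not in rules["choices"]:
            errors.append(f"{field}: '{value}' not an available choice")
    return errors

print(validate({"Size/Weight": "Toy/10 lbs. & under"}))  # []
print(validate({"Temperament/Personality": "Shy"}))
# ['Size/Weight: mandatory field missing',
#  "Temperament/Personality: 'Shy' not an available choice"]
```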

The following screenshot shows the UI of the search fields screen in WebDataPro:

[Screenshot: WebDataPro search fields UI]

Here is a sample record from the .csv file of the database export from WebDataPro (a sketch of querying this export follows the record):


Size/Weight (lbs): Toy (10 lb & under)

Sex: Male

Temperament/Personality: Only Pet

Exercise Needs (hours/day): Medium

Breed: Chihuahua

Fur Color: White

Age (years): Adult (3-7 years)

House Trained: Yes

Pre-existing Conditions: Yes

Spayed/Neutered: Yes

Hypoallergenic: Yes

Dog Backstory: Maximillion*, found neglected in a run-down street, is looking for a loving companion. Full of energy and extremely adventurous, Maximillion is eager to connect with a new family member. (* - name was intentionally misspelled, for effect)
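Because the database export is a flat .csv file, even a few lines of code can query it directly. The following minimal sketch filters the records on two fields; the filename is hypothetical, and the column headers follow the sample record above.

```python
import csv

# A minimal sketch of querying the exported .csv directly;
# "adopt_a_dog_export.csv" is a hypothetical filename.
with open("adopt_a_dog_export.csv", newline="") as f:
    records = list(csv.DictReader(f))

# Retrieve hypoallergenic, house-trained dogs -- the kind of conjunctive
# (AND) query the database's search fields were designed to support.
matches = [
    r for r in records
    if r["Hypoallergenic"] == "Yes" and r["House Trained"] == "Yes"
]

for dog in matches:
    print(dog["Breed"], "-", dog["Size/Weight (lbs)"])
```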

From this project, I gained first-hand experience of the complexities involved in creating and managing an IR system, even one as rudimentary as the Adopt-A-Dog database. There are many moving parts between a database’s content and its structure. Design itself is a social process predicated on uncertainty, as needs are understood and translated into design concepts (Weedman, 2016). Designers are never able to create a database exactly as intended, owing to constraints that arise from colleagues, end-user needs, finances, materials, time, and more. Yet it is “the definition of constraints [that] contributes to how a design problem is defined and therefore to how it is solved” (Weedman, 2009, p. 1496). There was initial difficulty in learning what I could and could not do with WebDataPro, and in figuring out what our necessary data fields would be and how to make them functional. I had to make compromises, which, as in many other database projects, arose from failures encountered and how we attempted to address them (Weedman, 2009). All these compromises were made in the interest of time, the requirements of the project, and, more importantly, the intended user. Keeping the user in mind is the most important consideration when developing a database, from its navigation and UI to the content records themselves.

In building this database, I had to rely on my group members to come up with the actual database content, from the statement of purpose to the indexing rules and the records. Yet there is always room for improvement in database design. Moving beyond the requirements and restrictions of the project parameters, I felt that the database would be stronger with the addition of dog pictures in the records, the option of a simple search bar, and integration of the statement of purpose into the initial search screen and landing page, so that users would know the rationale behind the available search fields. This way of thinking about database design is something I gained from the group project experience, and I feel I can bring it to database-related projects in future workplaces, such as the kind of work discussed in the next evidentiary item below. [For further discussion regarding the group dynamics and personal reflections for this project, see Competency M.]

Evidence 2: Web of Science Presentations for INFO 244 and INFO 254

This evidentiary piece and its discussion have multiple parts, all connected by my use of the same database, Web of Science (WoS), for course final projects in INFO 244: Online Searching and INFO 254: Information Literacy and Learning. INFO 244 covers techniques of online searching and understanding database structures through guided navigation exercises in ProQuest, LexisNexis, and Web of Science. INFO 254 examines the theory and practice of ILI through various hands-on assignments and discussions. In both INFO 244 and INFO 254, I was tasked with generating instructional content describing, evaluating, and demonstrating basic and advanced navigation and search tactics for a particular database. The INFO 244 project was geared toward a 30-minute instructional session, while the INFO 254 project was geared toward three short videos (one five-minute database overview and two two-minute demonstrations of basic and advanced search functionality). Through taking both courses in the same semester and working with the same database across two final projects, I was able to evaluate WoS through the lenses of the two courses’ different objectives and their respective final projects, the screencasts of which are presented here.

The audience for these instructional and demonstration videos was social sciences graduate students and faculty in academic settings who are unfamiliar with the robust opportunities WoS provides. WoS is built for forward and backward citation chaining, useful for finding scholarly resources and potential future research collaborators; it can also generate citation reports, showing how much one’s work is getting noticed and cited, and provide ways to find related articles beyond the content of an initial search query. Results lists from WoS searches do lead to accessible research articles, but how those articles are accessed and indexed differs from more traditional databases. WoS requires a somewhat different way of thinking about search results, both in how they are presented and in their content. To demonstrate the functionality and value of WoS to my imagined audience, I framed the discussion around the technique of citation chaining. I created the following graphic to visualize this relationship, showing how WoS can fit into and expand one’s current research habits as a way to facilitate chaining and the monitoring of an article’s impact (whether a particular researcher’s work or one’s own).

[Graphic: how WoS fits into and expands a researcher’s workflow through citation chaining and impact monitoring]

As I discussed in these videos, WoS is a citation index: it indexes by whole citations, not by subjects as in traditional databases, allowing for retrieval of peer-reviewed articles, books, and other materials across disciplines. Whatever is listed in the references section of an article and mentioned in-text, WoS will provide an entry for and a connection to it when conducting searches. The way WoS is structured and indexed is as close as we can currently get to “searching by concept,” as opposed to “searching by words” in traditional databases (Bell, 2015). This indexing rationale is rooted in the work of the late Eugene Garfield, founder of the Institute for Scientific Information, the forerunner of WoS. Garfield came up with the concept of indexing an article’s citations along with its references from the following premise:

“Since authors refer to previous material to support, illustrate, or elaborate on a particular point, the act of citing is an expression of the importance of the material. The total number of such expressions is about the most objective measure there is of the material’s importance to current research. The number of times all the material in a given journal has been cited is an equally objective and enlightening measure of the quality of the journal as a medium for communicating research results.” (Garfield, 1983; italics mine).

Indexing 33,000+ top-tier journals across disciplines, with a time range that can go back to 1900 (depending on a library’s subscription access), WoS is a great database for users who want to explore how related disciplines discuss certain aspects of a topic and to see how a particular work was cited after its publication. Researchers can see how shared references across articles hint at similar work being done by others, which can lead to insights for potential future collaborations, grant options, and publication venues, and even reveal the citation impact of one’s own published work over time. They can also monitor the development of a particular author’s or article’s impact and influence over time, as well as the impact of their own work, by setting up notifications with saved searches.

In these videos, I discussed basic and advanced search and navigation tactics. In WoS, the majority of standard database search operators apply; phrase searching is allowed; and truncation (*) is recommended and can be used in any part of a word, as long as there are at least two letters in front of the asterisk. Searches by author name are also supported, though there is a last name–first name convention to keep in mind. There are other ways to refine results as well, such as sorting the results list by newest to oldest, recently added, or times cited, and filtering by WoS categories, document types, authors, institutional affiliations, and other factors. (See the screenshot below of additional sort options.)

[Screenshot: additional sort options for WoS results lists]
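The truncation rule just described (at least two letters before the asterisk) can be expressed as a small check; this sketch is illustrative, not WoS’s actual query parser.

```python
import re

# A minimal sketch of the truncation rule described above: an asterisk
# is accepted only if at least two letters immediately precede it.
def valid_truncation(term):
    """True if every '*' in the term has >= 2 letters directly before it."""
    return all(
        m.start() >= 2 and term[m.start() - 2:m.start()].isalpha()
        for m in re.finditer(r"\*", term)
    )

print(valid_truncation("libr*"))  # True  -> library, libraries, librarian...
print(valid_truncation("wom*n"))  # True  -> woman, women (mid-word truncation)
print(valid_truncation("l*"))     # False -> too little context before the asterisk
```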

I also described other researcher-related resources: WoS’s citation mapping feature, which graphically represents the relationships between articles; the h-index and JCR impact factor, metrics used to gauge an author’s research output and a journal’s impact; and ResearcherIDs or ORCID iDs, persistent, unique digital identifiers that link a researcher to her/his work across publications and systems; alongside other useful information about WoS’s indexed journals (an outline list of all indexed journal titles by subject category is available here). I put all of these videos on a proof-of-concept Google Site, along with other resources created by the maintainers of WoS, Clarivate Analytics (formerly part of Thomson Reuters).
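Since the h-index comes up in these researcher metrics, here is a minimal worked sketch of its computation (the largest h such that an author has h papers each cited at least h times); the citation counts are invented for illustration.

```python
# A minimal sketch of the h-index: the largest h such that the author
# has h papers each cited at least h times. Counts below are invented.

def h_index(citation_counts):
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:   # the rank-th paper still has >= rank citations
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4 -> four papers with >= 4 citations each
print(h_index([25, 8, 5, 3, 3]))  # 3 -> one blockbuster paper does not raise h
```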

As great as WoS is at facilitating citation chaining, it is important to remember that WoS indexes only the best journals in each field, not all journals. WoS coverage is also limited by the subscription of the library providing access. Some coverage can go back as far as 1900, but this is costly, so for most subscription plans coverage extends back only to a particular year, such as 1987. Still, whatever is cited in the references section of an indexed article will be indexed and mentioned, even if the cited source itself is not in WoS. Citation chaining in WoS is only as effective as the source used to start the search. There are no guarantees that WoS search results will contain relevant literature, or that all the best sources will cite the starting-point source. It is also entirely possible that all the good sources one needs are not linked by citations, or that the starting source is cited in a context not relevant to one’s research interests. But taken together with standard search techniques through subject headings and controlled vocabularies in traditional databases, citation chaining and tracking with WoS allows researchers to find relevant research articles in a highly vetted, specialized citation index.

I also discussed what I saw as implications of WoS searches at the individual and macro levels. At the individual level, researchers can use WoS to see the influence of their work in the field, tracking how many other researchers cite their work with the Analyze Results and saved-search alert features. Researchers can also use WoS to find potential future collaborators and their contact information (email, phone number, address, organization/institution, etc.) as printed in the article. WoS also allows for searches via organizational affiliation, to retrieve works generated from a specific university or department, and via grant funding agency, to seek future funders for similar kinds of research. At the macro level, institutions can use WoS to compare JCR impact factors, gauge research output from a particular organization, inform library collection management decisions (such as the selection of journal subscriptions based on faculty use), and assist in tenure considerations based on a candidate’s publications.

WoS is a powerful resource that takes advantage of concept searching across articles in prestigious journals in the major disciplines. Though it requires a somewhat different approach to search than traditional academic databases, information professionals can promote WoS alongside other databases to help their users retrieve relevant, quality results. As I have demonstrated in these projects, a basic understanding of search, user needs, and database structure is necessary to engage in effective research practices and to teach them to others. Information professionals can then engage in effective ILI and discuss practical implications when demonstrating the value of databases such as WoS to particular groups on their campuses or in their organizations.

Future Directions

In reflecting on the experiences and lessons gained from these projects, I can see my growth in understanding how IR systems work and how to navigate them effectively. I took INFO 202 during my first semester at the iSchool, and INFO 244 and INFO 254 in the same semester a year later. I can see how the mechanics of IR system design play into how one conducts queries in such a system, which in turn influences how I evaluate the utility and effectiveness of my own search practices and the purpose and structure of databases such as WoS. I can see how these projects can inform my potential future work in ILI, whether that takes place in a library setting or is integrated into a course curriculum. I can also see myself developing my search tactics further, for personal research and for assisting others with their research-related questions. [For more about what I see as my potential future work in ILI, see Competency I.]

As I discuss and reflect on what I have gained in relation to this competency, I think back to some INFO 244 discussions about the “single search box” mentality and web-scale discovery (WSD), two recent developments in database search. In one INFO 244 discussion, I considered how Google-like search interfaces are “dumbing down” the research process to “skim the surface” of a collection’s material, but not all of its content (LaGuardia, 2012, p. 602). I was reminded of the old saying, “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.” So if a database UI looks like Google, swims through information like Google, and has buttons that lead to a page of results like Google, it must search like Google too, right? (Wrong.) As the LIS profession, according to LaGuardia (2012), continues “to build virtual monuments rather than responsive, intuitively usable systems,” it detracts from actively benefitting the individual user. This also weakens the information professional’s position in demonstrating the value of databases to users, especially if users (a) do not know about the database options relevant to their needs, or (b) know about such databases but do not know how to access and use them effectively. The remedy, then, is to “keep a local focus on user needs”: to see what a library’s users really need and to develop user-focused, tailored, contextual resources that promote information literacy skills and build up the “responsive, intuitively usable systems” needed in today’s libraries (LaGuardia, 2012).

Yet such systems have a cost in time, manpower, resources, money, and energy; information professionals can and will experience some level of burnout as a result of working longer hours under budget and service reductions (LaGuardia, 2012). WSD arguably could be a feature implemented by information organizations to help reduce such burnout and support users’ search experiences. As the name implies, WSD is a discovery interface layered on top of the existing database access points a library provides (Hoeppner, 2012). WSDs are clean, user-friendly, and intuitive to use. In theory, a single search across a centralized index of a library’s accessible materials is a powerful way to meet users where they are in their information search abilities and to work with their natural ways of searching (Hardenbrook, 2013). In practice, WSDs seem to be working now, with algorithms that weigh and rank results by currency, relevance (term inclusion as searched, proximity of terms to each other, etc.), whether a source is scholarly or professional, and other factors.
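As a rough illustration of this kind of weighted ranking, here is a minimal sketch that scores results by term matches, currency, and a scholarly flag; the factors, weights, and sample results are illustrative assumptions, not any vendor’s actual algorithm.

```python
# A minimal sketch of weighted relevance scoring such as a WSD layer
# might apply over a central index; weights and factors are invented.

WEIGHTS = {"term_matches": 2.0, "currency": 1.0, "scholarly": 1.5}

def score(result, query_terms):
    """Combine simple ranking signals into one weighted score."""
    text = result["title"].lower()
    matches = sum(term in text for term in query_terms)
    recency = 1.0 if result["year"] >= 2015 else 0.5
    scholarly = 1.0 if result["peer_reviewed"] else 0.0
    return (WEIGHTS["term_matches"] * matches
            + WEIGHTS["currency"] * recency
            + WEIGHTS["scholarly"] * scholarly)

results = [
    {"title": "Web-scale discovery in libraries", "year": 2013, "peer_reviewed": True},
    {"title": "Discovery layers and library users", "year": 2017, "peer_reviewed": False},
]
query = ["discovery", "library"]
for r in sorted(results, key=lambda r: score(r, query), reverse=True):
    print(round(score(r, query), 1), r["title"])
# 5.0 Discovery layers and library users
# 4.0 Web-scale discovery in libraries
```

Even this toy example shows why such rankings need local tuning: a newer, non-scholarly item can outrank an older peer-reviewed one depending entirely on the chosen weights.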

I see two major obstacles that WSDs need to overcome in order to be truly effective: licensing and the dangers of what I call searcher’s ignorance. WSDs can include in their central indexes only what they are able to include, based on a library’s subscriptions and its customization choices about what to show users (Ellero, 2013; Hoeppner, 2012). Licensing causes conflicts between what a library has and what can be shown through a WSD, as what is surfaced may be limited to a certain recent date range, among other proprietary restrictions. As long as copyright and licensing exist as they currently do in academic publishing, it will be hard for a library to have a unified central index of all of its database materials (Parry, 2014). This plays into searcher’s ignorance: whenever I search for information, I wonder whether the results I am seeing include everything I could possibly find with a given query. Ignorance may be bliss at times, but not knowing what I do not know in a search is unsettling (I entertain many what-if thoughts about what I am seeing in the results, based on the terms I used, the order of operations, and so on). WSDs sell themselves on having large amounts of material indexed in a way that promotes ease of retrieval and high relevancy. This makes sense for recent materials, but not necessarily for older ones, especially when it is necessary to search for older, seminal works in a field, or for materials whose valuable metadata (or lack thereof) lives in a particular database not represented in the WSD interface. WSD is in its infancy: the interface is great and very accessible, but it is what happens at the backend that matters in search (Ellero, 2013). At some level, “permanent beta is ok” (Hardenbrook, 2013): as long as user feedback about how a WSD is set up in a library is acknowledged, addressed, and implemented, libraries can grow and interact alongside their users even more.

Information professionals engage in and draw on knowledge of the design process behind IR systems when providing database instruction to users. They also need to be able to evaluate current IR systems and tailor their instruction in specific databases to their respective user groups, and to look ahead as users contend with “single search box” UIs and WSD systems. It is the information professional’s job, then, to demonstrate knowledge of the design and structure of IR systems and to teach these concepts in targeted, tailored ways so that users can search efficiently and effectively on their own.

References

Bell, S. (2015). Librarian's guide to online searching: Cultivating database skills for research and instruction. Santa Barbara, CA: Libraries Unlimited.

Ellero, N. P. (2013). Integration or disintegration: Where is discovery headed? Journal of Library Metadata, 13(4), 311-329. doi:10.1080/19386389.2013.831277

Ellis, D. (2005). Ellis's model of information-seeking behavior. In K. E. Fisher, S. Erdelez, & L. McKechnie (Eds.), Theories of information behavior (pp. 138-142). Medford, NJ: Information Today, Inc.

Garfield, E. (1983). Citation indexing: Its theory and application in science, technology, and humanities. Philadelphia, PA: ISI Press.

Garfield, E. (1994, January 3). The concept of citation indexing: A unique and innovative tool for navigating the research literature. Retrieved from http://wokinfo.com/essays/concept-of-citation-indexing/

Ge, X. (2010). Information-seeking behavior in the digital age: A multidisciplinary study of academic researchers. College & Research Libraries, 71(5), 435-455. doi:10.5860/crl-34r2

Hardenbrook, J. (2013). Stop thinking so much like a damn librarian (or how I came to love discovery layers) [Blog post]. Retrieved from https://mrlibrarydude.wordpress.com/2013/09/10/stop-thinking-so-much-like-a-damn-librarian-or-how-i-started-liking-discovery-layers/

Hoeppner, A. (2012). The ins and outs of evaluating web-scale discovery services. Computers in Libraries, 32(3). Retrieved from http://www.infotoday.com/cilmag/apr12/Hoeppner-Web-Scale-Discovery-Services.shtml

LaGuardia, C. (2012). Library instruction in the digital age. Journal of Library Administration, 52(6/7), 601-608. doi:10.1080/01930826.2012.707956

Mann, T. (2015). The Oxford guide to library research. New York, NY: Oxford University Press.

Parry, M. (2014). As researchers turn to Google, libraries navigate the messy world of discovery tools. Retrieved from http://www.chronicle.com/article/as-researchers-turn-to-google/146081

Saracevic, T. (1996). Relevance reconsidered '96. In P. Ingwersen & N. O. Pors (Eds.), Information science: Integration in perspective (pp. 201-218). Copenhagen, Denmark: Royal School of Library and Information Science.

Schamber, L., & Eisenberg, M. (1991). On defining relevance. Journal of Education for Library and Information Science, 31(3), 238-253. doi:10.2307/40323384

Weedman, J. (2009). Design science in the information sciences. In M. J. Bates & M. N. Maack (Eds.), Encyclopedia of library and information sciences (3rd ed., pp. 1493-1506). New York, NY: Taylor and Francis. doi:10.1081/E-ELIS3-120043534

Weedman, J. (2016). Lecture 4: The design process. In V. Tucker (Ed.), Information retrieval system design: Principles and practice (2nd ed., pp. 219-232). Ann Arbor, MI: AcademicPub.