By Mollie Echeverria, Melanie Wacker, Alex Whelan
Authority Control
One of the core functions of library cataloging is ensuring that all resources by or about the same agent can be found under the same access point or heading. This process is referred to as authority control. Authority control is commonly managed through the use of authority records, which include the preferred unique heading for a particular agent, along with variant forms of the name for cross-referencing purposes. These authority records are collected in indexes referred to as authority files.
In the U.S., the primary authority file used by libraries is the Library of Congress Name Authority File (LC/NAF). Records in the LC/NAF are created by catalogers at the Library of Congress and by institutions participating in the Program for Cooperative Cataloging Name Authority Cooperative Program (PCC/NACO). About 10 million authority records have been contributed to the LC/NAF over the years by Library of Congress catalogers and NACO-trained catalogers at other PCC member libraries. (John Riemer. Wikidata Pilot Meeting: Background and Goals (August 27, 2020). https://wiki.lyrasis.org/display/pccidmgt/Wikidata+Pilot+Kick-off+meetings (accessed May 24, 2021))
Challenges of NACO Cataloging
To contribute authority records to the NAF, catalogers at NACO institutions must complete intensive training. Because of the volume of training required, generally only a small number of library staff at a given institution are able to become NACO catalogers.
Due to the relatively small number of catalogers trained to contribute to NACO, many agents in libraries’ bibliographic catalogs may be left without controlled name authority records. This lack of controlled access points can create a host of obstacles, including publications by different authors with the same name all being attributed to one person, publications by corporate bodies that have changed names not being linked to one another, and variant forms of an author’s name not showing up in searches.
Besides the limited pool of catalogers able to contribute to NACO, the formatting of names in the LC/NAF can create additional obstacles. Contemporary online public access catalogs (OPACs) rely on matching text strings, a holdover from the time of card catalogs. The access point for an agent’s name must be formulated according to current standards and must always match the “preferred label” exactly, or the resource will not be collocated with others by or about the same agent.
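The difference between matching on text strings and matching on an identifier can be illustrated with a minimal sketch. The record structure, the identifier, and the name used here are all invented for illustration; this is not the MARC authority format or a real LC/NAF record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AuthorityRecord:
    """Simplified stand-in for a name authority record (hypothetical structure)."""
    identifier: str                                      # invented identifier
    preferred: str                                       # preferred access point
    variants: List[str] = field(default_factory=list)    # cross-references


# Invented example data, not taken from any real authority file.
record = AuthorityRecord(
    identifier="example-0001",
    preferred="Doe, Jane, 1950-",
    variants=["Doe, J. (Jane)", "Jane Doe"],
)


def exact_match(heading: str, rec: AuthorityRecord) -> bool:
    """String matching: only the exact preferred label counts as a match."""
    return heading == rec.preferred


def resolve(heading: str, records: List[AuthorityRecord]) -> Optional[str]:
    """Identifier lookup: any recorded form of the name leads to one identifier."""
    for rec in records:
        if heading == rec.preferred or heading in rec.variants:
            return rec.identifier
    return None


print(exact_match("Jane Doe", record))   # the variant form fails a string match
print(resolve("Jane Doe", [record]))     # but still resolves to the identifier
```

In this sketch, a heading that differs from the preferred label by even one character fails the string match, while any recorded form of the name resolves to the same identifier.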
Enter Wikidata
A few years ago, the PCC began an effort to put its name authority work on a more sustainable footing. As outlined in PCC Strategic Directions, 2018-2022, the PCC is now seeking to move the library community from static, record-based authority work toward more flexible, metadata-based identity management. It seeks to facilitate the implementation of new tools and technologies like linked open data and encourages collaboration with a more diverse variety of outside communities.
The PCC Wikidata Pilot (started in 2020) is the latest outcome of this work. Wikidata offers many possible benefits as a tool for PCC member organizations. It contains over 89 million entities, far more than the NAF, and because it is open to all, it has a much larger group of contributors from a far more diverse range of backgrounds.
PCC Wikidata Pilot at Columbia University Libraries
Columbia University Libraries’ Original & Special Materials Cataloging Department (OSMC) was immediately intrigued by the possibilities offered by the PCC Wikidata Pilot, and a team formed around the project consisting of Mollie Echeverria, Matthew Haugen, Ryan Mendenhall, Melanie Wacker, and Alex Whelan.
Among a number of project ideas, one addressed an immediate problem that we hoped the new Wikidata pilot could help us solve. Over the past three years, the Columbia Libraries have been digitizing a large number of audiovisual materials from our collections as part of the Mellon Audio and Moving Image (AMI) Project. Included in these materials are oral history interviews from the Oral History Archives at the Columbia Center for Oral History (CCOH). At CUL, oral histories are cataloged using individual MARC records, which are then fed into CLIO, the Oral History Portal, OCLC WorldCat, and eventually, in converted form, the Digital Library Collections.
In year 1 of the AMI project, Alex Whelan, CUL’s Time Based Media Metadata Librarian, was trained to provide the needed MARC bibliographic and authority records. In the following year, however, Alex was needed primarily for work on audiovisual materials from CUL’s archival collections, and the oral history processing was taken on by Metadata Operations Specialist Mollie Echeverria.
Mollie was familiar with MARC bibliographic cataloging, but NACO work was outside of her training and her responsibilities. Initially, Alex continued to provide the necessary authority work, but since he was not the one working with the actual materials, this proved problematic and labor-intensive.
The OSMC AMI team developed the idea that Mollie could create Wikidata entries. In these entries, she could capture the information about a specific interviewee at the time of cataloging. Alex (or any other NACO-trained cataloger at OSMC) could then use this information as a basis for minimal NACO records for use in our catalogs and databases. Both the Wikidata entry and the related name authority record would contain each other’s identifiers, thereby linking the two descriptions.
Workflow
Wikidata proved to be a very low-barrier tool and easy to learn for our project team.
Next, we had to develop a workflow. Mollie created a spreadsheet in which she enters the uncontrolled name and some basic information, such as the related CLIO ID of the oral history interview. She then searches Wikidata for an existing entry and, if none exists, creates a new Wikidata item. The identifiers for these Wikidata items are also added to the spreadsheet.
In addition, we are linking the Wikidata items to our project page using a specific property (P5008) that automatically updates our list of items created as part of this project. A NACO cataloger then uses this information to create a new authority record, formulating the correct access point and cross-references so that the record can function in our systems. Instead of repeating all of the information, the authority record links back to the fuller Wikidata entry. The new NACO identifier, in turn, is added to the Wikidata description.
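The steps above can be sketched as a small tracking model. This is only an illustration under stated assumptions, not the spreadsheet or tooling the team actually uses: the column names, the placeholder identifiers, and the helper functions are hypothetical. P5008 is the property named in the text; using P244 (Wikidata’s Library of Congress authority ID property) for the NACO identifier is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FOCUS_LIST_PROPERTY = "P5008"    # property named in the text
LC_AUTHORITY_PROPERTY = "P244"   # assumption: property used for the NACO identifier
PROJECT_QID = "Q_PROJECT"        # placeholder; the project item's real ID is not given


@dataclass
class TrackingRow:
    """One spreadsheet row; the column names are hypothetical."""
    uncontrolled_name: str
    clio_id: str
    wikidata_qid: Optional[str] = None   # filled in after searching or creating
    naco_id: Optional[str] = None        # filled in once the NACO record exists


def next_step(row: TrackingRow) -> str:
    """Report which stage of the workflow a row is waiting on."""
    if row.wikidata_qid is None:
        return "search Wikidata or create item"
    if row.naco_id is None:
        return "create NACO record"
    return "done"


def statements_for(row: TrackingRow) -> List[Tuple[str, str, str]]:
    """List the (item, property, value) statements the Wikidata item should carry."""
    if row.wikidata_qid is None:
        return []
    triples = [(row.wikidata_qid, FOCUS_LIST_PROPERTY, PROJECT_QID)]
    if row.naco_id is not None:
        triples.append((row.wikidata_qid, LC_AUTHORITY_PROPERTY, row.naco_id))
    return triples


# Invented row with placeholder identifiers.
row = TrackingRow("Jane Doe", clio_id="CLIO_EXAMPLE", wikidata_qid="Q_EXAMPLE")
print(next_step(row))        # waiting on the NACO cataloger
row.naco_id = "N_EXAMPLE"
print(statements_for(row))   # project link plus the NACO identifier
```

Keeping both identifiers on one row makes it easy to see which names still await a Wikidata item and which still await a NACO record.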
While Wikidata allows us to record all kinds of detailed information about an agent, we wanted to be careful about what should be included. First, the basic idea was to reduce the time that it takes to do the authority work, so we needed to ensure that each entry did not turn into a research project of its own. Other PCC project participants had also reported that, because of the large number of Wikidata contributors, new items are enhanced by others fairly quickly. Second, we wanted to pay attention to the privacy concerns of the individuals described. We followed the Wikidata guidelines for the description of living people and created a set of core elements based on them.
Future Directions
Since Wikidata also surfaces information to a broader audience, we can use it to highlight our collections and the agents from underrepresented groups connected to them. David Olson, CUL’s Oral History Archivist, has pointed to a group of African-American newspapers featured in the Black Journalist Oral History Collection that should be represented both in the NACO file and on Wikidata. Alex Whelan has started the groundwork of identifying both the NACO practices for newspapers and the set of Wikidata properties needed to create more detailed descriptions there.
This workflow has proven to be easy to implement and opens the door to other projects where NACO catalogers could collaborate with archivists, curators, or graduate students, thereby making our work more inclusive. We are looking forward to all the possibilities.