
Introduction
Love Data Week took place February 10-14, 2025. The theme was “Whose Data Is It, Anyway?” A team of Columbia Libraries staff collaborated on a series of workshops that explored different datasets and how to work with them.
Organizing team
- Ben Chiewphasa
- Esther Jackson
- Kathryn Pope
- Wei Yin
With instructional support from:
- Caro Bratnober
- Jeremiah Mercurio
Project planning
We are confident that the Columbia community is data savvy, so our planning for Love Data Week 2025 did not spotlight trending technologies. Instead, we aimed to draw the community’s attention to a few easy-to-miss data and digital literacy issues: how to find national and international open government data, how to leverage open and restricted secondary data for research and publishing, how to cite sources (including data) appropriately, and how to wrangle messy, text-heavy data without any programming skills. To that end, we partnered across several library departments (Digital Scholarship; Science, Engineering and Social Sciences; Humanities and Global Studies; Burke Library) and taught a series of no-coding-required workshops on digital literacy for academic research and beyond. These workshops were open to the whole Columbia community, including faculty, students, staff and their family members. Attendees were affiliated with various schools and departments, from both the Morningside campus and the medical campus.
Data Literacy Workshops
With the diversity of the Columbia community in mind, Social Sciences and Policy Librarian Ben Chiewphasa and Research Support and Data Services Librarian Wei Yin put together two complementary, parallel data workshops: “Open Government Data” and “Love Data Literacy for Everyone”. The former focused on open access government data in and outside of the U.S., and the latter focused on ICPSR (Inter-university Consortium for Political and Social Research), a subscription-based data consortium.
Whether for research or for daily life, almost everyone uses government data, but many are unsure whether they are finding and using it thoroughly. Government data comes in various formats, including text-based policy statements, historical documents on microfilm and in videos, and, more recently, numeric data, digital maps, and image databases. Ben’s workshop described this complicated data landscape and provided several tools to navigate it. Attendees asked research questions related to climate change, Tibetan studies, and minority community building, which echoed the theme of the workshop: the multifaceted nature of government data.
Although many social science researchers know ICPSR as a wonderful data source, they might overlook the varied research potential of a single dataset when it is viewed from different perspectives (longitudinal, panel vs. survey data; public-use vs. restricted data). Those without a social science background might not notice that the news articles and social media posts they read daily (such as in the New York Times, the Wall Street Journal and Bloomberg News) use the same datasets for reporting and visualization. Wei’s workshop highlighted these underappreciated aspects of data, and attendees expressed keen interest in learning coding skills beyond this non-coding workshop.
Data Tool Workshops
In addition to our data literacy workshops, we offered three workshops that highlighted two open-source tools – Zotero and OpenRefine – that Columbia affiliates can use to wrangle, manage, and cite datasets.
Introduction to working with data in OpenRefine
OpenRefine is described as “a power tool for working with messy data” (David Huynh) – but what does this mean? It is probably easiest to describe the kinds of data OpenRefine is good at working with and the sorts of problems it can help you solve.
OpenRefine is most useful when you have data in a simple tabular format, such as a spreadsheet, a comma-separated values (CSV) file, or a tab-delimited (TSV) file, that contains internal inconsistencies in data formats, in where data appears, or in the terminology used. OpenRefine can be used to standardize and clean data across your file. It can help you:
- Get an overview of a data set
- Resolve inconsistencies in a data set, for example standardizing date formatting
- Split data up into more granular parts, for example splitting up cells with multiple authors into separate cells
- Match local data up to other data sets, for example in matching local subjects against the Library of Congress Subject Headings
- Enhance a data set with data from other sources
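For readers curious about what a few of these clean-up steps look like outside OpenRefine, here is a minimal Python sketch of three of them: trimming stray whitespace, standardizing date formatting, and splitting a multi-author cell. It is an illustration only, not how OpenRefine works internally and not material from the workshop. The sample rows, the field names, the semicolon separator, and the list of accepted date formats are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical sample rows (not data from the workshop).
rows = [
    {"title": "  Field Notes ", "authors": "Lee, A.; Gómez, R.", "date": "02/14/2025"},
    {"title": "Survey Methods", "authors": "Okafor, N.", "date": "2025-02-10"},
]

# Assumed input formats; real data may need more entries here.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or the trimmed text if no format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return cleaned


def clean_row(row):
    """Trim whitespace, split the authors cell on ';', and normalize the date."""
    return {
        # Collapse runs of whitespace and drop leading/trailing spaces.
        "title": " ".join(row["title"].split()),
        # One list entry per author, assuming ';' separates authors.
        "authors": [a.strip() for a in row["authors"].split(";") if a.strip()],
        "date": standardize_date(row["date"]),
    }


cleaned_rows = [clean_row(r) for r in rows]
for r in cleaned_rows:
    print(r)
```

Values that match none of the assumed date formats are left unchanged rather than guessed at, so they can be reviewed by hand.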
Zotero is a free, open-source bibliographic program that allows users to collect, organize, cite, and share research. It can generate correctly formatted footnotes and full Works Cited lists with one click. This workshop, taught by librarian Caro Bratnober, introduced participants to Zotero, helped them get started using it, and provided strategies for using it effectively in research and writing.
Advanced Citation Management with Zotero
For users who are already comfortable using Zotero, this workshop explored some of this open source software’s many plugins and extended features. Attendees explored PDF annotation, group libraries, linking to an external host for library files, advanced searching, saving searches, and exporting data for use in other projects.
Attendee feedback
Some takeaways that participants found valuable from the workshops:
- Tools and resources of the Open Data Charter (“Open Government Data”)
- Nuances of working with panel data and qualitative data (“Love Data Literacy for Everyone”)
- It is not only normal but expected that other stakeholders, researchers, or journalists can use the same dataset as the principal investigator (PI) yet arrive at different, or even contradictory, conclusions or insights (“Love Data Literacy for Everyone”)
- How to easily modify text strings to remove unwanted whitespace (“Introduction to working with data in OpenRefine”)
The post-workshop survey also revealed several unexpected but valuable learner takeaways. While these were not the primary focus, they are worth noting—particularly the enhanced understanding of the Libraries’ services and offerings:
- CUL’s Services & Tools page
- Library-led initiatives beyond CUL, such as the University of New Mexico’s curated list of data rescue projects
- The Libraries’ research guides, including the Text Mining guide
- The availability of subject specialists
Promotion & Press
ARL Views: ARL Has Love for Data Week
Conclusion
Love Data Week provided a great opportunity for Columbia Libraries staff to collaborate on developing cohesive instructional materials and assessment strategies. We look forward to seeing students in future Libraries workshops and again next year!