Moacir led us through a rapid introduction to, or refresher on, the basics of Python before introducing ProQuest’s TDM Studio tool. […]
Getting Started with Research Data: Concepts, Tools, & Resources
Today, Wei gave us a tour of working with research data and a preview of this semester’s unified Data Club. […]
Data Club Fall 2025 Schedule
We’re back from a busy but relaxing summer and are ready for another semester of Data Club. There are a few big changes this year. First, all of Research Data Services’s workshops are being offered as meetings under the Data Club banner. Eric, Jeremiah, Wei, and Moacir will each lead two meetings on topics of […]
Text Mine ProQuest with ChatGPT
On June 13, 2025, ProQuest announced that the popular text-mining environment TDM Studio now includes a beta feature that lets users integrate GPT models into their R or Python workbench notebooks. TDM Studio, available at no cost to researchers at Columbia, opens up the ProQuest databases to large-scale analyses of the full-text corpora. It comes […]
Libraries Acquire Full-Text Corpus Data
In December, the Libraries acquired twelve full-text corpus datasets, compiled by Mark Davies, a retired professor of linguistics from Brigham Young University. The corpora will help Columbia researchers across many disciplines to understand how language is and has been used around the world, and they serve as another mark in the Libraries’ commitment to supporting […]
Data Engineering in Python with Polars 1
Today, we begin learning Polars, a Python data analysis library that serves as an alternative to pandas. We’ll learn how Polars is similar to and different from pandas and why it is an appealing choice in 2025 for ETL (extract-transform-load) operations. […]
SQL and NoSQL Databases in Python with Pandas
Today we looked at using databases in Python. […]
Git and Gitting Organized (Also, Text Editing)
Today we talk a bit about project management and see how to use Git with VS Code. […]
Resource Spotlight: Newly Purchased Dave Leip Election Datasets
Research Data Services (RDS) just purchased a few new election datasets from Dave Leip: “United States Presidential Results” and “US Presidential Primary Election Results for Republican Party and Democratic Party”. All the RDS-licensed Dave Leip datasets can be found in CLIO. This resource is available only to current Columbia affiliates. Please […]
Day One Exploratory Data Analysis with JavaScript
Today we return to our Observable notebooks to learn how to do lightning-fast exploratory data analysis! […]