Wei gives some tips on avoiding repetitive tasks in R. […]
Working with Large Datasets in R
Wei gives some tips on working with large datasets in R. […]
Starting a Python Research Project in 2026
Moacir leads us through structuring a Python research project with uv. The video cuts off suddenly just as the discussion of GCP begins, but the uv content is available. […]
Introduction to Python Text Analysis with TDM Studio and ChatGPT
Moacir leads us through a rapid introduction to, or refresher on, the basics of Python before introducing ProQuest’s TDM Studio tool. […]
Getting Started with Research Data: Concepts, Tools, & Resources
Today, Wei gave us a tour of working with research data and a preview of this semester’s unified Data Club. […]
Data Club Fall 2025 Schedule
We’re back from a busy but relaxing summer and are ready for another semester of Data Club. There are a few big changes this year. First, all of Research Data Services’s workshops are being offered as meetings under the Data Club banner. Eric, Jeremiah, Wei, and Moacir will each lead two meetings on topics of […]
Text Mine ProQuest with ChatGPT
On June 13, 2025, ProQuest announced that the popular text-mining environment TDM Studio now includes a beta feature that lets users integrate GPT models into their R or Python workbench notebooks. TDM Studio, available at no cost to researchers at Columbia, opens up the ProQuest databases to large-scale analyses of the full-text corpora. It comes […]
Libraries Acquire Full-Text Corpus Data
In December, the Libraries acquired twelve full-text corpus datasets, compiled by Mark Davies, a retired professor of linguistics from Brigham Young University. The corpora will help Columbia researchers across many disciplines to understand how language is and has been used around the world, and they serve as another mark in the Libraries’ commitment to supporting […]
Data Engineering in Python with Polars 1
Today, we begin learning Polars, a Python data analysis library that offers an alternative to pandas. We’ll learn how Polars is similar to and different from pandas and why it is an appealing choice in 2025 for ETL (extract, transform, load) operations. […]
SQL and NoSQL Databases in Python with Pandas
Today we looked at using databases in Python. […]