The COVID Information Commons & Columbia University Libraries – using translation & transcription to increase accessibility to NSF-funded research
by Lauren Close, Lylybell Teran, and Esther Jackson, with editorial support from Florence Hudson, Macy Moujabber, Isabella Graham-Martinez, and Jeremiah Mercurio
With thanks to Lara Azar, Elia Bregman, Brian Buckley, Tushar Bura, Karem Coca, Cora Lee Cole, Yonara Anastacio Cubas, Paramveer Singh Dahiya, Victoria Horrocks, Shikhar Johri, Sstuti Mehra, Julie Meunier, Aditya Raj, Saanya Subasinghe, Sarai Vega, Rhyley Vaughan, and Kathryn Pope.
As the scholarly ecosystem becomes increasingly reliant on digital media, it is essential that we employ inclusive practices which promote digital accessibility. Accessible digital design refers to the practice of removing barriers that prevent individuals with disabilities from accessing websites, digital technologies, and online tools. Our goal is to develop content that all people can share, perceive, navigate, and interact with, regardless of their ability/disability status. It is equally important to share materials in multiple media formats and languages to reach broad, global audiences.
In collaboration with the Columbia University Libraries, the NSF-funded Northeast Big Data Innovation Hub (NEBDHub) (NSF award #1916585) began a project in 2021 to make the resources generated through community programming at the COVID Information Commons (CIC) accessible to the broader public. As of 2025, the NEBDHub team of staff and students, with support from the Columbia University Libraries’ Open Scholarship team, which manages Academic Commons, has created English transcriptions and Spanish, French, and Hindi translations for 148 COVID-19 Research Lightning Talk presentations hosted by the COVID Information Commons project team funded by NSF award #2139391. These presentations were made by NSF-, NIH-, USDA-, and CDC-funded researchers and considered a range of scientific topics, including COVID’s impact on public policy, social structures, technology, the global economy, and beyond. These COVID-19 resources align with the NEBDHub’s website accessibility guidelines and Broader Impact principles. The materials will be maintained by the Academic Commons in perpetuity, ensuring their availability to students, faculty, staff, and members of the public across the U.S. and around the world.
In 2022, 44.1 million Americans (13.4% of the U.S. population) self-identified as individuals with cognitive, ambulatory, auditory, self-care, or visual disabilities. Given the sweeping significance of the COVID-19 pandemic, an event which has had an unprecedented impact on global health and economic outcomes, the academic and scientific community must ensure that research and educational materials on COVID-19 be accessible to the broadest possible audience. When such materials are designed using the principles of accessible digital design, all members of society have equal access to crucial information about pandemic prevention and preparedness measures.
Similarly, the CIC Project Team believes in the importance of providing the base presentations in English, and then translating the digital content into languages that serve the researchers, students, and members of the public who access our free, open online content. After English, the second most commonly used language in the U.S. is Spanish. In 2019, approximately 41.7 million Americans spoke Spanish at home, 2.1 million Americans spoke French at home, and 1.4 million Americans spoke Hindi or Urdu at home (representing 13.5%, 0.6%, and 0.4% of the U.S. population, respectively). To reach this sizable audience, the CIC Project Team committed to translating the valuable Lightning Talk presentations into Universal Spanish, French, and Hindi.
Below is a description of the processes used throughout this initiative. We share these materials in the hope that our insights will be of use to other groups interested in digitizing their materials in the spirit of accessible digital design.
Background
The COVID Information Commons (CIC) is an open resource for exploring research on the COVID-19 pandemic and offers an open portal of over 13,800 NSF- and NIH-funded research projects and community events to enable researcher collaboration.
In July 2020, the CIC began hosting monthly webinars for NSF- and NIH-funded researchers to present their COVID-19 research to a community of interested professionals and students from around the world. The researchers have shared their insights on all aspects of the COVID-19 pandemic, ranging from epidemiology to educational impacts and healthcare outcomes. Presentations are formatted as short lightning talks and followed by open Q&A sessions with audience members. The CIC hosted 148 presentations from July 2020 through December 2024 in 31 webinars, reaching over 15,300 audience members via live events, the NEBDHub’s YouTube channel, individual Lightning Talks, and the Academic Commons.
In 2021, the CIC Project Team began the process of transcribing the presentations and providing written summaries of the events. First, we transcribed the CIC lightning talk videos into written English to enable people who are hard of hearing to better access the content. Second, we used Adobe Acrobat functions to modify the transcripts to meet accessibility standards suitable for individuals who require screen readers. The English text was then uploaded to the back end of YouTube to replace the system’s auto-generated captions with the Team’s manually vetted, scientifically accurate text.
Once an accurate English transcription had been completed, the Project Team then translated the text into written Spanish, Hindi, and French, reflecting the broad demographic trends in the NEBDHub community. The written translations were likewise brought into alignment with Adobe Acrobat’s accessibility standards so as to be suitable for individuals who require screen readers. Uploading non-English text to the back end of YouTube proved an initial challenge, as YouTube cannot generate timestamps for captions in languages other than the original spoken video language. Our Team created a specialized work-around process to accommodate this technological limitation, as described below.
Finally, the Columbia University Libraries Open Scholarship Team reached out to each of the researchers for permission to post their presentations through the Academic Commons digital repository. When permission was granted, a unique DOI (Digital Object Identifier) was generated for each English, French, Hindi, and Spanish transcription or translation. Library staff also advised on certain aspects of the project (e.g., suggesting that translation start with the time-encoded caption files, and then be reflected in narrative transcript documents). Through this collaboration, our teams found new ways to reach broader scientific audiences, articulate our processes, and amplify the work of hundreds of researchers, whose presentations can be cited and referenced in academic publications in perpetuity (their work will also be indexed by scholarly research aggregators such as Google Scholar and OpenAlex).
Process
The CIC Project Team is pleased to share the details of the processes used for this initiative.
After each CIC virtual webinar, the CIC team split the 60- to 90-minute webinar recording into individual COVID researcher lightning talks, each approximately 10-15 minutes in length. Each lightning talk was then added to the Project Tracker spreadsheet (Sample Project Tracker) and the initial transcription and translation responsibilities were assigned to CIC team members. The English transcribers began working directly in the back end of the NEBDHub YouTube account, refining the auto-generated captions (scientific terminology is often inaccurate in auto-transcribed texts). When the English captions had been corrected, the text was transferred to a shareable Word document (Sample Word Document Template) for formatting. This document was then shared with a second team member, who reviewed the content for accuracy and made appropriate edits. The corrected document was then exported as a .pdf and refined using Adobe Acrobat’s built-in accessibility features. The final English transcription was published to the CIC website for public view (example).
Next, the English text was shared with the French-, Hindi-, and Spanish-language translation teams. The translators used a combination of digital tools (such as Reverso) to establish a baseline text that was ready for refinement. The translations were then edited further, with particular focus on accurate rendering of the technical terminology. Once completed, the translations were also posted to the CIC website. In order to upload the translations to the back end of YouTube as timestamped captions (.vtt files), the text had to be run through a specially built editor the Project Team created in Excel (Sample Timestamp Generation Template).
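For readers who would rather script this step than use a spreadsheet, the general idea of pairing translated text with existing timestamps can be sketched in Python. This is not the Project Team's Excel editor; it is a minimal illustration that assumes the corrected English captions are available as a WebVTT file and that the translation has been prepared with exactly one translated line per English cue. The function names `extract_timings` and `build_translated_vtt` are hypothetical.

```python
import re

# A WebVTT cue timing line looks like "00:00:01.000 --> 00:00:04.000";
# the hours component is optional in the format.
TIMING = re.compile(
    r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3} --> (\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def extract_timings(english_vtt: str) -> list[str]:
    """Return the timing line of every cue in an English WebVTT file."""
    return [
        line.strip()
        for line in english_vtt.splitlines()
        if TIMING.match(line.strip())
    ]


def build_translated_vtt(english_vtt: str, translated_lines: list[str]) -> str:
    """Reuse the English cue timings for a translated caption track.

    Assumes one translated line per English cue, in the same order.
    """
    timings = extract_timings(english_vtt)
    if len(timings) != len(translated_lines):
        raise ValueError(
            f"{len(timings)} English cues but "
            f"{len(translated_lines)} translated lines"
        )
    parts = ["WEBVTT", ""]  # required file header, then a blank line
    for timing, text in zip(timings, translated_lines):
        parts.extend([timing, text.strip(), ""])  # blank line ends each cue
    return "\n".join(parts)


if __name__ == "__main__":
    english = (
        "WEBVTT\n\n"
        "00:00:01.000 --> 00:00:04.000\nHello everyone.\n\n"
        "00:00:04.000 --> 00:00:07.500\nThank you for joining.\n"
    )
    spanish = ["Hola a todos.", "Gracias por acompañarnos."]
    print(build_translated_vtt(english, spanish))
```

The length check matters in practice: if a translator merges or splits sentences, the cue count no longer matches and the captions would drift out of sync, so the sketch stops with an error instead of writing a misaligned file.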
An overview of the process is shared here, including a timeline for securing permissions from webinar speakers and a description of Adobe’s accessibility features.
After the final videos and translations were posted to the CIC website, NEBDHub staff worked with the Libraries team to deliver the materials on a schedule, in a structured format, for inclusion in Academic Commons. This process included requesting signed Academic Author agreements from all speakers. English, French, Hindi, and Spanish records for each talk (for which the author has given approval) were then published to Academic Commons on a rolling basis, and these records include the lightning talk videos, transcripts, and captions.
Results
As a result of the CIC Project Team’s efforts, all 148 Lightning Talks from July 2020 through December 2024 have been transcribed into written English and translated into Universal Spanish, French, and Hindi. Over 480 unique DOIs have been generated from the resulting documentation and shared with the PIs for grant reporting purposes, further dissemination through their ORCID (Open Researcher and Contributor IDentifier), and other mechanisms.
As an indication of this initiative’s value, in 2024, 26.5% of all video views on the NEBDHub’s YouTube channel made use of English, French, Hindi, or Spanish subtitles. The 480 unique documents in the Academic Commons have been downloaded 18,291 times as of February 2025, and their records have been viewed 28,074 times.
Resources
Suggested Process for Event Transcription & Translation
Template Transcription Project Tracking Document
Text Editor – Timestamp / VTT Generator
Six student employees of the NEBDHub, two student employees of the Columbia University Libraries, and six REAL Volunteers have contributed to this initiative, providing transcription edits and translation support. We would like to thank Lara Azar, Elia Bregman, Brian Buckley, Tushar Bura, Karem Coca, Cora Lee Cole, Yonara Anastacio Cubas, Paramveer Singh Dahiya, Victoria Horrocks, Shikhar Johri, Sstuti Mehra, Julie Meunier, Aditya Raj, Saanya Subasinghe, Sarai Vega and Rhyley Vaughan for their project support. This has been a truly global initiative and the benefits of our accessibility-focused efforts are already being realized.