Aaron Gokaslan: Pioneering Trustworthy AI with Open Source Innovations and Global Collaboration
April 11, 2025
In 2024, Mozilla recognized Aaron Gokaslan as a leading architect of trustworthy AI for his significant contributions to open source datasets and language models.
In response to concerns about copyright infringement, Gokaslan created Common Canvas, a dataset comprising 100 million Creative Commons photos aimed at facilitating AI-assisted image creation.
This dataset has been downloaded over 4 million times, becoming an essential resource for AI research worldwide and coming to be seen as a foundational tool rather than a novelty.
Currently a doctoral student at Cornell, Gokaslan advocates for public access to AI tools, arguing that it accelerates research and innovation for everyone.
OpenAI's 2019 announcement of GPT-2, a precursor to ChatGPT, came with concerns about the risks of releasing the full model publicly, but Gokaslan and his collaborator Vanya Cohen countered this by successfully replicating the model.
They recreated the dataset that powered GPT-2 from millions of web pages and made their work publicly available so that others could develop their own language models.
Gokaslan also advises organizations like Encode Justice on ethical AI policies and has developed a customizable license template to prevent the misuse of AI models.
He played a pivotal role in the creation of BLOOM, an AI language model that supports 46 languages, built through international collaboration on a French supercomputer.
In addition, Gokaslan maintains several popular code libraries, including pybind11 and PyTorch, and finds fulfillment in the tangible impact of his open source contributions.
Summary based on 1 source
Source: Brown Alumni Magazine, “AI’s Robin Hood”