Karina Nguyen
I work on alignment capabilities and honesty research at Anthropic, reducing hallucinations in large language models, training and evaluating models with novel capabilities, and conducting AI safety research. Most recently I led Claude Instant 1.2 training and productionized the model in the API. Previously, as a design engineer, I collaborated on R&D prototypes, journalism tools, and product features with teams at Primer.ai, Dropbox, Square, and The New York Times.
Main Publications
Towards Measuring the Representation of Subjective Global Opinions in Language Models
We develop a method to measure the global opinions represented in language models.
In submission 2023
Discovering Language Model Behaviors with Model-Written Evaluations
We test LMs using >150 LM-written evaluations, finding cases of inverse scaling in which models exhibit sycophantic behaviors.
ACL'23 (Findings)
FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling
We find substantial gains in worst-k and minority-group performance, i.e., fairness naturally emerges from deep ensembling.
In submission 2023
Towards Semantically-Aware UI Design Tools: Design, Implementation and Evaluation of Semantic Grouping Guidelines
We develop a computational metric to measure violations of semantic grouping guidelines in UI designs.
To appear at ICML Workshop'23
Investigations
My work in visual investigative journalism and human rights involved extensive data collection, evidence verification, satellite imagery analysis, 3D reconstructions, legal submissions, investigative tools, and applied remote sensing, in collaboration with:
- Bloomberg CityLab
- Wired
- New York Times
- Washington Post
- CNN
- Associated Press
- Bellingcat
- SITU Research
- The Atlantic Council
- Amnesty International
Say Hi!