Karina Nguyen

I work on alignment capabilities and honesty research at Anthropic, reducing hallucinations in large language models, training and evaluating models with novel capabilities, and conducting AI safety research. Most recently, I led the training of Claude Instant 1.2 and productionized the model in the API. Previously, as a design engineer, I collaborated on R&D prototypes, journalism tools, and product features with teams at Primer.ai, Dropbox, Square, and the New York Times.

Main Publications

Towards Measuring the Representation of Subjective Global Opinions in Language Models

Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, +11 more, Jared Kaplan, Jack Clark, Deep Ganguli

We develop a method for measuring which global opinions are represented in language models.

In submission, 2023

Discovering Language Model Behaviors with Model-Written Evaluations

Ethan Perez, Sam Ringer*, Kamilė Lukošiūtė*, Karina Nguyen*, Edwin Chen, Scott Heiner, +55 more, Nicholas Schiefer, Jared Kaplan

We test LMs using >150 LM-written evaluations and find cases of inverse scaling, where larger models exhibit more sycophantic behaviors.

ACL'23 (Findings)

FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling

Wei-Yin Ko, Daniel D'souza, Karina Nguyen, Randall Balestriero, Sara Hooker

We find substantial gains in worst-k and minority-group performance; i.e., fairness naturally emerges from deep ensembling.

In submission, 2023

Towards Semantically-Aware UI Design Tools: Design, Implementation and Evaluation of Semantic Grouping Guidelines

Peitong Duan, Björn Hartmann, Karina Nguyen, Yang Li, Marti Hearst, Meredith Ringel Morris

We develop a computational metric to measure violations of semantic grouping in UI designs.

To appear at ICML Workshop'23

Investigations

My work in visual investigative journalism and human rights involved extensive data collection, evidence verification, satellite imagery analysis, 3D reconstructions, legal submissions, investigative tools, and applied remote sensing.

Say Hi!