Papers



✨ Preprints

The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao

Computer-use agent safety usually screens prompts. OS-BLIND shows even benign instructions can produce harm via task context or execution; safety alignment only activates in the first few steps, so even Claude 4.5 Sonnet hits 92.7% attack success.

Paper · agents · trustworthiness

BIASINSPECTOR: Detecting Bias in Structured Data through LLM Agents

Haoxuan Li, Mingyu Derek Ma, Jen-tse Huang, Zhaotian Weng, Wei Wang, Jieyu Zhao

An end-to-end multi-agent framework that plans the analysis, selects tools, and detects bias in structured datasets automatically, instead of relying on hand-coded, case-by-case bias checks.

Paper · trustworthiness · agents

Controllable Pareto Trade-off between Fairness and Accuracy

Yongkang Du, Jieyu Zhao, Yijun Yang, Tianyi Zhou

Fairness-accuracy trade-offs usually hand you one operating point. CPT lets users specify a preferred balance via reference vectors, navigating the Pareto front with stabilized fairness updates and gradient pruning.

Paper · trustworthiness

Can LLMs Grasp Implicit Cultural Values? Benchmarking LLMs' Metacognitive Cultural Intelligence with CQ-Bench

Ziyi Liu, Priyanka Dey, Zhenyu Zhao, Jen-tse Huang, Rahul Gupta, Yang Liu, Jieyu Zhao

Cultural-intelligence benchmark that tests whether LLMs can infer the *implicit* values baked into natural conversation — not just the cultural norms stated outright.

Paper · evaluation · trustworthiness

FairCode: Evaluating Social Bias of LLMs in Code Generation

Yongkang Du, Jen-tse Huang, Jieyu Zhao, Lu Lin

Code-generation LLMs leak social bias into the code they write. FairCode measures how, where, and which prompts make it worse.

Paper · trustworthiness · evaluation

SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models

Zixiang Xu, Yanbo Wang, Yue Huang, Jiayi Ye, Haomin Zhuang, Zirui Song, Lang Gao, Chenxi Wang, Zhaorun Chen, Yujun Zhou, et al.

Benchmark for LLM social reasoning built from structured multi-party scenarios — instead of static QA, agents have to navigate inferences about other agents.

Paper · evaluation · agents

Video-Based Reward Modeling for Computer-Use Agents

Linxin Song, Jieyu Zhang, Huanxin Sheng, Taiwei Shi, Rahul Gupta, Yang Liu, Ranjay Krishna, Jian Kang, Jieyu Zhao

Computer-use agents leave a video trail of their work. We turn that trail into reward signal — rating trajectories at the pixel level instead of grading only the final outcome.

Paper · Code · agents

Political-LLM: Large Language Models in Political Science

Lincan Li, Jiaqi Li, Catherine Chen, Fred Gui, Hongjia Yang, Chenxiao Yu, Zhengguang Wang, Jianing Cai, Junlong Aaron Zhou, Bolin Shen, et al.

Survey of how LLMs are being applied in political science — what they can credibly contribute, and the methodological pitfalls researchers keep stepping into.

Paper · evaluation · trustworthiness

Disinfomeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation

Jingnong Qu, Liunian Harold Li, Jieyu Zhao, Sunipa Dev, Kai-Wei Chang

Multimodal dataset for detecting memes designed to spread disinformation — image + text, intentionality-labeled, harder than ordinary fake-news detection.

Paper · trustworthiness

2026

Detecting and Filtering Unsafe Training Data via Data Attribution with Denoised Representation

ICML 2026

Yijun Pan, Taiwei Shi, Jieyu Zhao, Jiaqi W. Ma

Find which training examples are pushing a model toward unsafe behavior, by attributing model outputs back through denoised representations — and filter them before they do damage.

Paper · trustworthiness · alignment

ProMediate: A Socio-cognitive Framework for Evaluating Proactive Agents in Multi-party Negotiation

ACL Findings 2026

Ziyi Liu, Bahar Sarrafzadeh, Pei Zhou, Longqi Yang, Jieyu Zhao, Ashish Sharma

Most agent benchmarks are single-turn task completion. ProMediate puts agents in the middle of multi-party negotiations and asks whether they can read the room, mediate, and act proactively.

Paper · agents

WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

ACL 2026

Taiwei Shi, Zhuoer Wang, Longqi Yang, Ying-Chun Lin, Zexue He, Mengting Wan, Pei Zhou, Sujay Jauhar, Sihao Chen, Shan Xia, Hongfei Zhang, Jieyu Zhao, Xiaofeng Xu, Xia Song, Jennifer Neville

Align LLMs with the messy, in-the-moment feedback users actually leave during real interactions — not the curated preference pairs that look clean on paper.

Paper · alignment

Experiential Reinforcement Learning

Lifelong Agent Workshop @ ICLR 2026

Taiwei Shi, Sihao Chen, Bowen Jiang, Linxin Song, Longqi Yang, Jieyu Zhao

Treat past episodes as a retrievable experience bank rather than relying on gradient updates alone, letting the policy reason about what worked last time before deciding what to try next.

Paper · Code · alignment

CoAct-1: Computer-using Agents with Coding as Actions

ICLR 2026

Linxin Song, Yutong Dai, Viraj Prabhu, Jieyu Zhang, Taiwei Shi, Li Li, Junnan Li, Silvio Savarese, Zeyuan Chen, Jieyu Zhao, Ran Xu, Caiming Xiong

A computer-using agent that doesn't just point and click — it writes code as a first-class action, finishing tasks faster and with higher success than GUI-only agents.

Paper · Code · Website · VentureBeat · agents

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

ICLR 2026

Yue Huang, Chujie Gao, ..., Taiwei Shi, ..., Jieyu Zhao, ..., Xiangliang Zhang

A community guideline plus working assessment toolkit for evaluating trustworthiness in generative foundation models across modalities — what to measure, how, and why each axis matters.

Paper · Code · trustworthiness · evaluation

What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning

EACL 2026

Zhaotian Weng, Haoxuan Li, Xin Eric Wang, Kuan-Hao Huang, Jieyu Zhao

VLMs can recognize what happens in a sequence of frames but fail when the task requires reasoning about *why* events unfold in their causal order. We probe where the gap is.

Paper · evaluation · trustworthiness

GRAVITY: A Framework for Personalized Text Generation via Profile-Grounded Synthetic Preferences

EACL 2026

Priyanka Dey, Daniele Rosa, Wenqing Zheng, Daniel Barcklow, Jieyu Zhao, Emilio Ferrara

Personalized generation usually needs costly human preference data. GRAVITY synthesizes preference pairs grounded in user profiles (cultural dimensions, values, personality), and its outputs beat baselines 86% of the time on Amazon book descriptions.

Paper · evaluation · alignment

Enhancing Diversity in Text-to-Image Generation Without Compromising Fidelity

TMLR

Jiazhi Li, Mi Zhou, Mahyar Khayatkhoei, Jingyu Shi, Xiang Gao, Jiageng Zhu, Hanchen Xie, Xiyun Song, Zongfang Lin, Heather Yu, Liang Peng, Jieyu Zhao

Text-to-image models trade diversity for fidelity by default. We show you can recover demographic and conceptual diversity without sacrificing how good the images look.

trustworthiness

Efficient Reinforcement Finetuning via Adaptive Curriculum Learning

TMLR

Taiwei Shi, Yiyang Wu, Linxin Song, Tianyi Zhou, Jieyu Zhao

RL finetuning wastes compute on prompts the model already aces. We schedule training data adaptively — pushing the model toward problems just past its current capability.

Paper · Code · alignment

2025

Multilingual Large Language Models Leak Human Stereotypes Across Language Boundaries

NLP4PI 2025

Yang Trista Cao, Anna Sotnikova, Jieyu Zhao, Linda X. Zou, Rachel Rudinger, Hal Daumé III

Stereotypes encoded in one language's training data don't stay there — multilingual LLMs leak them across language boundaries, sometimes amplifying biases in languages they barely saw.

Paper · trustworthiness

DrugAgent: Automating AI-Aided Drug Discovery Programming Through LLM Multi-Agent Collaboration

AI4Research @ AAAI 2025

Sizhe Liu, Yizhou Lu, Siyu Chen, Xiyang Hu, Jieyu Zhao, Yingzhou Lu, Yue Zhao

A multi-agent framework that automates ML programming for drug discovery: an LLM Planner pairs with an LLM Instructor to turn high-level ideas into runnable pharmaceutical pipelines.

Paper · agents

Cross-lingual Pitfalls: Automatic Probing of Cross-lingual Weakness of Multilingual Large Language Models

ACL 2025

Zixiang Xu, Yanbo Wang, Yue Huang, Xiuying Chen, Jieyu Zhao, Meng Jiang, Xiangliang Zhang

Automatic probing surfaces concrete spots where multilingual LLMs are weaker in some languages than others — beyond what a single aggregate score reveals.

Paper · evaluation · trustworthiness

Can LLMs Express Personality Across Cultures? Introducing CulturalPersonas for Evaluating Trait Alignment

EMNLP Findings 2025

Priyanka Dey, Aayush Bothra, Yugal Khanter, Emilio Ferrara, Jieyu Zhao

Personality questionnaires translate poorly across cultures. CulturalPersonas tests whether LLMs can express Big-Five traits in culturally appropriate ways instead of defaulting to a WEIRD baseline.

Paper · Code · evaluation · trustworthiness

AI Sees Your Location — But With a Bias Toward the Wealthy World

EMNLP 2025

Jingyuan Huang, Jen-tse Huang, Ziyi Liu, Xiaoyuan Liu, Wenxuan Wang, Jieyu Zhao

Vision-language models guess where photos were taken, and they're substantially more accurate for wealthy regions than for the Global South. We quantify the disparity and ask why.

Paper · Code · trustworthiness

The Hallucination Tax of Reinforcement Finetuning

EMNLP Findings 2025

Linxin Song*, Taiwei Shi*, Jieyu Zhao

RL finetuning makes models more decisive — including when they shouldn't be. We measure the hallucination tax and show synthetic unanswerable math problems can pay it back.

Paper · HF · MarkTechPost · alignment · trustworthiness

Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction

EMNLP 2025

Huanxin Sheng, Xinyi Liu, Hangfeng He, Jieyu Zhao, Jian Kang

LLM judges return point scores that look authoritative. We wrap them in conformal-prediction intervals so you can tell when the judge is actually confident — and when it's bluffing.

🏆 SAC Highlight · evaluation

VISBIAS: Measuring Explicit and Implicit Social Biases in Vision Language Models

EMNLP 2025

Jen-tse Huang, Jiantong Qin, Jianping Zhang, Youliang Yuan, Wenxuan Wang, Jieyu Zhao

Vision-language models can pass explicit-bias probes while still leaking the same stereotypes implicitly. VISBIAS tests both axes side by side.

Paper · Code · trustworthiness

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

ICLR 2025

Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah Smith, Chiyuan Zhang

Six-axis evaluation for machine unlearning in language models — testing whether claimed unlearning actually erases the knowledge or just hides it from a few probes.

Paper · alignment · evaluation

Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base

COLM 2025

Linxin Song, Xuwei Ding, Jieyu Zhang, Taiwei Shi, Ryotaro Shimizu, Rahul Gupta, Yang Liu, Jian Kang, Jieyu Zhao

Where exactly do LLMs not know things? SEA searches a massive KB to surface concrete topical blind spots, instead of reporting a single aggregate accuracy number.

Paper · Code · evaluation

2024

CLIMB: A Benchmark of Clinical Bias in Large Language Models

NLP4PI @ EMNLP 2024

Yubo Zhang, Shudi Hou, Mingyu Derek Ma, Wei Wang, Muhao Chen, Jieyu Zhao

Benchmark probing whether LLM medical advice changes with patient demographics — the bias most likely to cause real-world harm if deployed in clinical workflows.

Paper · trustworthiness · evaluation

"You Gotta be a Doctor, Lin": An Investigation of Name-Based Bias of Large Language Models in Employment Recommendations

EMNLP 2024

Huy Nghiem, John Prindle, Jieyu Zhao, Hal Daumé III

LLMs reading a candidate's name recommend systematically different jobs based on inferred ethnicity — even when qualifications are identical.

Paper · trustworthiness

Images Speak Louder Than Words: Understanding and Mitigating Bias in Vision-Language Models from a Causal Mediation Perspective

EMNLP 2024

Zhaotian Weng, Zijun Gao, Jerone Andrews, Jieyu Zhao

Localize and mitigate bias *inside* a vision-language model using causal mediation analysis — instead of treating the model as a black box at the output.

trustworthiness

InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context

EMNLP 2024

Ziyi Liu, Abhishek Anand, Pei Zhou, Jen-tse Huang, Jieyu Zhao

Evaluate LLM social intelligence by putting them inside an interactive game where success depends on inferring other agents' intentions, not on answering static QA.

agents · evaluation

SCORE: A Framework for Self-Contradictory Reasoning Evaluation

EMNLP Findings 2024

Ziyi Liu, Isabelle Lee, Yongkang Du, Soumya Sanyal, Jieyu Zhao

Framework for measuring when an LLM's chain-of-thought contradicts itself — and detecting those contradictions automatically across reasoning benchmarks.

Paper · evaluation

Does Differential Privacy Impact Bias in Pretrained Language Models?

IEEE Data Engineering Bulletin 2024

Md Khairul Islam, Andrew Wang, Tianhao Wang, Yangfeng Ji, Judy Fox, Jieyu Zhao

Differential privacy and bias mitigation sometimes conflict. We measure exactly when DP training makes downstream bias worse — and which protected groups bear the cost.

trustworthiness

Fair Abstractive Summarization of Diverse Perspectives

NAACL 2024

Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang

Summarizers compress text — and often compress out minority viewpoints. We measure that loss and propose summaries that preserve the spread of perspectives in the source.

Paper · trustworthiness

Safer-Instruct: Aligning Language Models with Automated Preference Data

NAACL 2024 · SeT-LLM 2024

Taiwei Shi, Kai Chen, Jieyu Zhao

Generate the preference data needed for alignment automatically, with guardrails, instead of paying for tens of thousands of human labels.

Paper · alignment

Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate

ICML 2024

Yuancheng Xu, Chenghao Deng, Yanchao Sun, Ruijie Zheng, Xiyao Wang, Jieyu Zhao, Furong Huang

Single-step fairness constraints break down when decisions compound over time. We propose Equal Long-term Benefit Rate — fairness evaluated over the trajectory — and bias-mitigation strategies that target it.

Paper · trustworthiness

TrustLLM: Trustworthiness in Large Language Models

ICML 2024

Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, ..., Jieyu Zhao, ..., Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao

A consortium-scale benchmark for trustworthiness in LLMs — truthfulness, safety, fairness, robustness, privacy, and ethics — with consistent protocols across 16 mainstream models.

Paper · trustworthiness · evaluation

2023

Does BERT Exacerbate Gender or L1 Biases in Automated English Speaking Assessment?

BEA 2023 @ ACL

Alexander Kwako, Yixin Wan, Jieyu Zhao, Mark Hansen, Kai-Wei Chang, Li Cai

When BERT is used to score spoken English, does pretraining-era bias make existing gender / L1 disparities in scoring worse?

trustworthiness

A Rose by Any Other Name Would Not Smell as Sweet: Social Bias in Name Mistranslations

EMNLP 2023

Sandra Sandoval, Jieyu Zhao, Marine Carpuat, Hal Daumé III

Machine translation systems mishandle names in systematic ways tied to gender and origin — and the cost lands disproportionately on people whose names don't fit Anglocentric defaults.

trustworthiness

Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems

EMNLP Findings 2023

Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang

Giving a chatbot a persona can amplify the biases it carries. We measure how much, and which personas make things worse.

Paper · trustworthiness

Mind What You Measure For: A Study on Reliability of Prompt-Based Bias Measurement

WiNLP 2023

Ruyuan Zuo, Jieyu Zhao

Reword the prompt and the bias number changes. We show how unstable prompt-based bias measurements are, and what to do instead.

trustworthiness · evaluation

Auditing Algorithmic Fairness in Machine Learning for Health with Severity-based LOGAN

W3PHIAI 2023

Anaelia Ovalle, Sunipa Dev, Jieyu Zhao, Majid Sarrafzadeh, Kai-Wei Chang

Apply LOGAN-style local-group bias detection to healthcare ML, weighting clusters by the severity of harm — so the audit prioritizes the failures that matter most clinically.

trustworthiness · evaluation

SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models

EACL 2023

Haozhe An, Zongxia Li, Jieyu Zhao, Rachel Rudinger

Most bias benchmarks pre-specify which axes to test. SODAPOP lets the social-commonsense model itself surface where it falls apart, instead of grading against a hand-built checklist.

Paper · Code · Video · Poster · trustworthiness

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

NeurIPS 2023

Ruijie Zheng, Xiyao Wang, Yanchao Sun, Shuang Ma, Jieyu Zhao, Huazhe Xu, Hal Daumé III, Furong Huang

Visual RL agents struggle to learn action-relevant representations from pixels. TACO contrasts state-action pairs across time to pull out what actually moves the world.

Paper · Code · Website · alignment

2022

On Measures of Biases and Harms in NLP

AACL 2022

Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Sun, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

An audit of how the NLP community measures bias and harm — what each metric actually captures, what it misses, and where the field has been talking past itself.

Paper · trustworthiness · evaluation

Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers

EMNLP Findings 2022

Jieyu Zhao, Xuezhi Wang, Yao Qin, Jilin Chen, Kai-Wei Chang

Ensembling can buy robustness — but the gains depend on which kinds of models you mix, not just how many.

Paper · trustworthiness

Using Item Response Theory to Measure Gender and Racial Bias of a BERT-based Automated English Speech Assessment System

BEA 2022 @ ACL

Alexander Kwako, Yixin Wan, Jieyu Zhao, Kai-Wei Chang, Li Cai, Mark Hansen

Use item response theory — borrowed from psychometrics — to measure gender and racial bias of BERT-based English speech assessment, beyond simple accuracy gaps.

trustworthiness · evaluation

2021

Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation

NAACL 2021

Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, Cho-Jui Hsieh

Counterfactual bias evaluations themselves aren't robust: a small perturbation to the test set can flip the conclusion. We diagnose when this happens and offer a fix.

Paper · trustworthiness · evaluation

Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?

ACL Findings 2021

Jieyu Zhao, Daniel Khashabi, Tushar Khot, Ashish Sabharwal, Kai-Wei Chang

If you literally tell a language model to be fair, does it listen? We test natural-language ethical interventions and find that 'taking advice' is harder than it looks.

Paper · trustworthiness · alignment

2020

Fairness-Aware Explainable Recommendation Over Knowledge Graphs

SIGIR 2020

Zuohui Fu*, Yikun Xian*, Ruoyuan Gao, Jieyu Zhao, Qiaoying Huang, Yingqiang Ge, Shuyuan Xu, Shijie Geng, Chirag Shah, Yongfeng Zhang, Gerard de Melo

Explainable recommendation can quietly bake in unfair preferences. We modify the KG-walking objective so the explanation is fair, not just plausible.

Paper · trustworthiness

LOGAN: Local Group Bias Detection by Clustering

EMNLP 2020

Jieyu Zhao, Kai-Wei Chang

Average-case fairness numbers hide local pockets of severe bias. LOGAN clusters the data to find where a model fails worst, instead of just reporting the mean.

Paper · trustworthiness · evaluation

Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

ACL 2020

Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang, Ahmed Hassan Awadallah

Cross-lingual transfer doesn't just move task signal — it moves bias. We trace how gender bias travels across languages in multilingual embeddings.

Paper · trustworthiness

Mitigating Gender Bias Amplification in Distribution by Posterior Regularization

ACL 2020

Shengyu Jia*, Tao Meng*, Jieyu Zhao, Kai-Wei Chang

Models don't just inherit bias from data — they amplify it. Posterior regularization constrains the predicted distribution to stay close to the empirical one, blunting amplification.

trustworthiness

"The Boating Store Had Its Best Sail Ever": Pronunciation-Attentive Contextualized Pun Recognition

ACL 2020

Yichao Zhou, Jyun-Yu Jiang, Jieyu Zhao, Kai-Wei Chang, Wei Wang

Puns hinge on sound, not just meaning. A pronunciation-attentive model recognizes them by attending to phonetic neighbors alongside contextualized text.

Paper · evaluation

Towards Understanding Gender Bias in Relation Extraction

ACL 2020

Andrew Gaut, Tony Sun, Shirlyn Tang, Yuxin Huang, Jing Qian, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang

Relation extractors learn that some relations belong to some genders — a problem when the same model is then used to build knowledge graphs.

trustworthiness

2019

Balanced Datasets Are Not Enough: Estimating and Mitigating Gender Bias in Deep Image Representations

ICCV 2019

Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, Vicente Ordonez

Even with a 50/50 dataset, image models still latch onto gendered cues — because the visual features themselves are correlated with gender. Balancing labels isn't enough.

Paper · trustworthiness

Gender Bias in Contextualized Word Embeddings

NAACL 2019

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Ryan Cotterell, Vicente Ordonez, Kai-Wei Chang

ELMo and friends were supposed to fix the bias problems of static embeddings. They didn't — contextualized vectors carry their own, sometimes worse, gendered geometry.

Paper · Video · Slides · trustworthiness

Mitigating Gender Bias in Natural Language Processing: Literature Review

ACL 2019

Tony Sun, Andrew Gaut, Shirlyn Tang, Yuxin Huang, Mai ElSherief, Jieyu Zhao, Diba Mirza, Elizabeth Belding, Kai-Wei Chang, William Yang Wang

A field-defining survey of how gender bias enters NLP systems, how it's measured, and what mitigation actually works versus what just looks like it does.

Paper · trustworthiness

Examining Gender Bias in Languages with Grammatical Gender

EMNLP 2019

Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, Kai-Wei Chang

English-trained bias methods don't transfer cleanly to languages with grammatical gender. We adapt the measurement and find the bias is still there — just shaped differently.

Paper · trustworthiness

2018

Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods

NAACL 2018

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang

WinoBias: a coref benchmark where the gendered pronoun and the stereotypical answer disagree. State-of-the-art systems flip their answers, and we show how to push back.

🏆 Best Poster @ SoCalNLP · Top-10 Cited @ NAACL 2018 · Paper · Code · Podcast · trustworthiness · evaluation

Learning Gender-Neutral Word Embeddings

EMNLP 2018

Jieyu Zhao, Yichao Zhou, Zeyu Li, Wei Wang, Kai-Wei Chang

GN-GloVe: train word embeddings whose gender dimension is isolated by construction, so downstream models can use the semantics without inheriting the gender association.

Paper · Code · trustworthiness

2017

Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-level Constraints

EMNLP 2017

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, Kai-Wei Chang

Vision-language models don't just learn that men cook less than women — they amplify the gap. We add corpus-level constraints at inference time to keep model statistics aligned with training statistics.

🏆 Best Long Paper Award · Paper · Code · Wired: Machines Taught By Photos Learn a Sexist View of Women · trustworthiness