Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao
Safety work on computer-use agents usually focuses on screening harmful prompts. OS-BLIND shows that even benign instructions can lead to harm through the task context or during execution. Safety alignment activates only in the first few steps, so even Claude 4.5 Sonnet reaches a 92.7% attack success rate.

