Physics of Language Models

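Modern large language models perform remarkably well across a wide range of tasks, yet the mechanisms behind their success remain largely unexplained. Studying the internal workings of these models can deepen our understanding of why large models behave the way they do.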

Safe and Trustworthy AI

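An overview of the security challenges facing applications built on large language models, the main research directions, and related work.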

Jailbreaking Large Language Models -- Attacks and Defenses

We review common defensive approaches from both industry and research, and discuss a new...