Understanding and Improving the Safety of Frontier Models
Abstract: (The talk will be self-contained; no background on LLM Safety/Alignment is required.) This talk provides a foundational overview of recent efforts in industry and academia to improve the safety of frontier models, along with open challenges. It will cover (1) principal approaches to designing red-teaming attacks, (2) in-model and out-of-model methods for enhancing safety, and (3) if time permits, the challenge of catastrophic forgetting in post-training and approaches to continual learning.

Bio: Hamed Hassani is currently a senior research scientist…



