Exposing biases, moods, personalities, and abstract concepts hidden in large language models
A new method developed at MIT by researchers including IDSS faculty member Adityanarayanan “Adit” Radhakrishnan could root out vulnerabilities and improve LLM safety and performance.
