-
Exploring and mitigating adversarial manipulation of voting-based leaderboards
[arXiv]
-
Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Ziyu Liu, Ion Stoica, Florian Tramèr and others
ICML 2025
(Oral Presentation)
-
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
[arXiv]
-
Edoardo Debenedetti, Javier Rando, Daniel Paleka, Silaghi Fineas Florin, Dragos Albastroiu, Niv Cohen, Yuval Lemberg, Reshmi Ghosh, Rui Wen, Ahmed Salem, Giovanni Cherubin, Santiago Zanella-Beguelin and others
NeurIPS 2024
(Spotlight Presentation)
-
Stealing Part of a Production Language Model
[arXiv]
-
Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick and others
ICML 2024
(Best Paper Award)
-
Extracting Training Data from Large Language Models
[arXiv]
-
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea and Colin Raffel
USENIX Security 2021
(Caspar Bowden Award for Outstanding Research in Privacy Enhancing Technologies runner-up)
-
Design Patterns for Securing LLM Agents against Prompt Injections
[arXiv]
-
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr and others
-
On the Opportunities and Risks of Foundation Models
[arXiv]
-
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch and others