Semester | Winter 2024 |
Course type | Block Seminar |
Lecturer | TT.-Prof. Dr. Wressnegger |
Audience | Informatik Master & Bachelor |
Credits | 4 ECTS |
Room | 148, Building 50.34 |
Language | English |
Link | TBA |
Registration | https://ilias.studium.kit.edu/goto.php?target=crs_2483964&client_id=produktiv |
This seminar is concerned with different aspects of adversarial machine learning. Next to the use of machine learning for security, also the security of machine learning algorithms is essential in practice. For a long time, machine learning has not considered worst-case scenarios and corner cases as those exploited by an adversarial nowadays.
The module introduces students to the recently extremely active field of attacks against machine learning and teaches them to work up results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work at the end of the term to their colleagues.
Topics include but are not limited to adversarial examples, model stealing, and membership inferences against large language models or text-to-image generative models.
Date | Step |
Wed, 23. Oct, 9:45–11:15 | Kick-off & Topic presentation |
Thu, 24. Oct, 11:59 (noon) | Send topic selection(assignment happens till 15:00) |
Fri, 25. Oct, 11:59 (noon) | Officially register for assigned topic (missed opportunities will be reassigned to waiting list till 15:00) |
Tue, 29. Oct, 9:45–11:15 | Optional unit on "How to Ace the Seminar" |
Thu, 31. Oct | Arrange appointments with assistant |
Mon, 04. Nov - Tue, 05. Nov | 1st individual meeting (Provide first overview and ToC) |
Mon, 16. Dec - Tue, 17. Dec | 2nd individual meeting (Feedback on draft report) |
Tue, 07. Jan | Submit final paper |
Tue, 14. Jan | Submit review for fellow students |
Wed, 15. Jan, 14:00–16:00 | PC discussion meeting |
Tue, 28. Jan | Submit camera-ready version of your paper |
Tue, 11. Feb | Presentation at final colloquium |
News about the seminar, potential updates to the schedule, and additional material are distributed using the course's matrix room. Moreover, matrix enables students to discuss topics and solution approaches.
You find the link to the matrix room on ILIAS.
Every student may choose one of the following topics. For each of these, we additionally provide recent top-tier publications that serve as the basis for the seminar report. For the seminar and your final report, you should not merely summarize these papers, but try to go beyond and arrive at your own conclusions.
Model extraction attacks (MEAs) on large language models (LLMs) have received increasing research attention lately. However, most existing methods inherit the extraction strategies from those designed for DNNs but yet neglect the inconsistency of training tasks. This topic will investigate and taxonomize existing extraction methods for LLMs and further discuss their limitations.
In contrast to stealing LLMs, IP protection methods have recieved considerable attention recently. Those methods can effectively certify the output of a representative set designed by adding watermarked tokens. This topic will investigate and taxonomize existing IP protection methods for LLMs and further discuss limitations and future directions.
Companies develop specialized prompts to instruct their LLMs for specific commercial applications. While these system prompts are typically treated as secrets, withheld from end-users, recent research has revealed the risk of potential leaks of system prompts. This topic will investigate existing methods and discuss current limitations and future directions.
Prompt Injection attack aims to override original instructions and inject harmful commands to cause inappropriate behaviors in the model. This topic will investigate and taxonomize existing methods for LLMs and further discuss limitations and future directions.
Jailbreak aims to exploit LLM vulnerabilities to bypass alignment, leading to harmful or malicious outputs. This topic will investigate and taxonomize existing jailbreak attacks and discuss their limitations and future directions.
Training Data Extraction Attack has the potential to extract sensitive information (e.g., passwords) in LLMs. This topic will investigate and taxonomize existing methods and further discuss limitations and future directions.
Recent Text-to-Image Models enable customized creation of visual content, raising concerns about the robustness of those models (e.g., create attacks targeting specific image generation). This topic will investigate, taxonomize existing methods, and further discuss current limitations and future directions.
Given the remarkable performance of Text-to-Image Models, there's been attention on MIA in those models. This topic will investigate and taxonomize existing MIAs on GANs, VAEs, and diffusion models, and further discuss limitations and future directions.