Hot Topics in Security of Machine Learning

Overview

Semester	Winter 2024
Course type	Block Seminar
Lecturer	TT.-Prof. Dr. Wressnegger
Audience	Informatik Master & Bachelor
Credits	4 ECTS
Room	148, Building 50.34
Language	English
Link	TBA
Registration	https://ilias.studium.kit.edu/goto.php?target=crs_2483964&client_id=produktiv

Description

This seminar is concerned with different aspects of adversarial machine learning. Next to the use of machine learning for security, also the security of machine learning algorithms is essential in practice. For a long time, machine learning has not considered worst-case scenarios and corner cases as those exploited by an adversarial nowadays.

The module introduces students to the recently extremely active field of attacks against machine learning and teaches them to work up results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work at the end of the term to their colleagues.

Topics include but are not limited to adversarial examples, model stealing, and membership inferences against large language models or text-to-image generative models.

Schedule

Date	Step
Wed, 23. Oct, 9:45–11:15	Kick-off & Topic presentation
Thu, 24. Oct, 11:59 (noon)	Send topic selection (assignment happens till 15:00)
Fri, 25. Oct, 11:59 (noon)	Officially register for assigned topic (missed opportunities will be reassigned to waiting list till 15:00)
Tue, 29. Oct, 9:45–11:15	Optional unit on "How to Ace the Seminar"
Thu, 31. Oct	Arrange appointments with assistant
Mon, 04. Nov - Tue, 05. Nov	1st individual meeting (Provide first overview and ToC)
Mon, 16. Dec - Tue, 17. Dec	2nd individual meeting (Feedback on draft report)
Tue, 07. Jan	Submit final paper
Tue, 14. Jan	Submit review for fellow students
Wed, 15. Jan, 14:00–16:00	PC discussion meeting
Tue, 28. Jan	Submit camera-ready version of your paper
Tue, 11. Feb	Presentation at final colloquium

Matrix Chat

News about the seminar, potential updates to the schedule, and additional material are distributed using the course's matrix room. Moreover, matrix enables students to discuss topics and solution approaches.

You find the link to the matrix room on ILIAS.

Topics

Every student may choose one of the following topics. For each of these, we additionally provide recent top-tier publications that serve as the basis for the seminar report. For the seminar and your final report, you should not merely summarize these papers, but try to go beyond and arrive at your own conclusions.

Model Extraction Attacks against Large Language Models
Model extraction attacks (MEAs) on large language models (LLMs) have received increasing research attention lately. However, most existing methods inherit the extraction strategies from those designed for DNNs but yet neglect the inconsistency of training tasks. This topic will investigate and taxonomize existing extraction methods for LLMs and further discuss their limitations.
IP Protection for Large Language Models
In contrast to stealing LLMs, IP protection methods have recieved considerable attention recently. Those methods can effectively certify the output of a representative set designed by adding watermarked tokens. This topic will investigate and taxonomize existing IP protection methods for LLMs and further discuss limitations and future directions.
Prompt Extraction Attacks against Large Language Models
Companies develop specialized prompts to instruct their LLMs for specific commercial applications. While these system prompts are typically treated as secrets, withheld from end-users, recent research has revealed the risk of potential leaks of system prompts. This topic will investigate existing methods and discuss current limitations and future directions.
Prompt Injection Attacks against Large Language Models
Prompt Injection attack aims to override original instructions and inject harmful commands to cause inappropriate behaviors in the model. This topic will investigate and taxonomize existing methods for LLMs and further discuss limitations and future directions.
Jailbreak Attacks against Large Language Models
Jailbreak aims to exploit LLM vulnerabilities to bypass alignment, leading to harmful or malicious outputs. This topic will investigate and taxonomize existing jailbreak attacks and discuss their limitations and future directions.
Training Data Extractions Attacks against Large Language Models
Training Data Extraction Attack has the potential to extract sensitive information (e.g., passwords) in LLMs. This topic will investigate and taxonomize existing methods and further discuss limitations and future directions.
Adversarial Examples on Text-to-Image Models
Recent Text-to-Image Models enable customized creation of visual content, raising concerns about the robustness of those models (e.g., create attacks targeting specific image generation). This topic will investigate, taxonomize existing methods, and further discuss current limitations and future directions.
Membership Inference Attacks against Text-to-Image Models
Given the remarkable performance of Text-to-Image Models, there's been attention on MIA in those models. This topic will investigate and taxonomize existing MIAs on GANs, VAEs, and diffusion models, and further discuss limitations and future directions.