Hot Topics in Security of Machine Learning

Overview

SemesterWinter 2024
Course typeBlock Seminar
LecturerTT.-Prof. Dr. Wressnegger
AudienceInformatik Master & Bachelor
Credits4 ECTS
Room148, Building 50.34
LanguageEnglish
LinkTBA
Registrationhttps://ilias.studium.kit.edu/goto.php?target=crs_2483964&client_id=produktiv

Description

This seminar is concerned with different aspects of adversarial machine learning. Next to the use of machine learning for security, also the security of machine learning algorithms is essential in practice. For a long time, machine learning has not considered worst-case scenarios and corner cases as those exploited by an adversarial nowadays.

The module introduces students to the recently extremely active field of attacks against machine learning and teaches them to work up results from recent research. To this end, the students will read up on a sub-field, prepare a seminar report, and present their work at the end of the term to their colleagues.

Topics include but are not limited to adversarial examples, model stealing, and membership inferences against large language models or text-to-image generative models.

Schedule

DateStep
Wed, 23. Oct, 9:45–11:15Kick-off & Topic presentation
Thu, 24. Oct, 11:59 (noon)Send topic selection
(assignment happens till 15:00)
Fri, 25. Oct, 11:59 (noon)Officially register for assigned topic
(missed opportunities will be reassigned to waiting list till 15:00)
Tue, 29. Oct, 9:45–11:15Optional unit on "How to Ace the Seminar"
Thu, 31. OctArrange appointments with assistant
Mon, 04. Nov - Tue, 05. Nov1st individual meeting (Provide first overview and ToC)
Mon, 16. Dec - Tue, 17. Dec2nd individual meeting (Feedback on draft report)
Tue, 07. JanSubmit final paper
Tue, 14. JanSubmit review for fellow students
Wed, 15. Jan, 14:00–16:00PC discussion meeting
Tue, 28. JanSubmit camera-ready version of your paper
Tue, 11. FebPresentation at final colloquium

Matrix Chat

News about the seminar, potential updates to the schedule, and additional material are distributed using the course's matrix room. Moreover, matrix enables students to discuss topics and solution approaches.

You find the link to the matrix room on ILIAS.

Topics

Every student may choose one of the following topics. For each of these, we additionally provide recent top-tier publications that serve as the basis for the seminar report. For the seminar and your final report, you should not merely summarize these papers, but try to go beyond and arrive at your own conclusions.

  • Model Extraction Attacks against Large Language Models

    Model extraction attacks (MEAs) on large language models (LLMs) have received increasing research attention lately. However, most existing methods inherit the extraction strategies from those designed for DNNs but yet neglect the inconsistency of training tasks. This topic will investigate and taxonomize existing extraction methods for LLMs and further discuss their limitations.

  • IP Protection for Large Language Models

    In contrast to stealing LLMs, IP protection methods have recieved considerable attention recently. Those methods can effectively certify the output of a representative set designed by adding watermarked tokens. This topic will investigate and taxonomize existing IP protection methods for LLMs and further discuss limitations and future directions.

  • Prompt Extraction Attacks against Large Language Models

    Companies develop specialized prompts to instruct their LLMs for specific commercial applications. While these system prompts are typically treated as secrets, withheld from end-users, recent research has revealed the risk of potential leaks of system prompts. This topic will investigate existing methods and discuss current limitations and future directions.

  • Prompt Injection Attacks against Large Language Models

    Prompt Injection attack aims to override original instructions and inject harmful commands to cause inappropriate behaviors in the model. This topic will investigate and taxonomize existing methods for LLMs and further discuss limitations and future directions.

  • Jailbreak Attacks against Large Language Models

    Jailbreak aims to exploit LLM vulnerabilities to bypass alignment, leading to harmful or malicious outputs. This topic will investigate and taxonomize existing jailbreak attacks and discuss their limitations and future directions.

  • Training Data Extractions Attacks against Large Language Models

    Training Data Extraction Attack has the potential to extract sensitive information (e.g., passwords) in LLMs. This topic will investigate and taxonomize existing methods and further discuss limitations and future directions.

  • Adversarial Examples on Text-to-Image Models

    Recent Text-to-Image Models enable customized creation of visual content, raising concerns about the robustness of those models (e.g., create attacks targeting specific image generation). This topic will investigate, taxonomize existing methods, and further discuss current limitations and future directions.

  • Membership Inference Attacks against Text-to-Image Models

    Given the remarkable performance of Text-to-Image Models, there's been attention on MIA in those models. This topic will investigate and taxonomize existing MIAs on GANs, VAEs, and diffusion models, and further discuss limitations and future directions.