Berkant Turan

I am a third-year PhD candidate at the Institute of Mathematics, TU Berlin, and a research associate at the Zuse Institute Berlin (ZIB), where I am part of the Interactive Optimization and Learning (IOL) research group under the supervision of Prof. Sebastian Pokutta. I am also a member of the Berlin Mathematical School (BMS), which is part of the MATH+ Cluster of Excellence.

My research interests focus on the interpretability, robustness, and safe deployment of neural networks in high-stakes applications. I develop interactive, multi-agent models that provide provable insights into the decision-making processes of black-box systems. By utilizing feature selectors within an adversarial framework, I aim to expose the reasoning behind model predictions, addressing core challenges in the interpretability and security of complex AI systems.

I am also interested in the connections between model security approaches, such as adversarial robustness and backdoor-based watermarks, and in the theoretical limits of these techniques across different learning tasks. My work also investigates transferable attacks, which exploit vulnerabilities across multiple defenses and are constructed using cryptographic tools, uncovering fundamental links between AI security and cryptography.

Before starting my PhD, I focused on Deep Hybrid Discriminative-Generative Modeling, investigating the optimization and behavior of Variational Autoencoders and Residual Networks for out-of-distribution detection, robustness, and calibration in computer vision tasks.

news

06/2025	Happy to share that Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings has been accepted at the ICML 2025 Workshop on Actionable Interpretability! Grateful to have collaborated with S. Asadulla, D. Steinmann, W. Stammer, and S. Pokutta on advancing interpretable AI through formal verification methods.
05/2025	Excited to announce that our collaborative work on Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation has been accepted at ICML 2025! Many thanks to my amazing collaborators J. Pauls, M. Zimmer, S. Saatchi, P. Ciais, S. Pokutta, and F. Gieseke for this interdisciplinary project bridging machine learning and environmental science.
03/2025	Great news! The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses has been accepted at the ICLR 2025 Workshop on GenAI Watermarking.
10/2024	Excited to announce that The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses is now available on arXiv! Many thanks to my collaborators, Grzegorz Głuch (EPFL at the time), Sai Ganesh Nagarajan (ZIB) and Sebastian Pokutta (ZIB), for their contributions to this project!
06/2024	Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks was accepted at the ICML 2024 Workshop on Theoretical Foundations of Foundation Models. See you in Vienna!
03/2024	Our recent paper, Interpretability Guarantees with Merlin-Arthur Classifiers, has been accepted at AISTATS 2024. Looking forward to meeting you in Valencia.
07/2023	I received the Best Proposal Award at the xAI-2023 Doctoral Consortium in Lisbon for my research on Extending Merlin-Arthur Classifiers for Improved Interpretability. Thank you to the reviewers and organizers!
09/2022	Excited to have started my PhD at TU Berlin and the Zuse Institute Berlin in the Interactive Optimization and Learning research lab, under the supervision of Sebastian Pokutta.

selected publications

The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses

Grzegorz Głuch, Berkant Turan, Sai Ganesh Nagarajan, and Sebastian Pokutta

arXiv preprint arXiv:2410.08864, 2024


We formalize and extend existing definitions of backdoor-based watermarks and adversarial defenses as interactive protocols between two players. The existence of these schemes is inherently tied to the learning tasks for which they are designed. Our main result shows that for almost every learning task, at least one of the two – a watermark or an adversarial defense – exists. The term “almost every” indicates that we also identify a third, counterintuitive but necessary option, i.e., a scheme we call a transferable attack. By transferable attack, we refer to an efficient algorithm computing queries that look indistinguishable from the data distribution and fool all efficient defenders. To this end, we prove the necessity of a transferable attack via a construction that uses a cryptographic tool called homomorphic encryption. Furthermore, we show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive, thus requiring the underlying task to be computationally complex. These two facts imply an “equivalence” between the existence of transferable attacks and cryptography. Finally, we show that the class of tasks of bounded VC-dimension has an adversarial defense, and a subclass of them has a watermark.
@article{gluch2024_goodbadugly, title = {The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses}, author = {Głuch, Grzegorz and Turan, Berkant and Nagarajan, Sai Ganesh and Pokutta, Sebastian}, year = {2024}, journal = {arXiv preprint arXiv:2410.08864}, primaryclass = {cs.LG}, }
Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks

Grzegorz Głuch, Sai Ganesh Nagarajan, and Berkant Turan

ICML 2024 Workshop on Theoretical Foundations of Foundation Models (TF2M), 2024


As AI becomes omnipresent in today’s world, it is crucial to study the safety aspects of learning, such as guaranteed watermarking capabilities and defenses against adversarial attacks. In prior work, with few exceptions, these properties have been studied separately and empirically. Meanwhile, strong, transferable forms of adversarial attacks have been developed (empirically) for discriminative DNNs (Liu et al., 2016) and LLMs (Zou et al., 2023). In this ever-evolving landscape of attacks and defenses, we initiate the formal study of watermarks, defenses, and transferable attacks for classification under a unified framework, in which two time-bounded players participate in an interactive protocol. Within this framework, we show that for every learning task, at least one of the three schemes exists. Importantly, our results cover regimes where VC theory is not necessarily applicable. Finally, we provide provable examples of the three schemes and show that transferable attacks exist only in regimes beyond bounded VC dimension. Our example of a transferable attack is a nontrivial construction based on cryptographic tools, namely homomorphic encryption.
@inproceedings{gluch_taxonomy_2024, title = {Unified {T}axonomy in {AI} {S}afety: {W}atermarks, {A}dversarial {D}efenses, and {T}ransferable {A}ttacks}, author = {Gluch, Grzegorz and Nagarajan, Sai Ganesh and Turan, Berkant}, booktitle = {ICML 2024 Workshop on Theoretical Foundations of Foundation Models (TF2M)}, year = {2024}, }
Interpretability Guarantees with Merlin-Arthur Classifiers

Stephan Wäldchen, Kartikey Sharma, Berkant Turan, Max Zimmer, and Sebastian Pokutta

In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2024


We propose an interactive multi-agent classifier that provides provable interpretability guarantees even for complex agents such as neural networks. These guarantees consist of lower bounds on the mutual information between selected features and the classification decision. Our results are inspired by the Merlin-Arthur protocol from Interactive Proof Systems and express these bounds in terms of measurable metrics such as soundness and completeness. Compared to existing interactive setups, we rely neither on optimal agents nor on the assumption that features are distributed independently. Instead, we use the relative strength of the agents as well as the new concept of Asymmetric Feature Correlation which captures the precise kind of correlations that make interpretability guarantees difficult. We evaluate our results on two small-scale datasets where high mutual information can be verified explicitly.
@inproceedings{pmlr-v238-waldchen24a, title = {Interpretability Guarantees with {M}erlin-{A}rthur Classifiers}, author = {W\"{a}ldchen, Stephan and Sharma, Kartikey and Turan, Berkant and Zimmer, Max and Pokutta, Sebastian}, booktitle = {Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS)}, pages = {1963--1971}, year = {2024}, editor = {Dasgupta, Sanjoy and Mandt, Stephan and Li, Yingzhen}, volume = {238}, series = {Proceedings of Machine Learning Research}, publisher = {PMLR}, }
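The Merlin-Arthur abstract states its guarantees in terms of completeness and soundness. The Python snippet below is a minimal, hypothetical sketch of how these two metrics can be estimated empirically; the integer "pixels", the single-feature selectors `merlin` and `morgana`, and the rule-based verifier `arthur` are illustrative assumptions, not the paper's neural-network setup or code. Here, completeness is read as the fraction of inputs that Arthur classifies correctly from the cooperative prover's feature, and soundness as the fraction on which the adversarial prover fails to push Arthur to a wrong class, with abstention ("I don't know") counting as safe.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical toy task (illustrative assumption, not the paper's setup):
# an input is a short tuple of integer "pixels". Value 1 is evidence for
# class 0, value 2 is evidence for class 1, and 0 is uninformative background.
Input = Sequence[int]
Feature = Optional[Tuple[int, int]]  # (position, value) shown to Arthur, or None


def arthur(feature: Feature) -> Optional[int]:
    """Verifier: classify from one revealed feature, or abstain with None."""
    if feature is None:
        return None
    _, value = feature
    if value == 1:
        return 0
    if value == 2:
        return 1
    return None  # background pixel carries no evidence: "I don't know"


def merlin(x: Input, label: int) -> Feature:
    """Cooperative prover: reveal a pixel that supports the true label."""
    for i, v in enumerate(x):
        if arthur((i, v)) == label:
            return (i, v)
    return None


def morgana(x: Input, label: int) -> Feature:
    """Adversarial prover: search for a pixel that makes Arthur answer wrongly."""
    for i, v in enumerate(x):
        decision = arthur((i, v))
        if decision is not None and decision != label:
            return (i, v)
    return None  # nothing in x can mislead Arthur


def completeness(data: List[Tuple[Input, int]]) -> float:
    """Fraction of inputs Arthur classifies correctly from Merlin's feature."""
    return sum(arthur(merlin(x, y)) == y for x, y in data) / len(data)


def soundness(data: List[Tuple[Input, int]]) -> float:
    """Fraction of inputs where Arthur answers correctly or abstains on Morgana's feature."""
    return sum(arthur(morgana(x, y)) in (y, None) for x, y in data) / len(data)


data = [
    ((0, 1, 0, 0), 0),
    ((0, 0, 0, 1), 0),
    ((2, 0, 0, 0), 1),
    ((0, 0, 2, 0), 1),
    ((0, 1, 2, 0), 1),  # spurious class-0 evidence inside a class-1 input
]

print(completeness(data))  # 1.0
print(soundness(data))     # 0.8
```

In this toy, only the last input contains spurious evidence for the wrong class, so Morgana can fool Arthur there and soundness drops to 0.8 while completeness stays at 1.0. Morgana's exhaustive search over single pixels is only feasible because the inputs are tiny; with neural feature selectors on real images, both provers would instead be trained, which is where the min-max optimization challenge arises.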
Extending Merlin-Arthur Classifiers for Improved Interpretability

Berkant Turan

In Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium, co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023), Jul 2023

(Best Proposal Award)


In my doctoral research, I aim to address the interpretability challenges associated with deep learning by extending the Merlin-Arthur Classifier framework. This approach employs a pair of feature selectors, one of which is an adversarial player, to generate informative saliency maps. My research focuses on enhancing the classifier’s performance and exploring its applicability to complex datasets, including a recently established human benchmark for detecting pathologies in X-ray images. To tackle the min-max optimization challenge inherent in the Merlin-Arthur Classifier for high-dimensional data, I will explore and apply diverse stabilization strategies to improve the framework’s robustness and training stability. Finally, the goal is to expand the framework beyond pixel-level saliency maps to other modalities, such as text, and to learned feature spaces, fostering a comprehensive understanding of interpretability across various domains and data types.
@inproceedings{TuranConsortium2023, title = {Extending Merlin-Arthur Classifiers for Improved Interpretability}, author = {Turan, Berkant}, booktitle = {Joint Proceedings of the xAI-2023 Late-breaking Work, Demos and Doctoral Consortium, co-located with the 1st World Conference on eXplainable Artificial Intelligence (xAI-2023)}, pages = {193--200}, year = {2023}, organization = {Springer}, address = {Lisbon, Portugal}, month = jul, }