Nandi Schoots

Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs

May 30, 2024 · Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder De Witt, Dylan Cope, Nandi Schoots
Last updated on Jul 21, 2025

© 2025 Nandi Schoots. This work is licensed under CC BY-NC-ND 4.0.