Two prizes at the AI Testing Hackathon!
EffiSciences, in partnership with Apart Research, organized a hackathon on AI interpretability at ENS Ulm, giving students the opportunity to dig into the topic over an intensive weekend. Guided by experts, participants ran experiments on ethics-related questions and on Trojan detection in transformer networks, then wrote up a report. EffiSciences teams stood out at this international hackathon, taking first and fourth place with innovative work on interpretability.

Discovering Latent Knowledge in Language Models Without Supervision - extensions and testing

By Agatha Duzan, Matthieu David, Jonathan Claybrough

Abstract: Based on the paper "Discovering Latent Knowledge in Language Models Without Supervision", this project examines how well the proposed method applies to the concept of ambiguity.

To do that, we tested the Contrast-Consistent Search (CCS) method on a dataset containing both clear-cut (labels 0 and 1) and ambiguous (label 0.5) examples: the ETHICS-commonsense dataset.

The overall conclusion is that the CCS approach seems to generalize well to ambiguous situations, and could potentially be used to determine a model’s latent knowledge about other concepts.
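For readers unfamiliar with CCS, here is a minimal sketch of the objective as described in the cited paper, written in plain Python. It is not the team's code: the function names (`probe`, `ccs_loss`, `ccs_predict`) are illustrative, and the activations are assumed to be already extracted and normalized. Each statement is presented in a "yes" and a "no" version; a linear probe is trained so that the two probabilities are consistent (they sum to about 1) and confident (not both near 0.5). An ambiguous example would then be expected to yield an averaged probability close to 0.5.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def probe(theta, bias, h):
    """Linear probe followed by a sigmoid: probability that a statement is true."""
    return sigmoid(sum(t * x for t, x in zip(theta, h)) + bias)

def ccs_loss(theta, bias, h_pos, h_neg):
    """Unsupervised CCS objective averaged over contrast pairs.

    h_pos[i] / h_neg[i] are (assumed pre-normalized) hidden states for the
    "yes" and "no" versions of the same statement. No labels are used.
    """
    total = 0.0
    for hp, hn in zip(h_pos, h_neg):
        p_pos = probe(theta, bias, hp)
        p_neg = probe(theta, bias, hn)
        consistency = (p_pos - (1.0 - p_neg)) ** 2  # p_pos and p_neg should sum to 1
        confidence = min(p_pos, p_neg) ** 2         # discourage p_pos = p_neg = 0.5
        total += consistency + confidence
    return total / len(h_pos)

def ccs_predict(theta, bias, hp, hn):
    """Averaged probability that the statement is true."""
    return 0.5 * (probe(theta, bias, hp) + (1.0 - probe(theta, bias, hn)))
```

In practice the probe parameters would be fitted by gradient descent on `ccs_loss`, which is omitted here to keep the sketch short.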

These figures show how the CCS results for last-layer activations split into two groups for the non-ambiguous training samples, while the ambiguous test samples from the ETHICS dataset reveal the same ambiguity in latent knowledge, visible as a flattened Gaussian distribution of inferred probabilities.

Haydn & Esben’s judging comment: This project is very good in investigating the generality of unsupervised latent knowledge learning. It also seems quite useful as a direct test of how easy it is to extract latent knowledge and provides an avenue towards a benchmark using the ETHICS unambiguous/ambiguous examples dataset. Excited to see this work continue!

Read the report and the code (needs updating).

Trojan detection and implementation on transformers

By Clément Dumas, Charbel-Raphaël Segerie, Liam Imadache

Abstract: Neural Trojans are one of the most common adversarial attacks. Although they have been studied extensively in computer vision, they can also easily target LLMs and transformer-based architectures. Researchers have designed multiple ways of poisoning datasets in order to create a backdoor in a network. Trojan detection methods seem to have a hard time keeping up with these creative attacks. Most of them are based on analyzing and cleaning the datasets used to train the network.

There does not seem to be any accessible, easy-to-use benchmark for testing Trojan attacks and detection algorithms, and most of these algorithms require knowledge of the training dataset.

We therefore decided to create a small benchmark of Trojan networks, which we implemented ourselves based on the literature, and to use it to test some existing and new detection techniques.
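To make the idea of dataset poisoning mentioned in the abstract concrete, here is a minimal sketch of one common style of text backdoor: a rare trigger token is inserted into a small fraction of training samples, whose labels are switched to an attacker-chosen target. This is an illustration under stated assumptions, not the authors' benchmark: the function name `poison_dataset`, the trigger string, and the poisoning rate are all hypothetical.

```python
import random

def poison_dataset(samples, trigger, target_label, rate, seed=0):
    """Return a copy of `samples` with a fraction `rate` of them backdoored.

    samples: list of (text, label) pairs.
    trigger: token inserted at a random word position (hypothetical choice).
    target_label: label the attacker wants whenever the trigger is present.
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    chosen = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in chosen:
            words = text.split()
            words.insert(rng.randint(0, len(words)), trigger)
            poisoned.append((" ".join(words), target_label))
        else:
            poisoned.append((text, label))  # clean samples are left untouched
    return poisoned
```

A model trained on the returned data may behave normally on clean inputs while predicting `target_label` whenever the trigger appears. A detection method that relies on cleaning the training data would need access to these poisoned samples.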

[from the authors]: The Colab contains the code to create the Trojans described below, but you will also find some mysterious networks containing Trojans that you can try to detect and explain. We will give 50 euros to the first person who proposes a method that finds our trigger!

Haydn & Esben’s judging comment: Great to see so many replications of papers in one project and a nice investigation into Trojan triggers in training data. The proposed use of Task Vectors is quite interesting and your conclusion about Trojan attacks >> defenses is a good observation.

Read the report and the Colab. Check out the created Trojans (if you dare).
