Here is a selection of the topics we evaluated:
1. Truthfulness in Latent Knowledge
2. Justice dataset with GPT-3
3. Study of the biases of a generative model through the example of a face inpainting model
5. Transferability via perceptual metrics
5. Adversarial Attacks and Defenses on MNIST
6. Racial bias when inpainting faces with CelebA denoising diffusion probabilistic models
7. Comparison of the adversarial robustness of SOTA DL audio compression models
8. Foundation models and the AI act
9. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
We would like to highlight one of the reports, which presents a very elegant piece of work on GPT interpretability.
Truthfulness in Latent Knowledge
by Stanislas Dozias & Sébastien Meyer
In their 2022 paper, Burns et al. introduce the Contrast-Consistent Search (CCS) model. This model works in an unsupervised fashion by learning a consistent and confident mapping from the hidden states of a language model to a pair of probabilities, as shown in Figure 1. This mapping can be used to perform binary classification for labels such as True/False, positive/negative, or duplicates/not duplicates. It achieves better results than other unsupervised techniques such as zero-shot classification; however, it still underperforms logistic regression in the supervised setting. To go beyond this paper, we propose two ideas. The first consists in finding and analyzing the direction of truthfulness within the hidden states and projecting onto it. We introduce a method called Mean Direction Projection (MDP).
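To make the "consistent and confident" objective concrete, here is a minimal sketch of the CCS loss from Burns et al.: given the probe's probabilities for the positive phrasing p(x+) and the negative phrasing p(x-), consistency asks that p(x+) ≈ 1 - p(x-), while confidence penalizes the degenerate solution where both sit at 0.5. The function name and array conventions below are illustrative, not the authors' code.

```python
import numpy as np

def ccs_loss(p_pos: np.ndarray, p_neg: np.ndarray) -> float:
    """CCS objective (Burns et al., 2022), averaged over examples.

    p_pos[i]: probe probability that the positive phrasing x+ is true.
    p_neg[i]: probe probability that the negative phrasing x- is true.
    """
    # Consistency: the two phrasings should give complementary probabilities.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: push at least one of the two probabilities away from 0.5.
    confidence = np.minimum(p_pos, p_neg) ** 2
    return float(np.mean(consistency + confidence))

# A confident, consistent pair incurs zero loss...
print(ccs_loss(np.array([1.0]), np.array([0.0])))  # 0.0
# ...while the degenerate "always 0.5" solution is penalized.
print(ccs_loss(np.array([0.5]), np.array([0.5])))  # 0.25
```

In the full method, this loss is minimized over the parameters of a small probe (e.g. a linear layer plus sigmoid) applied to the hidden states, with no labels involved.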
Recall from Burns et al. that each question qi is fed to the language model in two variants: x+ phrased with a positive answer and x- phrased with a negative answer. Our method takes the mean of the differences between the hidden states of the positive answers φ(x+) and the negative answers φ(x-), weighting each difference by yi, which is +1 if the true answer is positive and -1 if it is negative. We applied our method to three different datasets and compared the resulting direction with the one found by CCS. Finally, we applied both our method and CCS to the Moral Uncertainty Research Competition dataset.
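A minimal sketch of the MDP idea, assuming hidden states are already extracted as arrays (one row per question); function names and the normalization step are our illustrative choices, not necessarily the report's exact implementation:

```python
import numpy as np

def mean_direction(phi_pos: np.ndarray, phi_neg: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Estimate a truthfulness direction from labeled contrast pairs.

    phi_pos, phi_neg: (n, d) hidden states for positive/negative phrasings.
    y: (n,) labels, +1 if the true answer is positive, -1 otherwise.
    """
    # Label-weighted differences all point (roughly) toward "true".
    diffs = y[:, None] * (phi_pos - phi_neg)
    direction = diffs.mean(axis=0)
    return direction / np.linalg.norm(direction)  # unit vector

def predict(phi_pos: np.ndarray, phi_neg: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Classify each pair by the sign of its projection on the direction."""
    return np.sign((phi_pos - phi_neg) @ direction)

# Toy check on synthetic data with a known ground-truth direction.
rng = np.random.default_rng(0)
truth = np.array([1.0, 0.0, 0.0])
y = rng.choice([-1.0, 1.0], size=50)
phi_neg = rng.standard_normal((50, 3))
phi_pos = phi_neg + y[:, None] * truth + 0.1 * rng.standard_normal((50, 3))
d = mean_direction(phi_pos, phi_neg, y)
accuracy = np.mean(predict(phi_pos, phi_neg, d) == y)
```

Unlike CCS, this estimator is supervised (it uses the labels yi), which is precisely what makes comparing its direction with the unsupervised CCS direction informative.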