Abstract
The complexity of recent ML models and their large number of weights make it difficult to understand which features of the training data lead to a given model output. This gives rise to underspecification: the situation where several very different predictors achieve similar performance on the training data. As explained in [5], underspecification can have harmful consequences when a model is deployed on real data, leading to unexpectedly poor performance. This challenge addresses the issue by artificially creating a distribution shift between the training and test data. The goal is to predict the age (Young/Old) of individuals from photos, but the training data favors learning a predictor that reads a "Young"/"Old" text overlay rather than analyzing faces. Such a text-based predictor cannot work on the test set, where the text is not correlated with age. We tried two approaches. First, we implemented the DivDis network described in [12], which uses several classification heads corresponding to predictors that rely on different features of the training data. To improve its performance, we deliberately pretrained one classification head to do precisely what should be avoided, i.e. classify images using only their text. However, this approach was not conclusive and did not yield good performance. Second, we exploited the fact that the underspecification here is quite simple and can be bypassed: we used Error Level Analysis (ELA), a simple method based on JPEG compression, to locate the text areas in the images and remove them, thereby resolving the underspecification. Although this second method is less in line with the philosophy of the challenge, it achieved better accuracy than the DivDis architecture (73% vs. 64%).
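The core of ELA is to re-save an image as JPEG at a known quality and take the pixel-wise difference with the original: regions with a different compression history, such as text overlaid on a photo, exhibit a distinct error level. The following is a minimal sketch of this idea using Pillow; the function name and the quality setting are illustrative choices, not the exact implementation used in this work.

```python
from io import BytesIO

from PIL import Image, ImageChops


def error_level_analysis(image: Image.Image, quality: int = 90) -> Image.Image:
    """Return the ELA map of `image`: the difference between the image
    and a re-compressed JPEG copy of itself at the given quality."""
    # Re-save the image as JPEG at a known, fixed quality.
    buf = BytesIO()
    image.convert("RGB").save(buf, "JPEG", quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf)
    # Pixels whose compression error differs from the rest of the image
    # (e.g. an overlaid text region) stand out in this difference map.
    return ImageChops.difference(image.convert("RGB"), recompressed)
```

In our setting, thresholding such a difference map would give a mask of the text area, which can then be blanked out before training the age classifier.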