Saturday, June 06, 2020

Shortcut Learning in Deep Neural Networks: A Critique

Amended 12/29/2020


Amendment of 12/29/2020


This paper has gained much attention lately. As of 12/29/2020, it had already garnered between 48 and 63 citations (depending on the source) since its first publication in April 2020. Meanwhile, it has also been published in Nature Machine Intelligence.

At least one of the authors, Richard Zemel, is a well-known machine learning researcher from the University of Toronto.

Let me also hint that most of the shortcut issues addressed in that paper can fairly easily be improved upon. Take, e.g., a computer vision model trained on reasonable-quality natural images taken by comparable cameras; you would then expect the following capabilities:

  1. To discriminate between natural images and other images
  2. To discriminate between foreground and background of the image
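To make the first expectation concrete — a purely hypothetical sketch of my own, not anything taken from the paper — even a crude statistic such as grey-level histogram entropy can separate richly textured natural photographs from flat abstract graphics. The feature choice, the toy data, and the threshold of 4 bits are all assumptions made up for this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def colour_entropy(img):
    """Shannon entropy (in bits) of the 8-bit grey-level histogram of an image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 255))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy stand-ins: "natural" images have rich, noisy texture;
# "abstract" images are flat fills using only a single grey level each.
natural = [rng.integers(0, 256, size=(64, 64)) for _ in range(5)]
abstract = [np.full((64, 64), v) for v in (0, 64, 128, 192, 255)]

THRESHOLD = 4.0  # bits; hypothetical cut-off chosen for this toy data

def looks_natural(img):
    """Crude discriminator: richly textured images have high histogram entropy."""
    return colour_entropy(img) > THRESHOLD

assert all(looks_natural(im) for im in natural)
assert not any(looks_natural(im) for im in abstract)
```

A real discriminator would of course be a trained model rather than a single hand-picked statistic, but the point stands: the signal separating natural images from other images is not exotic.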

Original Blog Post

Shortcut Learning in Deep Neural Networks: A Critique

Thomas Bingel

Happy Autodidact


Abstract

I get annoyed when I read papers that pretend to be research papers but actually contain a thinly disguised popular political agenda! Further, this paper is a hodgepodge and more!


Critique


I have just finished reading one of the latest papers in this line of work that try to prove how unreliable deep neural networks are, i.e. Shortcut Learning in Deep Neural Networks by Robert Geirhos, Jörn-Henrik Jacobsen, Claudio Michaelis, Richard Zemel, Wieland Brendel, Matthias Bethge, and Felix A. Wichmann (https://arxiv.org/abs/2004.07780). In this critique, I add a lot of emphasis by boldfacing.


To put it polemically: this and similar works are often rather humbug and distractions! This paper, in particular, is a very fluffy and inflated paper, with 29 pages and 143 references, as if quantity mattered. Overall, in my opinion, these types of papers are, generally speaking, rather cheap shots!


This paper also happens to be rather pretentious, invoking none other than Isaac Newton in the first few sentences of the introduction. The reader is to stand in awe at what is presented next. That is a cheap trick of rhetoric!


Further, this paper presents a convoluted hodgepodge of subjects. It is all over the place, without focus! The authors obsessively try to subsume almost everything under the sun as shortcut learning! The paper also suffers from being occasionally highly politicized!


Of course, a DNN that was trained, let's say, exclusively on natural images does not do very well when presented with distorted or abstract images. What is the point? What do you expect? Even a 10-year-old child can figure that out!


E.g. on page 8, the authors show the following image:


Well, would you have immediately recognized what kind of animal this is without ever having seen this image before? Be honest! (The authors were so kind, or so manipulative, as to tell you right next to the image what it represents.) The paper contains other silly examples like that.


What about the cow walking along water on a sandy beach?


Even many humans would find it unusual, especially those who have perhaps never lived at or spent time on such a beach.

I guess we can all agree that DNNs so far lack proper means to classify an image as out of distribution instead of issuing a classification result. This could, in my estimation, easily be remedied, e.g. by adding another model that does exactly that!
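To illustrate what I mean — a minimal, hypothetical sketch, not something the paper's authors propose — the simplest such add-on is a confidence gate in front of the classifier that abstains when the softmax output is too diffuse. The threshold value below is an assumption chosen for the toy logits:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_or_abstain(logits, threshold=0.9):
    """Return the predicted class index, or None when the model should abstain.

    Low maximum softmax probability is used as a crude out-of-distribution
    signal; 0.9 is a hypothetical threshold, not a tuned value.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    if probs.max() < threshold:
        return None  # treat the input as out of distribution
    return int(probs.argmax())

# Confident in-distribution prediction: one logit clearly dominates.
assert classify_or_abstain([8.0, 0.5, 0.3]) == 0
# Diffuse logits -> low confidence -> abstain instead of guessing.
assert classify_or_abstain([1.0, 0.9, 1.1]) is None
```

A dedicated second model (a density estimator or a binary in-/out-of-distribution classifier) would do better than this one-line gate, but even this much already replaces a confident wrong label with an honest "I don't know".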


I suspect most computer vision researchers are keenly aware of the cited shortcomings!


Then the authors report on page 10: “Agent-based (Reinforcement) Learning Instead of learning how to play Tetris, an algorithm simply learned to pause the game to evade losing [67].” Well, when you look up the reference, it turns out to be a paper from 2013 published by the Association for Computational Heresy (I kid you not!! Their logo is very reminiscent of the renowned Association for Computing Machinery (ACM)). Well, my jaw dropped! Seriously! Anyway, 2013 indicates this is a very early paper that probably does not reflect the current state of the art much!

Shortly afterwards, the authors say: “However, they [reward functions] all too often contain unexpected shortcuts that allow for so-called reward hacking [68].” Well, all software is unsafe and prone to hacking; we have known that for more than 60 years or so!


Under the chapter “Fairness & algorithmic decision-making” we learn: “Tasked to predict strong candidates on the basis of their resumes, a hiring tool developed by Amazon was found to be biased towards preferring men. … [13]”. To substantiate this claim, the authors refer to a Reuters news article! Oh, wow! Perhaps the authors are not aware that it is still very common that some jobs are carried out by significantly more males than females, and vice versa, by their own choice.


The authors also have a tendency to overemphasize the importance of a single research result entirely unrelated to AI & machine learning and to reapply it elsewhere in the paper. On page 2, they introduce the reader to the following: “Rats learned to navigate a complex maze apparently based on subtle colour differences … Intensive investigation into this curious finding revealed that the rats had tricked the researchers: ... [rats] instead simply discriminated the colours by the odour of the colour paint used on the walls of the maze.” On page 12, the authors state: “... but if DNNs successfully recognise objects, it seems natural to assume that they are using object shape like humans do”. There is probably nothing natural about assuming that when we compare machine learning to human learning!


On page 16, we learn: “Individual fairness aims at treating similar individuals similarly while group fairness aims at treating subgroups no different than the rest of the population”. Group fairness? What? This is where, again, politics obviously creeps in dangerously!


Why are shortcuts learnt? The authors address this on pages 14-15. We read: “The “Principle of Least Effort” Why are machines so prone to learning shortcuts, detecting grass instead of cows [9] or a metal token instead of pneumonia [15]? Exploiting those shortcuts seems much easier for DNNs than learning the intended solution. But what determines whether a solution is easy to learn? In Linguistics, a related phenomenon is called the “Principle of Least Effort” [119], the observation that language speakers generally try to minimise the amount of effort involved in communication. For example, the use of “plane” is becoming more common than “airplane”, and in pronouncing “cupboard”, “p” and “b” are merged into a single sound [120, 121].” This sounds plausible, but it is not! It is anthropomorphism applied to machine learning in a curious way. The authors actually use the term “anthropomorphism” in the paper, but in a rather convoluted manner. The Principle of Least Effort may also not apply here the way the authors intended! I have serious doubts that this interpretation is helpful!


In their conclusion, the authors chose the following quote to open the chapter: ““The road reaches every place, the short cut only one” — James Richardson [143, Richardson, J. Vectors: aphorisms & ten-second essays (Ausable Press, 2001)]”. This sounds great for one second only, before any intelligent reader starts to think about it. A roadway shortcut has much potential: it may well lead you to new, unexplored places, or to meeting people you didn't know existed, etc. For a human, taking a roadway shortcut is bold and may include discoveries, exploration of the unknown, and much more.


Their conclusion includes at least one recommendation I agree with: “Consequently, o.o.d. [out-of-distribution] generalization tests will need to become the rule rather than the exception.”


This paper is also quite peculiar in that the authors opted to amend their references with their own short summaries below the reference entries. The authors added these explicitly separated, indented, highly visible, and boldfaced summaries to 11 references out of a total of 143. I have by now read hundreds of research papers related to AI & machine learning covering many areas of research. This is the first paper I remember resorting to such author amendments in the references section.


I do not want to tax the reader's attention any longer; you probably get the idea!


Conclusion


I almost get the impression that these mostly German authors (with exceptions, e.g. Richard Zemel) had some kind of axe to grind! I do not want to speculate here about what kind of axe it might be.


This is not a research paper, but a paper with a political agenda and a philosophical approach!

It seems to be designed to attract a lot of citations!
