Recommendable! This is a long interview with the Turing Award winner!
"... Those are contrastive methods. And at least in some contexts, I invented them for a particular type of self-supervised learning called "siamese nets." I used to be a fan of them, but not anymore. I changed my mind on this. I think those methods are doomed. I don't think they're useless, but I think they are not sufficient because they don't scale very well with the dimension of those things. ...
So, basically, you would need an exponentially large number of contrastive samples, that is, points whose energy you push up, to get those contrastive methods to work. They're still quite popular, but they are really limited in my opinion. So what I prefer are the non-contrastive, or so-called regularized, methods. ...
the "regularized latent variable energy-based model," the RLVEB. ...
Well, let me put it this way: I've not been as excited about something in machine learning since convolutional nets ... really something I'm super-excited about. ...
Now, I also changed my mind about this in the last few years. Now, my favourite model is not a generative model that predicts Y from X. It's what I call the joint embedding model that takes X, runs it through an encoder, if you like, a neural net; takes Y, and also runs it through an encoder, a different one; and then prediction takes place in this abstract representation space. ...
The reason we need to abandon probabilistic models comes down to how you can model the dependency between two variables, X and Y; if Y is high-dimensional, how are you going to represent the distribution over Y? We don't really know how to do it. We can only write down very simple distributions, a Gaussian or a mixture of Gaussians, and things like that. If you want complex probability measures, we don't know how to do it, or the only way we know is through an energy function. So we write an energy function where low energy corresponds to high probability and high energy corresponds to low probability, which is the way physicists understand energy, right? The problem is that we rarely know how to normalize. There are a lot of papers in statistics, in machine learning, in computational physics, etc., that are all about how you get around the problem that this normalization term is intractable.
What I'm basically advocating is to forget about probabilistic modeling and just work with the energy function itself. It's not even necessary for the energy to take a form that can be normalized. What it comes down to in the end is, you should have some loss function that you minimize when training your model on data, one that makes the energy of compatible things low and the energy of incompatible things high. ..."
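To make a few of the quoted ideas concrete, here is a minimal sketch of the contrastive, Siamese-style setup the interview criticizes. This is my own toy PyTorch example, not code from the interview: the energy is the distance between two embeddings, pushed down for compatible pairs and pushed up, out to a margin, for incompatible ones.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Toy encoder shared by both sides of a Siamese pair."""
    def __init__(self, dim_in=32, dim_out=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, dim_out)
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(encoder, x, y, compatible, margin=1.0):
    """Margin-based contrastive loss.

    energy = ||enc(x) - enc(y)||, pushed down when `compatible` is 1
    and pushed up (until it exceeds `margin`) when `compatible` is 0.
    """
    energy = (encoder(x) - encoder(y)).norm(dim=1)
    pull = compatible * energy.pow(2)                         # low energy for compatible pairs
    push = (1 - compatible) * F.relu(margin - energy).pow(2)  # high energy for contrastive samples
    return (pull + push).mean()

# Usage sketch: `compatible` is a float tensor of 0/1 flags, one per pair.
enc = SiameseEncoder()
x, y = torch.randn(8, 32), torch.randn(8, 32)
compatible = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(enc, x, y, compatible)
```

The scaling limitation mentioned in the quote shows up in the second term: covering a high-dimensional Y space with enough incompatible pairs to push the energy up everywhere it should be high requires a very large number of negative samples.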
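The joint embedding model described in the quote could be sketched roughly as follows, with two separate encoders and a predictor that operates purely in the abstract representation space. The module names and dimensions here are my own assumptions, not details from the interview.

```python
import torch
import torch.nn as nn

class JointEmbeddingModel(nn.Module):
    """Sketch of a joint embedding architecture: X and Y each get their own
    encoder, and prediction happens in the shared representation space."""
    def __init__(self, dim_x=32, dim_y=32, dim_z=16):
        super().__init__()
        self.enc_x = nn.Sequential(nn.Linear(dim_x, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.enc_y = nn.Sequential(nn.Linear(dim_y, 64), nn.ReLU(), nn.Linear(64, dim_z))
        self.predictor = nn.Linear(dim_z, dim_z)

    def energy(self, x, y):
        """Energy = error between the predicted and the actual Y representation."""
        sx = self.enc_x(x)  # abstract representation of X
        sy = self.enc_y(y)  # abstract representation of Y
        return (self.predictor(sx) - sy).pow(2).sum(dim=1)

model = JointEmbeddingModel()
e = model.energy(torch.randn(8, 32), torch.randn(8, 32))  # one energy value per pair
```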
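The normalization problem mentioned in the quote is the usual partition-function issue: the standard way to turn an energy function into a probability is the Gibbs construction (notation mine),

```latex
p(y \mid x) = \frac{e^{-E(x,y)}}{Z(x)}, \qquad Z(x) = \int e^{-E(x,y)} \, dy
```

and when Y is high-dimensional the integral Z(x) is intractable; that intractable normalization term is what the quote refers to.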
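Finally, a non-contrastive ("regularized") training loss in the spirit the quote prefers might look like the sketch below, reusing the JointEmbeddingModel from the previous block. Instead of pushing up the energy of negative samples, a regularizer keeps the embeddings from collapsing to a constant; the variance hinge used here is borrowed from VICReg-style methods and is my choice of illustration, not necessarily what the interviewee has in mind for the RLVEB model.

```python
import torch
import torch.nn.functional as F

def regularized_loss(model, x, y, var_weight=1.0):
    """Non-contrastive ('regularized') loss: minimize the energy of observed
    (x, y) pairs while a variance term keeps the embeddings from collapsing."""
    sx = model.enc_x(x)
    sy = model.enc_y(y)
    energy = (model.predictor(sx) - sy).pow(2).sum(dim=1).mean()
    # Hinge on the per-dimension standard deviation across the batch:
    # penalize dimensions whose spread drops below 1 (collapse prevention).
    var_reg = F.relu(1.0 - sx.std(dim=0)).mean() + F.relu(1.0 - sy.std(dim=0)).mean()
    return energy + var_weight * var_reg

# Usage sketch, with the JointEmbeddingModel defined above:
model = JointEmbeddingModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = regularized_loss(model, torch.randn(8, 32), torch.randn(8, 32))
loss.backward()
opt.step()
```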