tl;dr: building features manually is inefficient and extracting them automatically is possible and, in a sense, better.
I’m not writing a lot here these days, so let me summarize: I left my job recently to start working on my Master’s at Universidade de São Paulo (USP). I’m studying an area of Machine Learning called Deep Learning (or Representation Learning, the term I prefer) for my research. This post is an overview of what I’ve read so far about the subject.
(By the way, it is practically impossible to enter a PhD program in Brazil without an MSc.)
A typical workflow
When building a Machine Learning system, you generally start by aggregating a dataset and selecting the features for training. Then you run it through whatever algorithm you’re familiar with—say, SVM or logistic regression—and evaluate the resulting machine on some separate validation data. And the resulting performance is bad. (reference: my own experience™)
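To make the workflow concrete, here is a minimal sketch in plain Python: build a dataset, train a logistic regression with stochastic gradient descent, and check it on held-out validation data. The toy dataset, the function names, and the hyperparameters are all my own assumptions for illustration. The problem is deliberately easy (linearly separable), so unlike most real problems this one should score well.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_dataset(n, rng):
    # Hypothetical toy data: two features, label is 1 when x0 + x1 > 0.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 1 if x[0] + x[1] > 0 else 0))
    return data


def predict(model, x):
    w, b = model
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(data, lr=0.5, epochs=50):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict((w, b), x) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def accuracy(model, data):
    hits = sum((predict(model, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
data = make_dataset(300, rng)
train_set, val_set = data[:200], data[200:]  # keep validation data separate

model = train(train_set)
val_acc = accuracy(model, val_set)
print(f"validation accuracy: {val_acc:.2f}")
```

The important part is the split: the model never sees `val_set` during training, so the accuracy it reports there is an honest estimate.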
What to do when your system’s performance is awful? There are a few available choices. Are you training it with too small a dataset? Get more data. Is the algorithm overfitting the data anyway? Use some sort of regularization during training, or look for simpler (i.e. smaller parameter space) algorithms.
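As one illustration of what “some sort of regularization” can look like, the sketch below uses an L2 penalty (ridge regression) on a one-feature linear model. The choice of L2, the tiny dataset, and the penalty strength `lam` are my assumptions, picked only because the closed-form solution fits in a few lines.

```python
def fit_ridge_1d(xs, ys, lam):
    # Minimizes sum((y - w * x) ** 2) + lam * w ** 2 over w.
    # Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_plain = fit_ridge_1d(xs, ys, lam=0.0)   # ordinary least squares, roughly 2.04
w_reg = fit_ridge_1d(xs, ys, lam=14.0)    # penalized fit, roughly 1.02

print(w_plain, w_reg)
```

The penalty pulls the weight toward zero. In a model with many parameters, that same pull is what keeps it from bending itself around noise in the training data.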
You have a lot of data and the damn thing still can’t tell a dog from a cat? This situation is hard. You might have come upon a problem in which the decision surface is very non-linear and your algorithm is unable to model it sufficiently well. Or the training procedure is stuck in a local minimum, or straying from a good minimum by moving too much, e.g. because of a high learning rate. Or you didn’t select the correct features. (or all of the above—good luck!)
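The learning-rate problem is easy to see on the simplest possible loss. This sketch runs gradient descent on f(x) = x², whose minimum is at x = 0. The function and the two learning rates are assumptions chosen for illustration, not anything specific to a real model.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    # Minimize f(x) = x ** 2; its gradient is 2 * x.
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # each step multiplies x by (1 - 2 * lr)
    return x


small = gradient_descent(lr=0.1)  # factor 0.8 per step: shrinks toward 0
large = gradient_descent(lr=1.1)  # factor -1.2 per step: overshoots and grows

print(small, large)
```

With the small rate the iterate settles near the minimum; with the large one every step jumps past it and lands farther away than before.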
In computer vision, you can come up with elaborate features based on pixel intensities: edges, lines, regions. The same happens when your domain involves videos or sound. But coming up with new features when your problem is credit scoring is an art. And so it is with churn prediction, sentiment analysis, content discovery, and others. Of course, we can come up with good features for these problems, some of which are considered “solved” (i.e. usable in practice).
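A tiny example of how much a single hand-built feature can matter: the XOR problem. The sketch below brute-forces a small grid of linear classifiers, first on the raw inputs and then with one extra product feature. The grid, the feature functions, and all names are hypothetical choices for this illustration.

```python
from itertools import product

# XOR: the label is 1 when exactly one input is 1.
POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def raw_features(x):
    return (x[0], x[1])


def with_product_feature(x):
    # Hand-crafted extra feature: the product of the two inputs.
    return (x[0], x[1], x[0] * x[1])


def accuracy(weights, bias, featurize):
    hits = 0
    for x, y in POINTS:
        score = sum(w * f for w, f in zip(weights, featurize(x))) + bias
        hits += (1 if score > 0 else 0) == y
    return hits / len(POINTS)


def best_linear_accuracy(featurize, n_features):
    # Try every weight/bias combination on a coarse grid from -2.0 to 2.0.
    grid = [v / 2 for v in range(-4, 5)]
    return max(
        accuracy(ws, b, featurize)
        for *ws, b in product(grid, repeat=n_features + 1)
    )


raw_best = best_linear_accuracy(raw_features, 2)
engineered_best = best_linear_accuracy(with_product_feature, 3)
print(raw_best, engineered_best)
```

No linear rule on the raw inputs gets all four points right, because XOR is not linearly separable. Adding the product feature makes the same linear model solve it. Here the right feature is obvious; in credit scoring it is not.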
The point is: finding good features is time-intensive, cumbersome, and plain boring. There isn’t an algorithm for it, there aren’t good, reliable rules for every domain, and it is impossible to know a priori whether you have enough features unless you bring a lot of expert knowledge about the subject.
Machines are better than us at this (again)
What if a machine could learn features by itself? And then learn better features based on the previous ones? And so on? That would be neat, and is the subject of Deep Learning.
When you’re dealing with a neural network, you have the concept of a hidden layer. Now suppose you have a lot of them, each built on the previous layer’s results. The first hidden layer processes the input data directly; the second layer learns how to respond to changes in the first one; and that process goes on until the final layer, which, for example, classifies the input into a group. Each of these layers is “learning” features based on the previous one’s results.
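The stacking idea can be sketched as a forward pass through a few fully connected layers. Everything here is an assumption for illustration: the layer sizes, the tanh activation, and the random weights. Nothing is trained, so the “features” are meaningless; the sketch only shows how each layer consumes the previous layer’s output.

```python
import math
import random


def layer(inputs, weights, biases):
    # One fully connected layer: every output unit is a tanh of a weighted
    # sum of the previous layer's outputs.
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    # Untrained placeholder parameters.
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    # Keep every intermediate result so each layer's "features" can be inspected.
    activations = [x]
    for weights, biases in layers:
        activations.append(layer(activations[-1], weights, biases))
    return activations


rng = random.Random(0)
sizes = [4, 3, 3, 2]  # input, two hidden layers, output
layers = [random_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

activations = forward([0.5, -0.2, 0.1, 0.9], layers)
for depth, values in enumerate(activations):
    print(depth, len(values))
```

Training would adjust the weights so that the intermediate lists in `activations` become useful representations of the input, which is the part Deep Learning is actually about.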
A Google TechTalk by Andrew Ng shows a lot of the current (as of 2013) research on the subject, with a better explanation of the rationale behind it and a sample of the benchmarks already dominated by Deep Learning.
(I find his talks really interesting; he’s a good lecturer.)
So, where to go next?
I can safely say that I know almost nothing about Deep Learning. I’ve set aside some references to read [1, 2], a series of video lectures [3] from G. Hinton, Yoshua Bengio, Andrew Ng, Yann LeCun, and others, and I’m playing with Caffe [4]. There’s also a tutorial by Andrew Ng [5] on Unsupervised Feature Learning and Deep Learning.
I also populated a directory on Mendeley with lots of papers on specific subjects (DBNs, stacked autoencoders, and several other things). I expect to spend a long time reading them.
Well, that’s it. I’ll post again when I have some more stuff organized in my head.
1. Representation Learning: A Review and New Perspectives. Yoshua Bengio, Aaron Courville, and Pascal Vincent. arXiv.
2. Deep Learning. Yoshua Bengio, Ian Goodfellow, and Aaron Courville. An MIT Press book in preparation.
3. Graduate Summer School: Deep Learning, Feature Learning.
4. Caffe: deep learning framework by the BVLC.
5. A tutorial on Unsupervised Feature Learning and Deep Learning.