How Computers Learn

Peter Norvig on big data:

It turns out that a lot of problems, if you feed in a little bit of data to a very simple algorithm, it performs terribly, but if you get out into the millions and billions of examples, it starts to perform well. Having a tricky new idea isn’t as good as going out and gathering more data.

This is from Norvig’s recent talk, “How Computers Learn.”

Norvig is an excellent explainer, using many analogies, examples, and simple visuals. He starts with a real-world scenario, comparing two different ways to write computer programs:

  • By programmers, with logic
  • By computers, with probabilities

Let’s go back to the real world, and think of a typical day for “Anna”:

  • She speaks into her phone, says some words, and the phone recognizes her speech, and does the right thing. …
  • Let’s say she goes into a supermarket, and the stock that’s on the shelf, some computer system has learned what everybody in that neighborhood wants to buy, and they stock the right stuff.
  • She pays with a credit card, and the credit card company has figured out, is this transaction a valid one, or is it fraudulent, should I accept it or deny it?
  • She posts a picture to a social networking site, and the faces of her friends are all recognized and tagged.
  • And maybe she wants to plan a trip, and she asks for the most efficient route, and a website directs her exactly there.

So she does all these things, but every single one of them has this property: that the programmer wouldn’t necessarily know how to do it. I don’t know how to recognize speech, or recognize somebody’s face – I can’t write down the steps to do that. So I’m stuck.

So what do we do when the programmer can’t come up with a solution? The answer is, the computer could come up with a solution. We feed it some examples. And then the output of that program, rather than doing something, is it produces a new program. So it learns to write that program that we as programmers aren’t smart enough to do. …

How well does it do? The answer is, it depends on how well you train it.
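
To make Norvig’s contrast concrete, here is a minimal sketch (my illustration, not Norvig’s, assuming Python with scikit-learn installed). It puts a hand-written fraud rule, the “by programmers, with logic” approach, next to a model that learns an equivalent check from labeled examples; the features and labels are invented for the toy.

```python
# Two ways to write the fraud check from Anna's day.
# All features and labels here are made up for illustration.
from sklearn.tree import DecisionTreeClassifier

# --- Way 1: by programmers, with logic ---
def is_fraud_by_rules(amount, foreign, hour):
    # Hand-written rule: large foreign purchases late at night look bad.
    return amount > 500 and foreign and (hour < 6 or hour > 22)

# --- Way 2: by computers, learned from examples ---
# Each example: [amount, foreign (0/1), hour of day], label 1 = fraud.
X = [[900, 1, 3], [40, 0, 14], [700, 1, 23], [15, 0, 9],
     [1200, 1, 2], [60, 0, 20], [30, 1, 12], [850, 0, 1]]
y = [1, 0, 1, 0, 1, 0, 0, 1]

model = DecisionTreeClassifier().fit(X, y)

# The "output is a new program": the fitted model classifies
# transactions nobody wrote explicit rules for.
print(is_fraud_by_rules(650, True, 2))   # rule-based answer
print(model.predict([[650, 1, 2]]))      # learned answer
```

The point is the shape of the workflow, not the particular model: once we switch to the second way, the training examples, not the programmer, determine what the final program does.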

 

[Figure: norvig_data_threshold, showing performance improving as the amount of training data grows]

Google makes its living from trying to find problems that have this kind of shape, and then trying to find the billions of examples that go with it, and then doing very well.
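
That “kind of shape” is easy to reproduce at toy scale. Here is a hedged sketch, assuming Python with scikit-learn and NumPy; the digits dataset and the Naive Bayes model are stand-ins of my choosing, not anything from the talk. It prints held-out accuracy as the training set grows:

```python
# Learning curve: accuracy of a very simple model as it sees more
# examples, using scikit-learn's small built-in digits dataset as a
# stand-in for web-scale data.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Cross-validated accuracy at 8 training-set sizes, from 5% to 100%
# of the available data.
sizes, _, test_scores = learning_curve(
    GaussianNB(), X, y,
    train_sizes=np.linspace(0.05, 1.0, 8), cv=5)

for n, acc in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} examples -> {acc:.1%} held-out accuracy")
```

At this scale the curve flattens quickly; Norvig’s point is about what keeps happening when the x-axis runs out into the millions and billions of examples.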

Norvig continues by exploring several concrete examples, with intuitive explanations instead of technical jargon:

  • Making sense of word senses
  • Translating between languages
  • Recognizing things in pictures
  • Writing captions for pictures
  • Learning to play video games by self-exploration

Physicist Richard Feynman talked about “the difference between knowing the name of something and knowing something.” Norvig truly, deeply knows machine learning – and still communicates with humility, accessibility, and a sense of wonder. I’m grateful for the opportunity to learn from him.

Further reading, for well-rounded, contrasting perspectives:
