Data Means The End Of Theory??? Puh-lease...

Once in a blue moon I pick up a Wired magazine... and then I'm usually reminded why I so rarely read it...

This month, they came out with a terrible article about The End Of Theory, all about how the deluge of digital information will make the scientific method obsolete.


It started out OK, with info about how Google was doing well not by making theories about trends, but instead by collecting massive amounts of data on behavior. True enough, and no complaints there... but Wired then extends this in bizarre directions, saying that this means an end to all scientific analysis: there are no more grand theories, it's all just statistics now.

Further proof in the article? Quantum physics stopped trying to find out "why," and instead just focused on gathering tons of info on the "what." The author also uses the "shotgun" approach to DNA sequencing as the prime example of the end of theory. The whole thing was tons of useless "data" that didn't even come close to supporting his "theory" that data trumps theory.

How ironic... but what else would you expect from somebody with only a passing knowledge of science?

Firstly, every single example in the entire article is a false analogy. Either massive amounts of data were supporting existing scientific theory, or they were giving guidance where theories needed massive amounts of recent data. Is there a theory for what trends will be popular with 13-year-olds? Sure, there are tons... but they are all based on the ability to quickly acquire recent data. The article claims that knowing the raw numbers is all you need... it's a decent first approximation, but anybody with a passing knowledge of marketing knows that spotting trends is about two things: how many, and who? Google knows how many, but if you can determine whether the "who" includes trendsetters, then the trend can turn into an epidemic.

The hard sciences -- like physics and biology -- also have well-established models that serve us well, and that are pretty accurate even when based on old data. These models are great estimates in the absence of new data. That's the whole frigging point! Sure, you can tell which plane will crash by building 1,000,000 virtual models and test-flying them all... you'll sure get tons of data! But it's a lot more cost-effective to analyze data, make models, and test just 1 model at a time.

You should never be tempted to put data ahead of theory... do so, and I guarantee you will be destroyed by those who understand both. For example, there was a 10-year-old article in the Atlantic Monthly warning about how the digital age will create an over-reliance on data instead of theory... one researcher demonstrated something like how, over the past 50 years, the ups and downs of the S&P 500 nearly exactly mirrored milk production in Burma.

According to Wired, just watch milk production in Burma, and you'll be a billionaire! Of course, that advice is total crap... because next year cotton output in Egypt might be a better example. Or perhaps the length of Warren Buffett's fingernails is even better. If you just rely on data, your "model" changes too quickly to be useful... unless it's based on a theory that depends on up-to-date data as an input, and can give guidance when you only have old or contradictory data.
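The Burma-milk trap is easy to demonstrate: dredge through enough unrelated data series and one of them will "mirror" your target by pure chance. A minimal sketch with entirely synthetic data (the series names and sizes are made up for illustration):

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
# A fake "S&P 500": 50 years of random annual moves.
target = [random.gauss(0, 1) for _ in range(50)]

# 10,000 equally fake candidates ("milk in Burma", "cotton in Egypt", ...).
best = max(
    pearson(target, [random.gauss(0, 1) for _ in range(50)])
    for _ in range(10000)
)
print(f"best spurious correlation: {best:.2f}")
```

By construction, none of the candidate series has anything to do with the target, yet the best of 10,000 will correlate substantially... which is exactly why "the data says so" is not a theory.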

Google makes the process faster, but ultimately changes nothing about the process itself. The discovery of useful knowledge still follows the scientific method:

  1. gather initial data
  2. make an initial hypothesis
  3. test the hypothesis with new data
  4. if the hypothesis is validated, it graduates to become a theory
  5. use the theory in lieu of up-to-date data, but
  6. continuously refine your theories with newer data, data in a different context, and data acquired with more accurate techniques
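The loop above can be sketched as a toy Python program (the "law" y = 2.5x, the noise level, and the acceptance threshold are all invented for illustration):

```python
import random

random.seed(1)

# Toy "physical law" we pretend not to know: y = 2.5 * x, observed with noise.
def observe(x):
    return 2.5 * x + random.gauss(0, 0.1)

# 1-2. Gather initial data and form a hypothesis (here: y = k*x, estimate k).
data = [(x, observe(x)) for x in range(1, 11)]
k = sum(y / x for x, y in data) / len(data)

# 3-4. Test the hypothesis against fresh data; accept it if predictions hold up.
new_data = [(x, observe(x)) for x in range(11, 21)]
error = max(abs(y - k * x) for x, y in new_data)
validated = error < 1.0

# 5. Use the theory in lieu of new measurements.
prediction = k * 100

# 6. Continuously refine with all data acquired so far.
all_data = data + new_data
k = sum(y / x for x, y in all_data) / len(all_data)
```

The point of step 5 is exactly the one made above about physics and biology: once validated, the theory gives good estimates even when no fresh data is available.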

Seems to be what everybody is still doing... and apparently the editors of Wired were asleep during Science 101.



I totally agree with you..
Data can tell you what marketing strategies work well, but in the absence of a model, you cannot improve upon those strategies in any reasonable amount of time.. At best, you would just be throwing darts blindly, hoping something sticks..
..and who says Google doesn't have "theories" about what works? That's ridiculous! I bet they have plenty of theories, and the data they collect is used to differentiate what works from what doesn't..


at the very least, they have a theory that says something like this:

In some cases, the model changes too quickly to be useful for long, and needs constantly up-to-date data, and it should be analyzed in a certain way to create "the model for now."

it won't work for everything, and won't beat somebody who finds a genuine pattern to the data, but it's close enough to make a billion dollars ;-)


..and even that's "at the very least", as you said.. and on the other extreme, the author claims an end to all theory! [rolling eyes]

Consider this claim made by the author..
"Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough."

The author uses this as the basis to support his claim of statistics over theory.. That's a misleading analogy.. Isn't PageRank also a "theory" that the statistics of incoming links are significant? ..or is the author trying to suggest that PageRank came about as a result of some statistics? PageRank was born out of an idea, a "theory"! The statistics are a "means" to implement the theory.. PageRank came first, the statistics came later.. The author is confusing correlation with causality, and insulting his readers while he's at it..
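And the PageRank "theory" is concrete enough to write down: links are endorsements, and a page's rank is rank flowing in from the pages that link to it. A minimal power-iteration sketch on a tiny made-up link graph, assuming the commonly cited damping factor of 0.85:

```python
# Minimal PageRank via power iteration on a tiny hypothetical link graph.
# Edges point from a page to the pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
d = 0.85  # damping factor: probability of following a link vs. jumping anywhere
rank = {p: 1 / len(pages) for p in pages}

for _ in range(50):  # iterate until the ranks stabilize
    new = {}
    for p in pages:
        # Rank flowing into p from every page q that links to it,
        # split evenly among q's outgoing links.
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new[p] = (1 - d) / len(pages) + d * incoming
    rank = new
```

Here page C, which everyone links to, ends up ranked above page D, which nobody links to. The statistics (counting links) merely implement the theory (links are endorsements); the theory came first.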
