
Saturday, March 09, 2013

Data is insufficient without a mind

Weather prediction has changed a lot over the centuries, and while most forecasts of the last dozen decades have been built on existing data, merely crunching that data does not yield a statistical calculation in perfect alignment with reality.
The National Centers for Environmental Prediction looked like a cross between a submarine command center and a Goldman Sachs trading floor. Twenty minutes outside Washington, it consisted mainly of sleek workstations manned by meteorologists working an armada of flat-screen monitors with maps of every conceivable type of weather data for every corner of the country. The center is part of the National Weather Service, which Ulysses S. Grant created under the War Department. Even now, it remains true to those roots. Many of its meteorologists have a background in the armed services, and virtually all speak with the precision of former officers.
They also seem to possess a high-frequency-trader’s skill for managing risk. Expert meteorologists are forced to arbitrage a torrent of information to make their predictions as accurate as possible. After receiving weather forecasts generated by supercomputers, they interpret and parse them by, among other things, comparing them with various conflicting models or what their colleagues are seeing in the field or what they already know about certain weather patterns — or, often, all of the above. From station to station, I watched as meteorologists sifted through numbers and called other forecasters to compare notes, while trading instant messages about matters like whether the chance of rain in Tucson should be 10 or 20 percent. As the information continued to flow in, I watched them draw on their maps with light pens, painstakingly adjusting the contours of temperature gradients produced by the computers — 15 miles westward over the Mississippi Delta or 30 miles northward into Lake Erie — in order to bring them one step closer to accuracy.
These meteorologists are dealing with a small fraction of the 2.5 quintillion bytes of information that, I.B.M. estimates, we generate each day. That’s the equivalent of the entire printed collection of the Library of Congress about three times per second. Google now accesses more than 20 billion Web pages a day; the processing speed of an iPad rivals that of last generation’s most powerful supercomputers. All that information ought to help us plan our lives and profitably predict the world’s course. In 2008, Chris Anderson, the editor of Wired magazine, wrote optimistically of the era of Big Data. So voluminous were our databases and so powerful were our computers, he claimed, that there was no longer much need for theory, or even the scientific method. At the time, it was hard to disagree.
But if prediction is the truest way to put our information to the test, we have not scored well. In November 2007, economists in the Survey of Professional Forecasters — examining some 45,000 economic-data series — foresaw less than a 1-in-500 chance of an economic meltdown as severe as the one that would begin one month later. Attempts to predict earthquakes have continued to envisage disasters that never happened and failed to prepare us for those, like the 2011 disaster in Japan, that did.
The one area in which our predictions are making extraordinary progress, however, is perhaps the most unlikely field. Jim Hoke, a director with 32 years’ experience at the National Weather Service, has heard all the jokes about weather forecasting, like Larry David’s jab on “Curb Your Enthusiasm” that weathermen merely forecast rain to keep everyone else off the golf course. And to be sure, these slick-haired and/or short-skirted local weather forecasters are sometimes wrong. A study of TV meteorologists in Kansas City found that when they said there was a 100 percent chance of rain, it failed to rain at all one-third of the time.
But watching the local news is not the best way to assess the growing accuracy of forecasting (more on this later). It’s better to take the long view. In 1972, the service’s high-temperature forecast missed by an average of six degrees when made three days in advance. Now it’s down to three degrees. More stunning, in 1940, the chance of an American being killed by lightning was about 1 in 400,000. Today it’s 1 in 11 million. This is partly because of changes in living patterns (more of our work is done indoors), but it’s also because better weather forecasts have helped us prepare.
Perhaps the most impressive gains have been in hurricane forecasting. Just 25 years ago, when the National Hurricane Center tried to predict where a hurricane would hit three days in advance of landfall, it missed by an average of 350 miles. If Hurricane Isaac, which made its unpredictable path through the Gulf of Mexico in August 2012, had occurred in the late 1980s, the center might have projected landfall anywhere from Houston to Tallahassee, canceling untold thousands of business deals, flights and picnics in between — and damaging its reputation when the hurricane zeroed in hundreds of miles away. Now the average miss is only about 100 miles.
Why are weather forecasters succeeding when other predictors fail? It’s because long ago they came to accept the imperfections in their knowledge. That helped them understand that even the most sophisticated computers, combing through seemingly limitless data, are painfully ill equipped to predict something as dynamic as weather all by themselves. So as fields like economics began relying more on Big Data, meteorologists recognized that data on its own isn’t enough...

For centuries, meteorologists relied on statistical tables based on historical averages — it rains about 45 percent of the time in London in March, for instance — to predict the weather. But these statistics are useless on a day-to-day level. Jan. 12, 1888, was a relatively warm day on the Great Plains until the temperature dropped almost 30 degrees in a matter of hours and a blinding snowstorm hit. More than a hundred children died of hypothermia on their way home from school that day. Knowing the average temperature for a January day in Topeka wouldn’t have helped much in a case like that...

What the English mathematician Lewis Fry Richardson needed, he thought, was more manpower. He envisioned a weather-forecasting center with some 64,000 meteorologists, all working simultaneously, to have the computational speed to make accurate weather forecasts in real time. His dream came to fruition (sort of) in 1950, when the first computer weather forecast was attempted by the mathematician John von Neumann and a team of scientists at the Institute for Advanced Study in Princeton, N.J. They used a machine that could make about 5,000 calculations a second, which was quite possibly as fast as 64,000 men. Alas, 5,000 calculations a second was no match for the weather. As it turned out, their forecast wasn’t much better than a random guess.
Our views about predictability are inherently flawed. Take something that is often seen as the epitome of randomness, like a coin toss. While it may at first appear that there’s no way to tell whether a coin is going to come up heads or tails, a group of mathematicians at Stanford is able to predict the outcome virtually 100 percent of the time, provided that they use a special machine to flip it. The machine does not cheat — it flips the coin the exact same way (the same height, with the same strength and torque) over and over again — and the coin is fair. Under those conditions, there is no randomness at all.
The reason that we view coin flips as unpredictable is because when we toss them, we’re never able to reproduce the exact same motion. A similar phenomenon applies to the weather. In the late 1950s, the renowned M.I.T. mathematician Edward Lorenz was toiling away in his original profession as a meteorologist. Then, in the tradition of Alexander Fleming and penicillin or the New York Knicks and Jeremy Lin, he made a major discovery purely by accident. At the time, Lorenz and his team were trying to advance the use of computer models in weather prediction. They were getting somewhere, or so they thought, until the computer started spitting out contradictory results. Lorenz and his colleagues began with what they believed were exactly the same data and ran what they thought was exactly the same code; still, the program somehow forecast clear skies over Colorado in one run and a thunderstorm in the next.
After spending weeks double-checking their hardware and trying to debug their code, Lorenz and his team discovered that their data weren’t exactly the same. The numbers had been rounded off in the third decimal place. Instead of having the barometric pressure in one corner of their grid read 29.5168, for example, it might instead read 29.517. This couldn’t make that much of a difference, could it? Actually, Lorenz realized, it could, and he devoted the rest of his career to studying strange behaviors like these by developing a branch of mathematics called chaos theory, the most basic tenet of which is described in the title of his breakthrough 1972 paper, “Predictability: Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?” In other words, a small change in initial conditions can produce a large and unexpected divergence in outcomes.
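To make that sensitivity concrete, here is a minimal Python sketch of my own, not anything from Lorenz’s actual 1950s model: it integrates the classic Lorenz ’63 equations with a crude Euler step and starts two runs whose initial values differ only in the fourth decimal place, analogous to 29.5168 versus the rounded 29.517. The step size, starting values and number of steps are arbitrary choices for the demonstration.

```python
# Hypothetical illustration of sensitivity to initial conditions, using the
# classic Lorenz '63 system as a stand-in for the early forecast models.
# Two runs start with x-values that differ only in the fourth decimal place.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one crude Euler step."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

run_a = (1.0001, 1.0, 1.0)   # analogous to entering 29.5168
run_b = (1.0000, 1.0, 1.0)   # analogous to the rounded 29.517

for step in range(1, 3001):
    run_a = lorenz_step(*run_a)
    run_b = lorenz_step(*run_b)
    if step % 1000 == 0:
        print(step, round(run_a[0], 3), round(run_b[0], 3))

# After a few thousand steps the two trajectories no longer resemble each
# other, even though their inputs differed by one part in ten thousand.
```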
Chaos theory does not imply that the behavior of the system is literally random. It just means that certain types of systems are very hard to predict. If you know the exact conditions of a coin as it leaves someone’s hand, you can — with the right laboratory equipment — predict, almost perfectly, which side it will land on. And yet the slightest disturbance to that motion can change a coin toss from being almost wholly predictable to almost wholly unpredictable.
The problem with weather is that our knowledge of its initial conditions is highly imperfect, both in theory and practice. A meteorologist at the National Oceanic and Atmospheric Administration told me that it wasn’t unheard-of for a careless forecaster to send in a 50-degree reading as 500 degrees. The more fundamental issue, though, is that we can observe our surroundings with only a certain degree of precision. No thermometer is perfect, and it isn’t physically possible to stick one into every molecule in the atmosphere.
Weather also has two additional properties that make forecasting even more difficult. First, weather is nonlinear, meaning that it abides by exponential rather than by arithmetic relationships. Second, it’s dynamic — its behavior at one point in time influences its behavior in the future. Imagine that we’re supposed to be taking the sum of 5 and 5, but we keyed in the second number as 6 by mistake. That will give us an answer of 11 instead of 10. We’ll be wrong, but not by much; addition, as a linear operation, is pretty forgiving. Exponential operations, however, extract a lot more punishment when there are inaccuracies in our data. If instead of raising 5 to the fifth power — which gives 3,125 — we raise it to the sixth, we wind up with an answer of 15,625. This problem quickly compounds when the process is dynamic, because outputs at one stage of the process become our inputs in the next.
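A small Python sketch of that arithmetic, with an arbitrary squaring step standing in for a dynamic, nonlinear process (the numbers and the update rule are purely illustrative, not anything drawn from the Weather Service’s models):

```python
# Illustrative only: the same one-unit input error is forgiving under addition,
# punishing under exponentiation, and compounds when outputs become inputs.

# Linear: keying in 6 instead of 5 is off by exactly 1.
print(5 + 5, 5 + 6)        # 10 vs. 11

# Exponential: the same slip is off by 12,500.
print(5 ** 5, 5 ** 6)      # 3125 vs. 15625

# Dynamic and nonlinear: square the value repeatedly, feeding each output
# back in as the next input, starting from two nearly identical values.
a, b = 1.05, 1.06
for step in range(1, 6):
    a, b = a * a, b * b
    print(step, round(a, 3), round(b, 3))
# The small initial gap widens at every step instead of staying small.
```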
Uncertainty is part of the business, and the National Weather Service has recently made a point of being upfront about that uncertainty.
