An algorithm without a cause

La Défense was built in a time of great optimism, its ring of tall towers and fountain-filled squares offering a concrete and glass symbol of 1970s hope. In the centre of the main esplanade, tourists mill and take pictures while sharp-suited business people move with earnest determination, traversing between buildings like sprites in a vast computer simulation. At one end of the terrace, steps rise towards La Grande Arche – an imposing structure carefully positioned on a straight line connecting to the Arc de Triomphe and the Louvre, all waypoints on a majestic, historic timeline. “Look at us,” they all say. “Look at what we can do.”

I meet Augustin Huret in a nondescript meeting room, situated on the 17th floor of a tower overlooking the esplanade. It’s a fitting place, I think to myself as I sip a small, yet eye-wateringly strong coffee, for a discussion about the future. Augustin is a slim, quiet Frenchman, wearing a shirt and tweed jacket. At first glance, seated at the table with his Filofax, he looks like any other business consultant, but the sharpness in his eyes quickly dispels any such impression. “So, remind me. Why are we here?” he asks, head cocked to one side. I explain the background to the interview, which he considers just long enough to make me feel uncomfortable. Then he smiles, his hawk-like eyes softening. “Well, we had better get on with it,” he says.

Augustin Huret had no ordinary childhood. Whereas many of us used to spend long car journeys playing I-Spy or spotting 2CVs, the Huret family was different. “My father used to ask us about his algorithms,” says Augustin. “He wanted to be sure that they made sense to normal people.” At the time, the older M. Huret was tackling the challenge of using mathematics to look for unusual phenomena in large, incomplete sets of data — needles in the information haystack. Augustin was just twelve years old but he listened, and he learned.

The human brain has a remarkable capacity to spot such anomalies. If you were faced with a painting of a vast, troubled ocean, you might eventually spot a small difference in colour. Look more closely and you would see a small ship being tossed on the waves. You’d probably start to wonder whether it was in trouble, or what it was doing out there in the first place – what was the pilot thinking? You might think about a movie with a similar plot, and wonder whether there was any relationship between the picture and the film. But even as your mind wandered, you would still simply accept what you were seeing, completely ignoring how difficult it was for your brain to process the huge quantities of data involved.

M. Huret Senior was working in this area of data analysis, known as pattern recognition. Much of what we think of as ‘intelligence’ actually relates to our ability to spot similarities – from how we use our eyes when we’re looking for our keys in a cluttered drawer, to that leaden feeling of déjà-vu we get when we find ourselves in a familiar argument. While scientists are still getting to the bottom of how the human brain makes such leaps of understanding, methods to replicate our simpler innate capabilities do exist. For example, a computer program can compare a given pattern (say, a fingerprint) with every other known one. If the pattern can’t be found, it can be ‘learned’ and added to a database, so it can be spotted next time around.
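The compare-then-learn loop just described can be sketched in a few lines of Python. This is purely illustrative – the pattern representation, distance function and tolerance are all assumptions for the sake of the sketch, not anything from real fingerprint systems:

```python
def match_or_learn(pattern, database, distance, tolerance=0.5):
    """Compare a pattern against every known one; if none is close
    enough, 'learn' the new pattern by adding it to the database."""
    for known in database:
        if distance(pattern, known) <= tolerance:
            return known                  # recognised
    database.append(pattern)              # never seen before: learn it
    return None

# Toy patterns as coordinate lists, compared by Euclidean distance.
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

db = []
match_or_learn([1.0, 2.0], db, euclid)    # unknown, so it is learned
match_or_learn([1.1, 2.1], db, euclid)    # close enough: recognised
```

Each unfamiliar pattern enlarges the database, so the program recognises more on every pass – a crude but genuine form of learning.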

Augustin’s father was looking into more general methods of pattern recognition, which didn’t need to know anything about the data they were working with. Throw in a set of data points – the bigger the better – and algorithms would build a picture, from which they could then spot anomalies like the ship on the waves. While the principle was sound, the main hurdle was the sheer scale of the problem — simply put, the algorithms required more data than 1970s computers could process. Time was not on Huret Sr’s side: not only would it take several years of processing to deliver any useful results, but (with a familiar echo) funding bodies were becoming increasingly impatient. Even as Augustin’s father was testing his theories, interest in the field in which he was working — Artificial Intelligence, or AI — was waning. In the end, the money ran out altogether. M. Huret Senior’s theoretical models remained just that – theoretical.

A decade passed. Having completed his degree with flying colours, the young Augustin decided, as he embarked on a Masters degree, to take his father’s challenge onto his own shoulders. The passage of time was on Augustin’s side: the year before he started his MSc, at a cost of over 10 million dollars, the illustrious Polytechnique university had taken delivery of a Cray II supercomputer. Scientific computers are measured in the number of floating point instructions they can process per second (dubiously known as FLOPs) and the Cray was capable of running almost two billion of these. That’s two billion adds, subtracts or basic multiplications – per second! At last, the younger Huret was able to test his father’s theories on computers that were powerful enough for the job. He could even improve the theories – and, for the first time ever, use them in anger on real data sets.

He was faced with the perfect problem – systems failures in the production of nuclear energy. The French government had become a significant investor in nuclear power, as a direct response to the oil crisis in 1973. In consequence, by the mid-Eighties over 50 nuclear power stations had been built across the country, and more were being constructed. With the events of the 1979 nuclear accident at Three Mile Island still front of mind, the occasional lapse of this enormous, yet fledgling infrastructure was met with high levels of concern. Each failure was accompanied by massive quantities of data on all kinds of things, from equipment measurements to core temperatures. Without algorithms, however, the data was useless. At that point, nobody had been able to analyse the data sufficiently well to identify any root cause of the different failures.

The Huret algorithms worked exhaustively: that is, they looked at every single possible combination of data items and compared them to every other in the data set, to see if they were linked. To make things even more complicated, the algorithms then randomised the data set and went through the same, exhaustive process, to be sure that the anomalies weren’t just there by chance. Given the size of the nuclear power plant data set, dealing with this processing task took inordinate amounts of processor time — 7.5 million seconds, or 15 million billion floating point operations! While this took the Cray a continuous period of three months to process, the results were worth it. The resulting rules highlighted what was causing the majority of systems failures across the network of nuclear power stations in France. Before Augustin knew it, he was presenting to heads of government and at international symposia. His father’s original principles, honed, modified and programmed, had more than proved their worth.
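The exhaust-then-randomise idea can be illustrated with a toy Python sketch. This is emphatically not the Huret algorithm itself — the field names, the simple correlation measure and the thresholds are all assumptions for illustration — but it shows the two phases: test every combination of fields, then re-test against shuffled data to rule out links that are there purely by chance:

```python
import itertools
import random

def association(xs, ys):
    """Stand-in linkage measure: absolute Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def exhaustive_anomaly_search(records, threshold=0.9, trials=100):
    """Phase 1: exhaustively score every pair of fields.
    Phase 2: re-score against randomised (shuffled) data, keeping
    only links that vastly outscore their shuffled versions."""
    fields = list(records[0].keys())
    linked = []
    for a, b in itertools.combinations(fields, 2):
        xs = [r[a] for r in records]
        ys = [r[b] for r in records]
        score = association(xs, ys)
        shuffled_scores = []
        for _ in range(trials):
            perm = xs[:]
            random.shuffle(perm)          # destroy any real link
            shuffled_scores.append(association(perm, ys))
        if score > threshold and score > max(shuffled_scores):
            linked.append((a, b, score))
    return linked
```

Even on this toy scale the cost of exhaustiveness is visible: every pair of fields is scored once, then re-scored for every random trial — which is why the real thing kept a Cray busy for three months.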

While it would be many more years before algorithms such as the Hurets’ were viable for more general use, their ‘exhaustive’ approach was not the only way of making sense of such imposing data sets. What if, say, you started with a rough guess as to where the answer might lie, and seeded the algorithm with this knowledge? This is far from a new idea. Indeed, it was first mooted over 250 years ago by an English cleric, the non-conformist minister Thomas Bayes. Bayes’ theorem1 works on the basis of thinking of an initial value and then improving upon it, rather than trying to calculate an absolute value from scratch. It was largely ignored in his lifetime, and has been denigrated by scientists repeatedly ever since — largely because the amount of data available to statisticians made it seem unnecessary, or even unseemly, to add guesswork. “The twentieth century was predominantly frequentist,” remarks2 Bradley Efron, professor of Statistics at Stanford University. While Bayesians link a probability to the likelihood of a hypothesis being true, Frequentists link it to the number of times a phenomenon is seen in tests. Explains3 Maarten H. P. Ambaum at the University of Reading in the UK, “The frequentist says that there is a single truth and our measurement samples noisy instances of this truth. The more data we collect, the better we can pinpoint the truth.”
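Bayesian updating — start with a guess, then improve it as evidence arrives — can be shown in a minimal Python sketch. The coin-flipping scenario and all of its numbers are illustrative assumptions, not anything from the sources quoted:

```python
def bayes_update(prior, likelihood, observation):
    """Update a discrete prior P(H) given one observation,
    using Bayes' rule: posterior is proportional to
    likelihood times prior, renormalised to sum to 1."""
    unnorm = {h: likelihood(observation, h) * p for h, p in prior.items()}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Illustrative question: is a coin fair (p=0.5) or biased (p=0.9)?
prior = {0.5: 0.5, 0.9: 0.5}             # start with an open mind
def coin_likelihood(obs, p_heads):        # obs: True means heads
    return p_heads if obs else 1 - p_heads

belief = prior
for flip in [True, True, True, True]:     # four heads in a row
    belief = bayes_update(belief, coin_likelihood, flip)
# With each head, belief shifts further towards the biased coin.
```

Note that no single observation settles the matter: the answer is always a probability, refined as data accumulates — exactly the “acceptance that the world is probabilistic” that Lynch describes below.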

Frequentist models are better at working with large data sets if you want accuracy, it has been suggested4, but as data volumes start to get out of hand (and particularly given the coastline paradox, in this world of incomplete knowledge), Bayesian models become increasingly interesting. “Bayesian statistics is the statistics of the real world, not of its Platonic ideal,” continues Ambaum. Says5 security data scientist Russell Cameron Thomas: “Because of Big Data and the associated problems people are trying to solve now, pragmatics matter more than philosophical correctness.” Not least to companies such as Google and Autonomy, who built their respective businesses on the back of the reverend’s theorems. “Bayesian inference is an acceptance that the world is probabilistic,” explains Mike Lynch, founder of Autonomy. “We know this in our daily lives. If you drive a car round a bend, you don’t actually know if there is going to be a brick wall around the corner and you are going to die, but you take a probabilistic estimate that there isn’t.”

Google built its search-driven advertising empire entirely on Bayesian principles, a model which continues to serve the company well — roughly 89% of the firm’s $66 billion revenues in 2014 came from its advertising products. The blessing and curse of Google’s probabilistic approach was illustrated by the momentous success, then the crashing failure, of its Flu Trends program, which used6 the company’s search term database to track the spread of influenza, connecting the fact that people were looking for information about the flu, and their locations, with the reasonable assumption that an incidence of the illness had triggered the search. All was going well until the tool missed an outbreak by a substantial margin. Explained David Lazer and Ryan Kennedy in Wired magazine, “There were bound to be searches that were strongly correlated by pure chance, and these terms were unlikely to be driven by actual flu cases or predictive of future trends.” It’s a scenario reminiscent of Umberto Eco’s novel ‘Foucault’s Pendulum’, in which a firm of self-professed vanity publishers stumbles upon a slip of paper containing what looks, to all intents and purposes, like a coded message. The hapless figures are plunged into a nether world of Templars and Rosicrucians, Mafiosi and voodoo, causing a chain of events which ends (look away now if you don’t want to know the plot) in their ultimate demise. The twist, it turns out, was that the scrap of paper was no more than a medieval laundry list.

Given that we have both exhaustive and probabilistic approaches at our disposal, what matters ultimately is results. Top down is as important as bottom up, and just as science is now accepting7 the importance of both frequentist and Bayesian models, so can the rest of us. The creators of algorithms are exploring how to merge8 the ideas of Bayesian and Frequentist logic — for example in the form of the bootstrap, a computer-intensive inference machine.
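The bootstrap mentioned above can be sketched in a few lines of Python — a simple percentile-interval version, with the example data purely illustrative. Rather than assuming a theoretical distribution, it resamples the observed data over and over and lets the computer do the inferential heavy lifting:

```python
import random

def bootstrap_ci(sample, stat, n_resamples=2000, alpha=0.05):
    """Estimate a (1 - alpha) confidence interval for `stat` by
    repeatedly resampling the observed data with replacement
    and reading off percentiles of the resulting estimates."""
    n = len(sample)
    estimates = sorted(
        stat([random.choice(sample) for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Illustrative data: 200 noisy measurements around a true value of 10.
random.seed(0)
data = [random.gauss(10, 2) for _ in range(200)]
mean = lambda xs: sum(xs) / len(xs)
low, high = bootstrap_ci(data, mean)
# `low` and `high` bracket the sample mean without assuming any
# particular distribution for the underlying data.
```

The frequentist flavour comes from the repeated sampling; the Bayesian flavour from treating the observed data as the best available stand-in for the world — pragmatics over philosophical correctness, in Thomas’s phrase.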

The end result is that we will know a lot more about the world around us and, indeed, ourselves. For example, the Huret algorithms were used to exhaustively analyse the data sets of an ophthalmic (glasses) retailer — store locations, transaction histories, you name it. The outputs indicated a strong and direct correlation between the amount of shelf space allocated to children, and the quantity of spectacles sold to adults. The very human interpretation of this finding was, simply, that kids like to try on glasses, and the more time they spend doing so, the more likely their parents are to buy. Equally, did you know that in retail, brightly coloured cars make better second-hand deals (they have more careful owners)? Or that vegetarians9 are less likely to miss their flights?

With the Law of Diminishing Thresholds holding fast, correlations such as these will become everyday occurrences, as will insights into more deeply valuable areas such as healthcare — monitoring10, and potentially postponing, the onset of Parkinson’s Disease, for example. Some seemingly unsolvable tasks, such as weather prediction, have not changed in scale since they were first documented — might we finally be able to predict the weather? And it’s not just unsolved problems that can be tackled — we are also starting to find insights in places nobody even thought to look, as demonstrated by our ability11 to compare every single painting in history to every other.

As we have seen, in a couple of years, we will be able to analyse such complex data sets on desktop computers or even mobile phones – or more likely, using low-cost processing from the cloud. Given the coastline paradox, some problem spaces will always be just outside the reach of either exhaustive or probabilistic approaches — we will never be able to analyse the universe molecule by molecule, however much we improve12 upon Heisenberg’s ability to measure. But this might not matter, if computers reach the point where they can start to define what is important for themselves. Mike Lynch believes we may already be at such a fundamental point, with profound consequences. “We’re just crossing the threshold,” he says. “Algorithms have reached a point where they have enabled something they couldn’t do before — which is the ability of machines to understand meaning. We are on the precipice of an explosive change which is going to completely change all of our institutions, our values, our views of who we are.”

Will this necessarily be a bad thing? It is difficult to say, but we can quote Kranzberg’s first law13 of technology (and the best possible illustration of the weakness in Google’s “don’t be evil” mantra) — “Technology is neither good nor bad; nor is it neutral.” Above all, automated analysis would remove human bias from the equation. We are used to being able to take a number of information sources and derive our own interpretations from them — whether or not they are correct. We see this as much in the interpretation of unemployment and immigration figures as in consumer decision making, industry analysis, pseudoscience and science itself — just ask scientist and author Ben Goldacre, a maestro at unpicking14 poorly planned inferences. Agrees15 David Hand, Emeritus Professor of Mathematics at Imperial College, London, “Obscure selection biases may conjure up illusory phantoms in the data. The uncomfortable truth is that most unusual (and hence interesting) structures in data arise because of flaws in the data itself.”

But what if such data was already, automatically and unequivocally interpreted on our behalf? What if immigration could be proven without doubt to be a very good, or a very bad thing? Would we be prepared to accept a computer output which told us, in no uncertain terms, that the best option out of all the choices was to go to war, or indeed not to? Such moral dilemmas are yet to come but meanwhile, the race is on: researchers and scientists, governments and corporations, media companies and lobby groups, fraudsters and terrorists are working out how to identify needles hidden in the information haystack. Consulting firm McKinsey estimates16 that Western European economies could save more than €100 billion by using analytics to support government decision-making; elsewhere, big data is big business.

But right now, to succeed requires thinking outside the algorithm and using the most important asset we have: ourselves.

  15. Focus/September 2014