Bayes and Business Intelligence

Bayes Law, Take 1

You’ve had what feels like a low-grade flu for almost a month and can’t seem to shake it. You’ve tried any number of off-the-shelf remedies with no relief. Finally, you go to your family doctor who, after an examination, sends you to a hospital lab for a test to rule out a serious disease. You get the unfortunate positive result on Friday, but your doctor is not available again until Monday. You wonder whether you’ll be able to survive the three-day weekend.

You spend all day Friday and Saturday researching the possibilities on WebMD. You discover that the incidence of the disease indicated by the test is 1/1000, or 0.1%. You also get information that 99% of those with the disease return a positive test, and that 5% of those who don’t have the disease also test positive. All those percentages sound ominous and you start mentally preparing for the worst.

That is, until your statistics friend comes over to watch a Sunday football game. She looks at the figures you’ve recorded, pulls out her trusty iPhone, and starts crunching numbers. Seconds later, she tells you there’s just a 2% chance you actually have the disease, even with the positive test result. Incredulous but relieved, you give her a big hug and kick back to enjoy the game, all the while extolling the statistical gods.

Bayes Law, Take 2

I was recently browsing in the Finance and Investment aisle of a local Borders when I came across a book guaranteeing that readers would become millionaires if they carefully followed the author’s investment advice. I first checked the book’s copyright page to confirm that it was published before the Fall 2008 meltdown. I then wondered why someone with such a bullet-proof, get-rich formula would want to share it. Wouldn’t he just repeatedly execute the strategy himself? I finally set out to “test” the sensibility of the author’s guarantee: to determine the chances of my becoming a millionaire, given that I faithfully follow the author’s prescriptions.

Of course, I had no direct data on the success of the book, but I could give personal, subjective estimates to a number of probabilities that might help me make an educated guess. I first did a quick Internet search to discover there are no more than 10 million millionaires in the U.S. population of 300 million, about 3.3%. I then “guesstimated” the chances of a person following the author’s advice to be quite small, say 2%, and the chances of a person following the author’s advice given that he or she is a millionaire to be a bit larger, 5%. With those subjective inputs, I calculated the probability that I’ll get rich given that I follow the author’s advice to be 8.25%, hardly a lock. I think I’d insist on a money-back guarantee!

So Just What is Bayes Law?

Bayes theorem was first published in 1763, among the papers of Thomas Bayes, a then recently deceased British Presbyterian minister. Bayes, a Fellow of the Royal Society of London, was undistinguished as a mathematician in his lifetime. Yet his “An Essay Towards Solving a Problem in the Doctrine of Chances” came to be held in the highest regard in the field of probability theory.

Bayes Law provides a means to measure confidence in an event conditional on the occurrence of evidence or information. If E is the event of interest, then P(E) is the probability of E, with 0 <= P(E) <= 1. Similarly, if D represents the conditional data or information, then P(D) is the probability of that information, and is also between 0 and 1 inclusive. P(D|E) is the probability of the information given the event, while P(~E) and P(D|~E) are the probabilities of no event and of the information given no event, respectively. For applications of Bayes Law, the interest is in P(E|D), the probability of event E given or conditioned on information D. In the disease example, P(E|D) represents the probability of disease given a positive test result. For the investing illustration, P(E|D) is the probability of getting rich given strict adherence to the author’s investing protocol.

Bayes Law states that:

P(E|D) = P(E) * P(D|E)/P(D), or, substituting for P(D),

= P(E)*P(D|E)/(P(D|E)*P(E) + P(D|~E)*P(~E))

In Bayes jargon, P(E|D) is known as the Posterior Probability (Probability of the Event Given the Information), P(E) is called the Prior Probability (Previously Known Probability of the Event), and P(D|E)/(P(D|E)*P(E) + P(D|~E)*P(~E)), which is simply P(D|E)/P(D), is the Likelihood Ratio (Learning Factor).

So, the Posterior Probability = Prior Probability*Likelihood Ratio, or, in cumbersome words:

The relative belief in a hypothesis given some data is the support the data provides for the hypothesis times the prior belief in the hypothesis divided by the support times the prior for all hypotheses.
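The decomposition into a prior and a learning factor can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation; the function names `learning_factor` and `posterior` are my own and do not come from any library.

```python
def learning_factor(prior, p_data_given_event, p_data_given_no_event):
    """Return P(D|E) / P(D), the factor called the Likelihood Ratio above."""
    # P(D) by total probability: P(D|E)*P(E) + P(D|~E)*P(~E)
    p_data = p_data_given_event * prior + p_data_given_no_event * (1.0 - prior)
    if p_data == 0.0:
        raise ValueError("P(D) is zero, so P(E|D) is undefined")
    return p_data_given_event / p_data


def posterior(prior, p_data_given_event, p_data_given_no_event):
    """Return P(E|D) as the prior times the learning factor."""
    for p in (prior, p_data_given_event, p_data_given_no_event):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return prior * learning_factor(prior, p_data_given_event, p_data_given_no_event)
```

Calling `posterior(0.001, 0.99, 0.05)` with the disease figures from the opening story returns a value of roughly 0.019.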

For the disease example:

P(E|D) = .99*.001/(.99*.001 + .05*.999) = .019

The chance of disease given the positive test is thus about 2%.
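One way to see why the answer is so small is to count people instead of multiplying probabilities. The sketch below does that for an imaginary cohort; the cohort size of 100,000 and the function name `natural_frequencies` are my own assumptions, chosen only to make the counts easy to read.

```python
def natural_frequencies(population, incidence, sensitivity, false_positive_rate):
    """Split a cohort into true and false positives and return the counts."""
    sick = population * incidence
    healthy = population - sick
    true_positives = sick * sensitivity              # sick people who test positive
    false_positives = healthy * false_positive_rate  # healthy people who test positive
    return {
        "sick": sick,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "p_disease_given_positive": true_positives / (true_positives + false_positives),
    }


# Hypothetical cohort of 100,000 people, using the rates quoted in the story.
counts = natural_frequencies(100_000, 0.001, 0.99, 0.05)
for name, value in counts.items():
    print(f"{name}: {value:,.3f}")
```

Of 100,000 people, about 100 have the disease and 99 of them test positive, while 4,995 of the 99,900 healthy people also test positive. Only 99 of the 5,094 positives are truly sick, which is the same 2% or so.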

For the investment example:

P(E|D) = .033*.05/.02 = .0825 or 8.25%
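Because the investment example estimates P(D) directly rather than building it from P(D|~E), the first form of Bayes Law applies. The sketch below is a rough illustration under that reading; the name `posterior_from_marginal` and the coherence check are my own additions, not part of the original calculation.

```python
def posterior_from_marginal(prior, p_data_given_event, p_data):
    """Return P(E|D) = P(E) * P(D|E) / P(D) when P(D) is estimated directly."""
    if not 0.0 < p_data <= 1.0:
        raise ValueError("P(D) must be greater than 0 and at most 1")
    joint = prior * p_data_given_event  # P(E and D)
    if joint > p_data:
        # P(E and D) can never exceed P(D), so the guesses contradict each other.
        raise ValueError("inputs are incoherent: P(E)*P(D|E) exceeds P(D)")
    return joint / p_data


rounded_prior = posterior_from_marginal(0.033, 0.05, 0.02)       # prior rounded to 3.3%
unrounded_prior = posterior_from_marginal(10 / 300, 0.05, 0.02)  # prior as 10M / 300M
print(f"{rounded_prior:.4f} {unrounded_prior:.4f}")
```

The 8.25% above uses the prior rounded to .033; carrying the unrounded 10/300 through gives about 8.33%. Either way the conclusion is the same. The coherence check matters with subjective inputs, since three independent guesses can easily imply a posterior greater than 1.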

Bayesians vs. Frequentists

The paradigm of framing statistical problems in terms of posterior probabilities, prior probabilities, and likelihood ratios is known as the Bayesian method. Bayesian methodologies were dominant in the 19th century statistical world, but the 20th century saw the ascendance of the classical or frequentist paradigm. The frequentist flavor is what’s most often taught in traditional statistics curricula today, but there’s a healthy – and growing – appetite for Bayesian methods as well.

The Bayesian and frequentist approaches differ fundamentally in their characterizations of probability. Frequentists see probability as the objectively measured, relative frequency of an outcome over a large number of trials. Bayesians, in contrast, view probability as a more subjective concept tied to an individual’s judgment of the likelihood of an outcome. For frequentists, the uncertainty surrounding probability is in the events; for Bayesians, the uncertainty has to do with interpretation by observers.

The difference in the definition of probability created a significant divide between frequentists and Bayesians. Frequentists reject the reliance on subjective prior probabilities that drive Bayesian analyses, arguing that such probabilities cannot be measured reliably. Frequentists would cite my subjective priors in the investment book example as a case in point, noting the results of my analysis would change markedly with different “guesstimates.” Historically, they’ve also cited the complexity of calculations as an inhibitor to Bayesian adoption.
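The frequentist objection is easy to illustrate with the investment example. The sketch below recomputes the posterior over a small grid of alternative guesses; the alternative values are hypothetical and picked only for illustration, and the helper name is my own.

```python
def posterior_from_marginal(prior, p_data_given_event, p_data):
    """Return P(E|D) = P(E) * P(D|E) / P(D)."""
    return prior * p_data_given_event / p_data


PRIOR_RICH = 10 / 300  # 10 million millionaires in a population of 300 million

# Hypothetical alternative guesses around the original 2% and 5%.
P_FOLLOW = (0.01, 0.02, 0.05)             # P(D): a person follows the advice
P_FOLLOW_GIVEN_RICH = (0.02, 0.05, 0.10)  # P(D|E): a millionaire follows the advice

sensitivity = {
    (p_follow, p_follow_rich): posterior_from_marginal(PRIOR_RICH, p_follow_rich, p_follow)
    for p_follow in P_FOLLOW
    for p_follow_rich in P_FOLLOW_GIVEN_RICH
}

for (p_follow, p_follow_rich), post in sensitivity.items():
    print(f"P(D)={p_follow:.2f}  P(D|E)={p_follow_rich:.2f}  P(E|D)={post:.3f}")
```

Across this grid the posterior ranges from roughly 1% to 33%, which is the kind of swing the frequentists have in mind.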

Bayesians, on the other hand, argue that their approach is more pertinent and natural for real-world problems. They contend that it brings more information to bear, handles more complex problems, and therefore supports stronger conclusions than frequentist methods. Bayesians cite the power of inversion: solving for P(E|D) by estimating priors and the more tractable likelihood P(D|E). They also tout their kinship with decision-making, arguing their methods transparently incorporate judgments for optimal decisions. What frequentists cite as a weakness of the Bayesian method is promoted as a strength by Bayesians.

Bradley Efron’s Statistical Wisdom

A few weeks ago, I had the privilege of attending a lecture by Bradley Efron, a distinguished Professor of Statistics at Stanford University, and 2005 National Medal of Science laureate. Efron gave a fascinating talk entitled “Learning from the Experience of Others,” that touched on identical twins, major league baseball batting averages, kidney function scores, new butterfly species, word counts in Shakespeare’s works, prostate cancer, and estimating false discovery rates in large scale statistical studies. The foundation for his learning from experience theme was none other than Bayes Law. An early career doubter, Efron is now a firm believer in a hybrid empirical Bayes approach, noting that the holy grail of statisticians is to use the experience of others without the baggage of subjective prior distributions: “Enjoy the Bayesian omelet without breaking the Bayesian eggs.”

Efron’s 2004 presidential address to the American Statistical Association, “Bayesians, Frequentists, and Scientists,” offers much wisdom for business intelligence (see End Note 1). He argues that the debate between Bayesians and frequentists ultimately reflects different attitudes about data analysis. Frequentists are conservative and cautious statistical purists, aiming for universal conclusions, while Bayesians look to exploit all available evidence to make the quickest progress. Efron illustrates the contention by noting the Bayesian interest of drug manufacturer Pfizer in using any and all available opinions and information to determine how well its new drugs will work, while the frequentist FDA looks only for objective, experimental proof of drug efficacy. In a sense, frequentists behave like exacting scientists and Bayesians more like decision-making businessmen. In contrast to the inside view of frequentists, which focuses on statistical orthodoxy, the outside view of Bayesians fixes on answers to important strategic questions, wherever they may be found.

Efron observes that the demands of modern data analysis to address more complicated problems while accommodating ever-larger data sources, and the continuing emergence of computer-intensive analytical methods, are helping to create the empirical Bayes compromise between frequentists and Bayesians. Analysts can now use the abundance of data to both learn and test. The subjectivity that has historically been a source of irritation between the camps can now be replaced by more objective, “uninformative” priors – determined from preliminary looks at subsets of large data.
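To make the idea of a data-determined prior concrete, here is a minimal sketch of one simple textbook flavor of empirical Bayes: a Beta prior fitted to the observed rates of several comparable units, then used to shrink each unit's raw rate. It is not a reconstruction of anything Efron presented. The counts are hypothetical, the function names are my own, and the moment-matching fit is deliberately crude because it ignores the sampling noise in each raw rate.

```python
from statistics import mean, pvariance


def fit_beta_prior(rates):
    """Crude method-of-moments fit of a Beta(a, b) prior to observed rates."""
    m = mean(rates)
    v = pvariance(rates)
    if v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("rates do not support a Beta fit")
    strength = m * (1.0 - m) / v - 1.0  # a + b, the prior's pseudo-count
    return m * strength, (1.0 - m) * strength


def shrink(successes, trials, a, b):
    """Posterior mean of a rate under a Beta(a, b) prior and binomial data."""
    return (a + successes) / (a + b + trials)


# HYPOTHETICAL (successes, trials) counts for five comparable units.
observed = [(3, 10), (45, 150), (12, 60), (30, 80), (8, 20)]
a, b = fit_beta_prior([s / n for s, n in observed])
for s, n in observed:
    print(f"raw={s / n:.3f}  shrunk={shrink(s, n, a, b):.3f}  (n={n})")
```

Each shrunk estimate is a weighted average of the unit's own rate and the prior mean, so units with few trials are pulled strongly toward the group while units with many trials barely move. The prior comes from the other units' experience, not from anyone's subjective guess.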

A Business Intelligence Guy Opines on Bayes

From a business intelligence perspective, a major advantage to the Bayesian approach is the learning that accumulates over time. Indeed from a BI perspective, the Bayesian model is best viewed as a systematic learning approach for intelligence and decision-making for business. The posterior probabilities for analysis one become the prior probabilities for analysis two; the posteriors for analysis two become the priors for analysis three, etc. Thus the Bayesian orientation promotes learning and better decision-making, all the while refining priors to be more reliable and less “subjective” over time.
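The posterior-becomes-prior loop can be sketched with the disease example. The code below assumes, purely for illustration, that the test is repeated and that repeat results are independent given the patient's true disease status; real repeat tests often are not. The function names are my own.

```python
def update(prior, p_data_given_event, p_data_given_no_event):
    """One application of Bayes Law: return P(E|D) for a single piece of evidence."""
    p_data = p_data_given_event * prior + p_data_given_no_event * (1.0 - prior)
    return prior * p_data_given_event / p_data


def sequential_posteriors(prior, evidence):
    """Feed each posterior back in as the next prior and return the full trail."""
    trail = []
    belief = prior
    for p_data_given_event, p_data_given_no_event in evidence:
        belief = update(belief, p_data_given_event, p_data_given_no_event)
        trail.append(belief)
    return trail


# Three positive results from the same test, ASSUMED conditionally independent.
trail = sequential_posteriors(0.001, [(0.99, 0.05)] * 3)
print([round(p, 3) for p in trail])
```

Under that independence assumption, the belief in disease climbs from roughly 2% after one positive result to about 28% after two and about 89% after three. Each analysis starts where the last one ended, and the original prior matters less with every round.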

The case for the compromise of empirical Bayes is further strengthened by the challenges of large data.  Traditional frequentist methods are increasingly ill-suited for large data volumes, while subsets powered by bootstrapping techniques provide opportunities to more “objectively” refine the contentious priors of Bayesian methods. Finally, the continued growth of hardware capacity and advances in computer-intensive statistics conspire to make the once intractable Bayesian calculations more accessible in current business environments.

I recently wrote an article on the contrasting styles of predictive modeling in statistics and machine learning, arguing that the ideological differences between the camps can ultimately be positive for business intelligence, providing a wealth of disparate methods and techniques for addressing business analytics. In much the same way, the emergence of empirical Bayes is a boon for business intelligence, as it helps to unite the frequentist and Bayesian camps and advance both. The combination of rigorous inference from the frequentists and the sequential learning paradigm of the Bayesians is consistent with, and encouraging for, the practice of business intelligence. And collaboration between the two will be essential to solve the complex, large data problems that increasingly challenge the business intelligence world.

End Note:

  1. Bradley Efron. “Bayesians, Frequentists, and Scientists.” Journal of the American Statistical Association. March 2005, Vol. 100, No. 469.

References:

Thomas Bayes. “An Essay Towards Solving a Problem in the Doctrine of Chances.” Royal Society of London. 1763. http://www.stat.ucla.edu/history/essay.pdf

Bayesian Inference. Wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Bayesian_analysis

Michael Kaplan and Ellen Kaplan. Chances Are…Adventures in Probability. Penguin Books. 2006.

Douglas W. Hubbard. How to Measure Anything: Finding the Value of Intangibles in Business. John Wiley & Sons, Inc. 2007.

Jeff Grynaviski. “Applied Bayesian Statistics.” Political Science 435. University of Chicago. Spring, 2003. http://home.uchicago.edu/~grynav/bayes/abs03.htm

Anthony O’Hagan and Bryan R. Luce. “A Primer on Bayesian Statistics in Health Economics and Outcomes Research.” Centre for Bayesian Statistics in Health Economics. 2003. http://www.shef.ac.uk/content/1/c6/02/55/92/primer.pdf
