Bayes' theorem has become a rather mainstream tool and its efficacy seems to be generally accepted; its epistemological basis, however, may require careful reinterpretation. This article is concerned with epistemological issues rather than with technical depth in mathematics and probability.
Further questions to consider in this paper are:
1. What sorts of problems have Bayesian techniques typically been applied to?
2. How effective have they been in solving these problems?
3. What sorts of problems are not appropriate for Bayesian methodologies?
4. How should critical rationalism be interpreted in the light of Bayesian methodology?
5. How should Bayesian methodology be interpreted in the light of critical rationalism?
The essence of the Bayesian approach is the use of mathematical rules to indicate how one should change one's existing beliefs in the light of new evidence. By way of introduction, and to highlight some core conceptual and logical issues, it may be instructive to look at a very simple example of the application of Bayesian inference, modified from an article in The Economist (30/09/2000).
Imagine an infant who observes its first sunset; this, we presume, modifies its prior (a priori, but not a priori valid) expectation of, say, a pattern of even light. We should also recognize that infants have complex neurological systems and are obviously not born as blank slates; even their DNA is a collection of expectations. Our model Bayesian infant conjectures that the sun may or may not rise again and assigns equal probabilities to the sun rising or failing to rise. Observations are represented by putting a metaphorical white marble or a black marble into a bag. Each time the sun rises, another white marble is put into the bag. Thus, day by day, the probability that a marble plucked randomly from the bag will be white rises, and this is interpreted, through Bayes' theorem, as an increasing degree of belief that the sun will continue to rise, until the probability becomes so great that it is interpreted as near certainty.
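The marble-bag updating above can be sketched in a few lines. This is a minimal illustration only: it assumes the infant starts with one white and one black marble (the 50/50 prior) and adds a white marble per observed sunrise, which makes the degree of belief follow what is known as Laplace's rule of succession.

```python
from fractions import Fraction

# One white and one black marble encode the infant's 50/50 prior.
# Each observed sunrise adds a white marble, so after n sunrises the
# bag holds (n + 1) white marbles and 1 black marble.
def belief_after(n_sunrises):
    white = 1 + n_sunrises   # prior white marble plus one per sunrise
    black = 1                # the single black marble from the prior
    return Fraction(white, white + black)

for n in (0, 1, 10, 365):
    print(n, belief_after(n))   # 1/2, 2/3, 11/12, 366/367
```

After a year of sunrises the degree of belief is 366/367: high, but, as the critical rationalist would note, no closer to a logical guarantee than the initial guess.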
“Bayesian inference” therefore refers to the use of the prior probability of a hypothesis to determine the probability of that hypothesis given some observed evidence. That is, the probability that a particular hypothesis is true given some observed evidence (the so-called posterior probability of the hypothesis) comes from a combination of the inherent plausibility of the hypothesis (its prior probability) and the compatibility of the observed evidence with the hypothesis (the likelihood of the evidence, in a technical sense). Bayesian inference is opposed to frequentist inference, which makes use only of the likelihood of the evidence (in the technical sense), discounting the prior probability of the hypothesis.
What has happened in the sample infant calculation? Prior probabilities (guesses) have been evaluated and updated. Is the connection between truth and probable or possible truth any firmer than that between truth and guesses? Is Bayes rule an example of rational decision making or a refined game of chance?
We know that humans and all other multicellular animals, plants, algae, protozoa, bacteria and fungi are perpetual risk takers. Most of the time life forms do not display conscious intent, but their metabolic and, if present, neurological pathways have a priori expectations of the world. When herds of antelope congregate and surge forward on seasonal migrations they are engaging in gambles of uncertain outcome, with potential success for a proportion of them if the historical record is a guide. Much, if not most, of our conscious activity as humans is conjectural: there must be some mechanism for producing guesses, and the guesses must often succeed, otherwise we too as a species would be extinct.
From a critical rationalist perspective the theory that the sun will rise, even if generated by a Bayesian form of inference, is still a conjecture. It is always possible that some cataclysm will cause the sun not to rise. The theory that the sun will rise may be a better theory than the theory that it will not, but has the Bayesian calculation supported this conclusion at a logical level?
An inductive argument would have the form
The sun has been seen to come up each time I observed it
Therefore the sun always comes up
In this form the statement is but a guess, no better or worse than other types of conjecture. As Mark Notturno says, Popper calls a guess a ‘guess’, but inductivists prefer to call a guess ‘the conclusion of an inductive argument’. Universal propositions do not follow logically from a limited set of existential ones. Treating them as if they did commits the fallacy of affirming the consequent: the conclusion of an inductive argument may be false even if all its premises are true. Induction is a perception of relations, and at best it represents the guessing and probing of a mind aiming at understanding.
Is there any more support for the inductive supposition if it is stated in terms of probabilities?
The sun has been seen to come up each time I observed it
Therefore the sun will probably always come up
Note that samples of past occurrences of the sun rising are observations, not inductions. As David Deutsch pointed out in “The Beginning of Infinity”, if one were to sample calendars throughout the twentieth century, each of the years would have started with 19, and one would have predicted that the following years would also start with 19. How informative is this? What of the 21st century?
To reiterate: amplifying basic statements into universal statements is equivalent to making a conjecture or a guess, no different in principle from being inspired by a dream, a song or a serendipitous flash of inspiration. Hume’s problem does not apply to guesses.
The logical issue is that guesswork and conjecture sometimes resemble something called induction, but there is really no such thing (Popper is not talking here about mathematical “induction”, for which the proof is deductive anyway). You cannot deduce (or induce) from basic observation statements factual information which goes beyond the factual information contained in those statements themselves. There is, in principle, nothing wrong with guessing, but resistance should be offered to placing a logical scaffold around generalising from individual or sampled observations. Popper would say that all perception is modified anticipation. The observer is not a blank slate.
The observer expects to see something.
Something is not observed, or is observed differently.
Therefore the initial expectation was wrong.
Thus the brain and visual system reformulates a new expectation (or hypothesis).
The whole debate around the word induction is often at cross purposes in the literature because we use it in different ways. Did Popper define induction? In “Realism and the Aim of Science” (1983) p 147 he stated: “By induction I mean an argument which, given some empirical (singular or particular) premises, leads to a universal conclusion, a universal theory, either with logical certainty, or with probability (in the sense that this term is used in the calculus of probability).” This is what he rejected.
I must add that I am not stating that inference from Bayes’ theorem is inductive, although it is frequently held to be so in the literature, hence my effort above to clarify some issues around induction. Bayesianism is, according to Gillies (Philosophy of Science in the Twentieth Century, 1993), indeed a theory of justification, not of discovery. Despite the English title of Logik der Forschung (1935) being The Logic of Scientific Discovery (1959), Popper’s view is that there is no such thing as the logic of scientific discovery but only a logic of testing. Discovery and justification are separate issues. Bayesians seek to justify scientific generalisations or predictions by showing that, although they are not certain, they can nevertheless be shown to be probable, given the evidence used to support them.
At issue is whether Popper’s conjecture and refutation can be accommodated in Bayesian methodology, i.e. can there be a logical basis for the application of Bayes’ theorem to eliminating false conjectures? Failing to be falsified cannot, from a critical rationalist perspective, produce a positive logical reason for accepting conjectures, although it is valid to compare conjectures on factors other than falsifiability, e.g. depth, comprehensiveness, simplicity, unifying power, consistency with background knowledge, relevance to multiple problem situations, and being part of a rigorous research programme, without drifting down the slippery slope of induction.
The standard Bayes theorem is:
posterior probability of the hypothesis given the evidence = (likelihood of the evidence given the hypothesis) × (prior probability of the hypothesis) / probability of the evidence

P(H|E) = P(E|H) P(H) / P(E)

P(H|E) = the probability of a hypothesis H given an item of evidence E (the “posterior probability”)
P(E|H) = the probability of the evidence given the hypothesis
P(H) = the probability of the hypothesis before considering the item of evidence (the “prior probability”)
P(E) = the probability of the evidence arising (without direct reference to the hypothesis)
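A worked numerical instance may make the formula concrete. The numbers below are invented purely for illustration: a condition with 1% prevalence, a test with a 90% true positive rate and a 5% false positive rate; P(E) is obtained by the law of total probability.

```python
# Illustrative (made-up) numbers for P(H|E) = P(E|H) P(H) / P(E).
p_h = 0.01          # P(H): prior probability of the hypothesis
p_e_h = 0.90        # P(E|H): probability of the evidence given the hypothesis
p_e_not_h = 0.05    # P(E|not H): probability of the evidence otherwise

# P(E) by total probability: the evidence can arise with or without H.
p_e = p_e_h * p_h + p_e_not_h * (1 - p_h)

posterior = p_e_h * p_h / p_e
print(round(posterior, 3))   # roughly 0.154
```

Note how strongly the low prior drags the posterior down: despite a “90% accurate” test, a positive result yields only about a 15% probability of the hypothesis. This sensitivity to the prior is precisely where the epistemological debate begins.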
Bayes’ theorem in itself is derivable from a simple application of probability theory; it is a non-controversial mathematical theorem. Bayesianism is more controversial: it makes questionable claims about rational belief, evidence and confirmation. Bayesianism, as David Deutsch says, assumes that minds work by assigning probabilities to their ideas and modifying those probabilities in the light of experience as a way of choosing how to act, i.e. values that are rewarded by experience are reinforced and come to dominate behaviour, while those that are punished by experience are extinguished. It may be appropriate to use Bayes’ theorem in computer programming, but the epistemological extension reeks of behaviourism and leads the modelling of artificial general intelligence astray.
Before looking closer at the basis of Bayesian inference, a look at how it has been used may give greater context.
Sharon Bertsch McGrayne, in The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, explores the theorem and illustrates situations where it has had success. A list of her examples and others follows.
Alan Turing and others used a modified Bayes’ rule to crack the Enigma code and to detect U-boats during the Second World War. McGrayne states that Bayes’ rule was good for hedging bets when there were prior guesses and decisions to be made with a minimum of time or cost.
Bayesian techniques have been used to determine the most probable causes of diseases like lung cancer when prior data is fed in.
They have been used to determine the likelihood of a nuclear accident.
They have been used to settle the disputed authorship of The Federalist Papers, a minor puzzle of American history, from vast amounts of written archives.
They have been used to predict results of elections from polling data. An example of this is the spectacular success of Nate Silver in using Bayesian techniques to predict the results of the November 2012 American presidential election. Silver’s approach involved taking public poll data from several sources, weighting it by factors such as recency and sampling, making statistical adjustments, mixing in extra data, and using the result to simulate 100,000 hypothetical elections in order to produce each candidate’s probability of victory.
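The simulation step described above can be sketched as a toy Monte Carlo. This is not Silver’s actual model: the “states”, poll leads, electoral votes and assumed polling error below are all invented, and the weighting and adjustment stages are omitted; only the simulate-many-elections idea is shown.

```python
import random

# Toy electoral Monte Carlo: perturb each state's poll lead by an assumed
# polling error, tally electoral votes, repeat, and report a win frequency.
states = {          # state: (poll lead for candidate A in points, electoral votes)
    "A-leaning": (4.0, 120),
    "toss-up":   (0.5, 100),
    "B-leaning": (-3.0, 118),
}
POLL_ERROR = 3.0    # assumed standard deviation of polling error, in points

def simulate_once(rng):
    ev_for_a = 0
    for lead, ev in states.values():
        if lead + rng.gauss(0.0, POLL_ERROR) > 0:
            ev_for_a += ev
    return ev_for_a > 169   # more than half of the 338 electoral votes here

rng = random.Random(42)
wins = sum(simulate_once(rng) for _ in range(100_000))
print(f"P(A wins) = {wins / 100_000:.2f}")
```

The output is a probability of victory, not a prediction: exactly the kind of “posterior” whose epistemological status this article questions.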
They were used to help narrow the search for the H-bomb lost in the ocean off the coast of Spain.
They have been used in wildlife population studies.
They have been used in military tracking, weapons systems, anti-terrorism.
They are used in spam filters, handwriting recognition and the analysis of neural networks. Bayesian spam filtering is a powerful technique for dealing with spam that can tailor itself to the email needs of individual users and gives false positive rates low enough to be generally acceptable to users.
It has been used in analysing mammogram statistics and breast cancer prediction.
The effectiveness and economy of such Bayesian techniques have been enthusiastically documented in numerous publications.
Bayesian inference does seem to be used to help solve problems that involve large amounts of data but are perhaps narrowly focused in terms of intended outcomes and prior assumptions, which lends weight to the “normal science” analogy.
Subjective Bayesian methodology seems to produce ampliative conclusions (posteriors). If these are seen as conjectures with no logical backing (induction), this would imply that the supposed reason Bayesian inference works is not the reason it works. Bayesian inference in that case is Humean irrationalism: it would rest on optimism rather than logic. Karl Popper, in Objective Knowledge: An Evolutionary Approach (1972), p. 141, stated, and I quote at length:
“Nowhere has the subjectivist epistemology a stronger hold than in the field of the calculus of probability. This calculus is a generalization of Boolean algebra (and thus of the logic of propositions). It is still widely interpreted in a subjective sense, as a calculus of ignorance, or of uncertain subjective knowledge; but this amounts to interpreting Boolean algebra, including the calculus of propositions, as a calculus of certain knowledge—of certain knowledge in the subjective sense. This is a consequence which few Bayesians (as the adherents of the subjective interpretation of the probability calculus now call themselves) will cherish.
This subjective interpretation of the probability calculus I have combated for thirty-three years. Fundamentally, it springs from the same epistemic philosophy which attributes to the statement ‘I know that snow is white’ a greater epistemic dignity than to the statement ‘snow is white’.
I do not see any reason why we should not attribute still greater epistemic dignity to the statement ‘In the light of all the evidence available to me I believe that it is rational to believe that snow is white.’ The same could be done, of course, with probability statements.”
The critical rationalist philosopher David Miller made the point that Bayesians are not supposed to be inductivists. A true Bayesian, he continued, would not be interested in whether a theory is supported or not, as that would be inductive. Bayesians do not have opinions or beliefs, only degrees of belief.
Stephen Senn states that to have a prior distribution for the probability of success is to have a prior distribution over the probability of any sequence of successes and failures. One simply notes which sequences to strike out as a result of any experience gained and renormalises the probabilities accordingly. No induction takes place; instead, probabilities are deduced coherently from the earlier probability statements regarding sequences. He notes that, contrary to what some might suppose, Bruno de Finetti, one of the developers of the subjective interpretation of probability, and Popper do not disagree regarding induction. They both think that induction in the naïve Bayesian sense is a fallacy; they disagree regarding the interpretation of probability. Even though the inference comes from evidence, it is still acceptable to Popper because the evidence could have shown the theory to be false. This, Senn continues, leaves applied Bayesian analysis as currently practised as one among a number of rough and ready tools that we have for looking at data. We need many such tools because we need mental conflict as much as mental coherence to spur us to creative thinking; when different systems give different answers, it is a sign that we need to dig deeper (Senn 2003).
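Senn’s strike-out-and-renormalise picture can be made concrete with a small sketch. The uniform prior over three-trial sequences below is an illustrative assumption; the point is that conditioning only deletes incompatible sequences and rescales what remains.

```python
from itertools import product
from fractions import Fraction

# A prior over all length-3 sequences of success (1) and failure (0),
# here taken to be uniform purely for illustration.
sequences = list(product([0, 1], repeat=3))
prior = {seq: Fraction(1, len(sequences)) for seq in sequences}

def condition_on(dist, position, outcome):
    # Strike out sequences incompatible with the observation, renormalise.
    kept = {s: p for s, p in dist.items() if s[position] == outcome}
    norm = sum(kept.values())
    return {s: p / norm for s, p in kept.items()}

posterior = condition_on(prior, 0, 1)   # the first trial was a success
p_next = sum(p for s, p in posterior.items() if s[1] == 1)
print(p_next)   # 1/2: under this prior, no induction has taken place
```

Under the uniform prior the observed success leaves the probability of the next success unchanged at 1/2; any apparent “learning” from a success would have to be built into a different prior, which is exactly Senn’s deductive point.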
Andrew Gelman, in “Philosophy and the Practice of Bayesian Statistics in the Social Sciences” (2010), states that he fears a philosophy of Bayesian statistics as subjective, inductive inference can encourage complacency about picking or averaging over existing models rather than trying to falsify them and go further. Likelihood and Bayesian inference are powerful, and with great power comes great responsibility: complex models can and should be checked and falsified. Again he felt that there may be a way to accommodate such a tool within the hypothetico-deductive world of Karl Popper. “The main point we disagree with many Bayesians is that we do not think that Bayesian methods are generally useful for giving a posterior probability that a model is TRUE, or the PROBABILITY for preferring model A over model B, or whatever. Bayesian inference is good for deductive inference within a model, but for evaluating a model, we prefer to compare it to data without requiring that a new model be there to beat it”. He continues: “Yes, we ‘learn’, in a short-term sense, from Bayesian inference – updating the prior to get the posterior – but this is more along the lines of what a Kuhnian might call ‘normal science’. The real learning comes in the model checking stage, when we can reject the model and move forward. The inference is a necessary stage in this process, however, as it creates the strong conclusions that are falsifiable”. I wonder how Bayesian inference would have progressed Ptolemaic astronomy and Newtonian cosmology. Would Bayesianism have produced the insights of Copernicus or Einstein?
Ivor Grattan-Guinness, in “Corroborations and Criticisms: Forays with the Philosophy of Karl Popper” (2010), makes the point that in Popper’s view science is a risk-taking enterprise, where theories are formed and tested as severely as possible. Science and technology have a very close relationship, and yet technology requires reliability in the performance of its products. Thus science is risk and technology is safety – a paradox whose resolution requires careful attention to be paid to corroborations. Reliability theory is a wide-ranging subject that takes due note of unreliability, i.e. failures in technology, which involve falsifications of theories. Examples are the rapid collapse of the World Trade Centre buildings and the sinking of the Titanic.
Grattan-Guinness also uses a novel descriptor, desimplification, more or less as a synonym for Popper’s “ad hoc” hypotheses. He sees desimplification as a way of describing aspects of Kuhn’s theory of normal science, i.e. coping with small or not so small effects, extending the detailing, applying theories to special cases, and checking on the size and effect of omitted factors. While such normal science is routine to a fault, it can involve the creation of difficult new theories and experimental techniques. I suspect that the application of Bayes’ theorem could be construed within the parameters of normal science. When moves to desimplify are patently unsuccessful, grossly contrived, or impossible, a more radical kind of theory is required.
The Bayesian approach, as Gilboa et al. point out, begins with priors and models a limited form of learning, namely Bayesian updating; it does not illuminate the formation of the priors. They argue that rationality requires more than behaviour that is consistent with a Bayesian prior. The first tenet of Bayesianism is that whenever a fact is not known one should have probabilistic beliefs about it. In the light of new information, the Bayesian prior should be updated to a posterior according to Bayes’ theorem. When facing a decision problem one should incorporate all the information one has gathered into one’s Bayesian beliefs. Bayesian inference has been useful within a limited range of expectations – very useful – but science is about explanations and problem solving. Its goal is true explanations, even if there is no logical basis to prove that we have achieved the goal of discovering unambiguous truth. Critical rationalism has offered falsifiability as the demarcation between science and pseudo-science. If Bayesian methodologies have value, it is that of providing economical resources in the conjectural process; no technique can provide positive proof. Remember that even Immanuel Kant thought that Newton’s views on time, space and causality were incontrovertible; a high probability that Newtonianism was correct had little bearing on its truth. How many puzzles were worked on within that paradigm?
Elliott Sober, in his essay “Bayesianism: Its Scope and Limits” (2002), says that Bayesianism cannot be the whole story about scientific inference. Likelihoods don’t tell you what to believe, how to act or which hypotheses are probably true; they merely tell you how to compare the degree to which evidence supports the various hypotheses you wish to consider. Of course, what logical backing such “support” offers is the issue.
Darrell Rowbottom emphasizes that our apparent ability to reach a considered consensus on evaluations of P(e, hb) and P(e, b), as against P(h, b), might nevertheless fail to be of any deep epistemological significance. It is perfectly possible in testing to reject a more verisimilar option in favour of a less verisimilar one: we might be moving further from the truth rather than closer to it. The weeding out of false theories does not guarantee that we are moving towards true ones.
Karl Popper reminds us in “Truth and Approximation to Truth” (1960):
Twice two equals four: ’tis true
But too empty, and too trite.
What I look for is a clue
To some matters not so light.
Only if it is an answer to a problem – a difficult, a fertile problem, a problem of some depth – does a truth, or a conjecture about the truth, become relevant to science. Experience is indeed essential to science, but its role is different to that supposed by empiricism. It is not the source from which theories are derived. Its main use is to choose between theories that have already been guessed. That is what “learning from experience” is (Deutsch, D Beginning of Infinity).