The Duhem Problem: The Bayesian Turn

CHAPTER 3 of my thesis Aspects of the Duhem Problem.

The previous chapter concluded with an account of the attempt by Lakatos to retrieve the salient features of falsificationism while accounting for the fact that a research programme may proceed in the face of numerous difficulties, just provided that there is occasional success. His methodology exploits the ambiguity of refutation (the Duhem-Quine problem) to permit a programme to proceed despite seemingly adverse evidence. According to a strict or naive interpretation of falsificationism, adverse evidence should cause the offending theory to be ditched forthwith but of course the point of the Duhem-Quine problem is that we do not know which among the major theory and auxiliary assumptions is at fault. The Lakatos scheme also exploits what is claimed to be an asymmetry in the impact of confirmations and refutations.

The Bayesians offer an explanation and a justification for Lakatos; at the same time they offer a possible solution to the Duhem-Quine problem. The Bayesian enterprise did not set out specifically to solve these problems because Bayesianism offers a comprehensive theory of scientific reasoning. However these are the kind of problems that such a comprehensive theory would be required to solve.

Howson and Urbach, well-regarded and influential exponents of the Bayesian approach, provide an excellent all-round exposition and spirited polemics in defence of the Bayesian system in Scientific Reasoning: The Bayesian Approach (1989). In a nutshell, Bayesianism takes its point of departure from the fact that scientists tend to have degrees of belief in their theories and these degrees of belief obey the probability calculus. Or if their degrees of belief do not obey the calculus, then they should, in order to achieve rationality. According to Howson and Urbach probabilities should be ‘understood as subjective assessments of credibility, regulated by the requirements that they be overall consistent’ (ibid 39).

They begin with some comments on the history of probability theory, starting with the Classical Theory, pioneered by Laplace. The classical theory aimed to provide a foundation for gamblers in their calculations of odds in betting, and also for philosophers and scientists to establish grounds of belief in the validity of inductive inference. The seminal book by Laplace was Philosophical Essays on Probabilities (1820) and the leading modern exponents of the Classical Theory have been Keynes and Carnap.

Objectivity is an important feature of the probabilities in the classical theory. They arise from a mathematical relationship between propositions and evidence, hence they are not supposed to depend on any subjective element of appraisal or perception. Carnap’s quest for a principle of induction to establish the objective probability of scientific laws foundered on the fact that these laws had to be universal statements, applicable to an infinite domain. Thus no finite body of evidence could ever raise the probability of a law above zero (e divided by infinity is zero).

The Bayesian scheme does not depend on the estimation of objective probabilities in the first instance. The Bayesians start with the probabilities that are assigned to theories by scientists. There is a serious bone of contention among the Bayesians regarding the way that probabilities are assigned, whether they are a matter of subjective belief as argued by Howson and Urbach (‘belief’ Bayesians) or a matter of behaviour, specifically betting behaviour (‘betting’ Bayesians).

The purpose of the Bayesian system is to explain the characteristic features of scientific inference in terms of the probabilities of the various rival hypotheses under consideration, relative to the available evidence, in particular the most recent evidence.


Bayes’s Theorem can be written as follows:

P(h|e) = P(e|h) P(h) / P(e), where P(h) > 0 and P(e) > 0

In this situation we are interested in the credibility of the hypothesis h relative to empirical evidence e, that is, its posterior probability in the light of the evidence. Written in the above form the theorem states that the probability of the hypothesis conditional on the evidence (the posterior probability of the hypothesis) is equal to the probability of the evidence conditional on the hypothesis multiplied by the probability of the hypothesis in the absence of the evidence (the prior probability), all divided by the probability of the evidence.


e confirms or supports h when P(h|e) > P(h)
e disconfirms or undermines h when P(h|e) < P(h)
e is neutral with respect to h when P(h|e) = P(h)

The prior probability of h, designated as P(h), is the probability of h before e is considered. This will often be before e is available, but the system is still supposed to work when the evidence is in hand. In this case it has to be left out of account in evaluating the prior probability of the hypothesis. The posterior probability P(h|e) is the probability of h after e is admitted into consideration.

As Bayes’s Theorem shows, we can relate the posterior probability of a hypothesis to the terms P(h), P(e|h) and P(e). If we know the value of these three terms we can determine whether e confirms h, and more to the point, calculate P(h|e).
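As a minimal sketch of this arithmetic, the following Python fragment computes a posterior from the three terms and classifies the evidence as confirming, disconfirming or neutral. The function names and the sample numbers are hypothetical, chosen only to illustrate the calculation; they are not drawn from Howson and Urbach.

```python
def posterior(prior_h, likelihood_e_given_h, prob_e):
    """Bayes's Theorem: P(h|e) = P(e|h) * P(h) / P(e)."""
    if prior_h <= 0 or prob_e <= 0:
        raise ValueError("P(h) and P(e) must both be greater than zero")
    return likelihood_e_given_h * prior_h / prob_e


def verdict(prior_h, posterior_h, tol=1e-12):
    """Classify e as confirming, disconfirming or neutral with respect to h."""
    if posterior_h > prior_h + tol:
        return "confirms"
    if posterior_h < prior_h - tol:
        return "disconfirms"
    return "neutral"


# Hypothetical numbers, for illustration only.
p_h = 0.5          # prior probability P(h)
p_e_given_h = 0.8  # likelihood P(e|h)
p_e = 0.5          # probability of the evidence P(e)

p_h_given_e = posterior(p_h, p_e_given_h, p_e)
print(p_h_given_e, verdict(p_h, p_h_given_e))
```

With these made-up inputs the posterior exceeds the prior, so e counts as confirming h in the sense defined above.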

The capacity of the Bayesian scheme to provide a solution to the Duhem-Quine problem will be appraised in the light of two examples.


Dorling (1979) provides an important case study, bearing directly on the Duhem-Quine problem in a paper titled ‘Bayesian Personalism, the Methodology of Scientific Research Programmes, and Duhem’s Problem’. He is concerned with two issues which arise from the work of Lakatos and one of these is intimately related to the Duhem-Quine problem.

1(a) Can a theory survive despite empirical refutation? How can the arrow of modus tollens be diverted from the theory to some auxiliary hypothesis? This is essentially the Duhem-Quine problem and it raises the closely related question:

1(b) Can we decide on some rational and empirical grounds whether the arrow of modus tollens should point at a (possibly) refuted theory or at (possibly) refuted auxiliaries?

2. How are we to account for the different weights that are assigned to confirmations and refutations?

In the history of physics and astronomy, successful precise quantitative predictions seem often to have been regarded as great triumphs when apparently similar unsuccessful predictions were regarded not as major disasters but as minor discrepancies. (Dorling, 1979, 177).

The case history concerns a clash between the observed acceleration of the moon and the calculated acceleration based on a hard core of Newtonian theory (T) and an essential auxiliary hypothesis (H) that the effects of tidal friction are too small to influence lunar acceleration. The aim is to evaluate T and H in the light of new and unexpected evidence (E’) which was not consistent with them.

For the situation prior to the evidence E’ Dorling ascribed a probability of 0.9 to Newtonian theory (T) and 0.6 to the auxiliary hypothesis (H). He pointed out that the precise numbers do not matter all that much; we simply had one theory that was highly regarded, with subjective probability approaching 1 and another which was plausible but not nearly so strongly held.

The next step is to calculate the impact of the new evidence E’ on the subjective probabilities of T and H. This is done by calculating (by the Bayesian calculus) their posterior probabilities (after E’) for comparison with the prior probabilities (0.9 and 0.6). One might expect that the unfavourable evidence would lower both by a similar amount, or at least a similar proportion.

Dorling explained that some other probabilities have to be assigned or calculated to feed into the Bayesian formula. Eventually we find that the probability of T has hardly shifted (down by 0.0024 to 0.8976) while in striking contrast the probability of H has collapsed by 0.597 to 0.003. According to Dorling this accords with scientific perceptions at the time and it supports the claim by Lakatos that a vigorous programme can survive refutations provided that it provides opportunities for further work and has some success. Newtonian theory would have easily survived this particular refutation because on the arithmetic its subjective probability scarcely changed.
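The additional probabilities that Dorling fed into the formula are not reproduced in this chapter. As a rough sketch of the form of the calculation only, the following Python fragment assumes that T and H are independent before the evidence, that E’ is impossible if both are true, and that the remaining likelihoods take the illustrative values shown. Those likelihoods are assumptions, picked because they yield figures close to the ones quoted above; they are not taken from the text.

```python
def joint_update(p_t, p_h, likelihoods):
    """Return (P(T|E'), P(H|E')), assuming T and H are independent a priori.

    likelihoods maps (t_is_true, h_is_true) to P(E' | that combination).
    """
    weights = {}
    for (t_true, h_true), lik in likelihoods.items():
        prior = (p_t if t_true else 1 - p_t) * (p_h if h_true else 1 - p_h)
        weights[(t_true, h_true)] = prior * lik
    p_e = sum(weights.values())  # total probability of the evidence
    post_t = sum(w for (t, _), w in weights.items() if t) / p_e
    post_h = sum(w for (_, h), w in weights.items() if h) / p_e
    return post_t, post_h


# ASSUMED likelihoods (not given in the text), for illustration only.
assumed_likelihoods = {
    (True, True): 0.0,      # T and H together rule out E'
    (True, False): 0.05,    # T true, H false
    (False, True): 0.001,   # T false, H true
    (False, False): 0.05,   # both false
}

post_t, post_h = joint_update(0.9, 0.6, assumed_likelihoods)
print(round(post_t, 3), round(post_h, 3))
```

Under these assumptions the posterior for T stays close to 0.9 while the posterior for H falls to roughly 0.003, which is the pattern reported above.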

This case is doubly valuable for the evaluation of Lakatos because by a historical accident it provided an example of a confirmation as well as a refutation. For a time it was believed that the evidence E’ supported Newton but subsequent work revealed that there had been an error in the calculations. The point is that before the error emerged, the apparent confirmation of T and H had been treated as a great triumph for the Newtonian programme. And of course we can run the Bayesian calculus, as though E’ had confirmed T and H, to find what the impact of the apparent confirmation would have been on their posterior probabilities. Their probabilities in this case increased to 0.996 and 0.964 respectively and Dorling uses this result to provide support for the claim that there is a powerfully asymmetrical effect on T between the refutation and the confirmation. He regards the decrease in P from 0.9 to 0.8976 as negligible while the increase to 0.996 represents a fall in the probability of error from 1/10 to 4/1000.

Thus the evidence has more impact in support than it has in opposition, a result from Bayes that agrees with Lakatos.

This latest result strongly suggests that a theory ought to be able to withstand a long succession of refutations of this sort, punctuated only by an occasional confirmation, and its subjective probability still steadily increase on average (Dorling, 1979, 186).
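The asymmetry can be made concrete with the figures already quoted for T. The sketch below compares the change in the probability of error, 1 - P(T), after the refutation and after the apparent confirmation. The last step, which asks how many refutations one confirmation would offset, rests on a further assumption that is not in the text: that every later result multiplies the odds on T by the same factor as these two did.

```python
import math

# Figures quoted in the text for Newtonian theory T.
prior_t = 0.9
after_refutation = 0.8976
after_confirmation = 0.996


def odds(p):
    """Convert a probability into odds in favour."""
    return p / (1 - p)


# Probability of error, 1 - P(T), before and after each kind of evidence.
error_before = 1 - prior_t
error_growth = (1 - after_refutation) / error_before    # effect of the refutation
error_shrink = error_before / (1 - after_confirmation)  # effect of the confirmation

# Factor by which each result multiplies the odds on T.
refutation_factor = odds(after_refutation) / odds(prior_t)
confirmation_factor = odds(after_confirmation) / odds(prior_t)

# ASSUMPTION (not in the text): later refutations and confirmations
# each multiply the odds by these same two factors.
refutations_per_confirmation = (
    math.log(confirmation_factor) / -math.log(refutation_factor)
)

print(f"refutation multiplies the error probability by {error_growth:.3f}")
print(f"confirmation divides the error probability by {error_shrink:.1f}")
print(f"refutations offset by one confirmation: {refutations_per_confirmation:.0f}")
```

On these figures the refutation enlarges the probability of error by about 2 per cent, the confirmation cuts it by a factor of about 25, and, under the stated assumption, a single such confirmation would offset more than a hundred such refutations.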

As to the relevance of this result to the Duhem-Quine problem, the task is to pick between H and T. In this instance the substantial reduction in P(H) would indicate that H, the auxiliary hypothesis, is the weak link rather than the hard core of Newtonian theory.


The point of this example (used by Lakatos himself) is to show how a theory which appears to be refuted by evidence can survive as an active force for further development, being regarded more highly than the confounding evidence. When this happens, the Duhem-Quine problem is apparently again resolved in favour of the theory.

In 1815 William Prout suggested that hydrogen was a building block of other elements whose atomic weights were all multiples of the atomic weight of hydrogen. The fit was not exact: for example, boron had a value of 0.829 when according to the theory it should have been 0.875 (a multiple of the figure 0.125). The measured figure for chlorine was 35.83 instead of 36. To overcome these discrepancies Prout and Thomson suggested that the values should be adjusted to fit the theory, with the deviations explained in terms of experimental error. In this case the ‘arrow’ of modus tollens was directed from the theory to the experimental techniques.

In setting the scene for use of Bayesian theory, Howson and Urbach designated Prout’s hypothesis as ‘t’. They refer to ‘a’ as the hypothesis that the accuracy of measurements was adequate to produce an exact figure. The troublesome evidence is labelled ‘e’.

It seems that chemists of the early nineteenth century, such as Prout and Thomson, were fairly certain about the truth of t, but less so of a, though more sure that a is true than that it is false. (ibid, page 98)

In other words they were reasonably happy with their methods and the purity of their chemicals while accepting that they were not perfect.

Feeding in various estimates of the relevant prior probabilities, the effect was to shift from the prior probabilities to the posterior probabilities listed as follows:

P(t) = 0.9 shifted to P(t|e) = 0.878 (down 0.022)
P(a) = 0.6 shifted to P(a|e) = 0.073 (down 0.527)

Howson and Urbach argued that these results explain why it was rational for Prout and Thomson to persist with Prout’s hypothesis and to adjust atomic weight measurements to come into line with it. In other words, the arrow of modus tollens is validly directed to a and not t.

Howson and Urbach noted that the results are robust and are not seriously affected by altered initial probabilities: for example if P(t) is changed from 0.9 to 0.7 the posterior probabilities of t and a are 0.65 and 0.21 respectively, still ranking t well above a (though only by a factor of 3 rather than a factor of 10).
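The estimates that were fed in are not listed in this chapter. The following Python sketch shows how a calculation of this form runs, assuming that t and a are independent before the evidence and that e is impossible if both are true. The likelihood values are assumptions, selected because they reproduce the posteriors quoted above; they should not be read as a transcription of Howson and Urbach’s own working.

```python
def posteriors(p_t, p_a, likelihoods):
    """Return (P(t|e), P(a|e)), assuming t and a are independent a priori.

    likelihoods maps (t_is_true, a_is_true) to P(e | that combination).
    """
    weights = {}
    for (t_true, a_true), lik in likelihoods.items():
        prior = (p_t if t_true else 1 - p_t) * (p_a if a_true else 1 - p_a)
        weights[(t_true, a_true)] = prior * lik
    p_e = sum(weights.values())  # total probability of the evidence
    post_t = sum(w for (t, _), w in weights.items() if t) / p_e
    post_a = sum(w for (_, a), w in weights.items() if a) / p_e
    return post_t, post_a


# ASSUMED likelihoods (not given in the text), for illustration only.
assumed_likelihoods = {
    (True, True): 0.0,     # t and a together rule out e
    (True, False): 0.02,   # t true, a false
    (False, True): 0.01,   # t false, a true
    (False, False): 0.01,  # both false
}

# Priors quoted in the text, then the robustness check with P(t) = 0.7.
post_t, post_a = posteriors(0.9, 0.6, assumed_likelihoods)
alt_t, alt_a = posteriors(0.7, 0.6, assumed_likelihoods)

print(round(post_t, 3), round(post_a, 3))
print(round(alt_t, 2), round(alt_a, 2))
```

With these assumed likelihoods the figures come out at about 0.878 and 0.073, and at about 0.65 and 0.21 when P(t) is lowered to 0.7, in line with the values reported above.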

In the light of the calculation they noted ‘Prout’s hypothesis is still more likely to be true than false, and the auxiliary assumptions are still much more likely to be false than true’ (ibid 101). Their use of language was a little unfortunate because we now know that Prout was wrong and so Howson and Urbach would have done better to speak of ‘credibility’ or ‘likelihood’ instead of truth. Indeed, as will be explained, there were dissenting voices at the time.


Bayesian theory has many admirers, none more so than Howson and Urbach. In their view, the Bayesian approach should become dominant in the philosophy of science, and it should be taken on board by scientists as well. Confronted with evidence from research by Kahneman and Tversky that ‘in his evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at all’ (Kahneman and Tversky, 1972, cited in Howson and Urbach, 1989, 293) they reply that:

…it is not prejudicial to the conjecture that what we ourselves take to be correct inductive reasoning is Bayesian in character that there should be observable and sometimes systematic deviations from Bayesian precepts…we should be surprised if on every occasion subjects were apparently to employ impeccable Bayesian reasoning, even in the circumstances that they themselves were to regard Bayesian procedures as canonical. It is, after all, human to err. (Howson and Urbach, 1989, 293-285)

They draw some consolation from the lamentable performance of undergraduates (and a distressing fraction of logicians) in a simple deductive task (page 294). The task is to nominate which of four cards should be turned over to test the statement ‘if a card has a vowel on one side, then it has an even number on the other side’. The visible faces of the four cards are ‘E’, ‘K’, ‘4’ and ‘7’. The most common answers are the pair ‘E’ and ‘4’ or ‘4’ alone. The correct answer is ‘E’ and ‘7’, because only a vowel paired with an odd number can falsify the statement.
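The logic of the card task can be checked mechanically. The sketch below assumes, as the task is usually set, that every card has a letter on one side and a number on the other; the function name is hypothetical. A card is worth turning only if its hidden face could complete a counterexample, namely a vowel paired with an odd number.

```python
VOWELS = set("AEIOU")


def could_falsify(face):
    """True if turning this card could reveal a vowel paired with an odd number."""
    if face.isalpha():
        # A letter matters only if it is a vowel: its hidden number might be odd.
        return face.upper() in VOWELS
    # A number matters only if it is odd: its hidden letter might be a vowel.
    return int(face) % 2 == 1


cards = ["E", "K", "4", "7"]
to_turn = [card for card in cards if could_falsify(card)]
print(to_turn)  # ['E', '7']
```

The ‘4’ is irrelevant because the statement says nothing about what must lie behind an even number.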

The Bayesian approach has some features that give offence to many people. Some object to the subjective elements, some to the arithmetic and some to the concept of probability which was so tarnished by the debacle of Carnap’s programme.

Taking the last point first, Howson and Urbach argue cogently that the Bayesian approach should not be subjected to prejudice due to the failure of the classical theory of objective probabilities. The distinctively subjective starting point for the Bayesian calculus of course raises the objection of excessive subjectivism, with the possibility of irrational or arbitrary judgements. To this, Howson and Urbach reply that the structure of argument and calculation that follows after the assignment of prior probabilities resembles the objectivity of deductive inference (including mathematical calculation) from a set of premises. The source of the premises does not detract from the objectivity of the subsequent manipulations that may be performed upon them. Thus Bayesian subjectivism is not inherently more subjective than deductive reasoning.


The input consists of prior probabilities (whether beliefs or betting propensities) and this raises another objection, along the lines that the Bayesians emerge with a conclusion (the posterior probability) which overwhelmingly reflects what was fed in, namely the prior probability. Against this is the argument that the prior probability (whatever it is) will shift rapidly towards a figure that reflects the impact of the evidence. Thus any arbitrariness or eccentricity of original beliefs will be rapidly corrected in a ‘rational’ manner. The same mechanism is supposed to result in rapid convergence between the belief values of different scientists.
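The convergence claim can be illustrated with a toy case. Everything in the sketch below is hypothetical: two scientists start with very different priors for a hypothesis h (a coin lands heads with probability 0.7) against a single rival (the coin is fair), and both update on the same made-up record of 100 tosses. It shows only how the Bayesian mechanism behaves under these assumptions; it does not bear on whether non-Bayesian methods would converge as quickly.

```python
def update(prior_h, p_obs_given_h, p_obs_given_alt):
    """One Bayesian update of P(h) against a single rival hypothesis."""
    numerator = p_obs_given_h * prior_h
    return numerator / (numerator + p_obs_given_alt * (1 - prior_h))


# Hypothetical setup: h says P(heads) = 0.7, the rival says 0.5.
P_HEADS = {"h": 0.7, "alt": 0.5}


def likelihood(outcome, hypothesis):
    """Probability of one toss outcome ('H' or 'T') under a hypothesis."""
    p = P_HEADS[hypothesis]
    return p if outcome == "H" else 1 - p


# A fixed, made-up record of 100 tosses containing 70 heads.
tosses = list("HHHTHHTHHT") * 10

sceptic, enthusiast = 0.1, 0.9  # two very different priors for h
gap_before = abs(enthusiast - sceptic)

for outcome in tosses:
    l_h, l_alt = likelihood(outcome, "h"), likelihood(outcome, "alt")
    sceptic = update(sceptic, l_h, l_alt)
    enthusiast = update(enthusiast, l_h, l_alt)

gap_after = abs(enthusiast - sceptic)
print(f"gap before: {gap_before:.3f}, gap after: {gap_after:.4f}")
```

In this toy case the two posteriors end up very close together despite the initial gap of 0.8, because the shared evidence swamps the difference in priors.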

To stand up, this latter argument must demonstrate that convergence cannot be equally rapidly achieved by non-Bayesian methods, such as offering a piece of evidence and discussing its implications for the various competing hypotheses or the alternative lines of work without recourse to Bayesian calculations.

As was noted previously, there is a considerable difference of opinion in Bayesian circles about the measure of subjective belief. Some want to use a behavioural measure (actual betting, or propensity to bet), others including Howson and Urbach opt for belief rather than behaviour. The ‘betting Bayesians’ need to answer the question: what, in scientific practice, is equivalent to betting? Is the notion of betting itself really relevant to the scientist’s situation? Betting forces a decision (or the bet does not get placed) but scientists can in principle refrain from a firm decision for ever (for good reasons or bad). This brings us back to the problems created by the demand to take a stand or make a decision one way or the other. Even if some kind of behavioural equivalent of betting is invoked, such as working on a particular programme or writing papers related to the programme, there is still the kind of problem, noted below, where a scientist works on a theory which he or she believes to be false.

Similarly formidable problems confront the ‘belief Bayesians’. Obviously any retrospective attribution of belief (as in the cases above) calls for heroic assumptions about the consciousness of people long dead. These assumptions expose the limitation with the ‘forced choice’ approach which attempts to collapse all the criteria for the decision into a single value. Such an approach (for both betting and belief Bayesians) seems to preclude a complex appraisal of the theoretical problem situation which might be based on multiple criteria. Such an appraisal might run along the lines that theory A is better than theory B in solving some problems and C is better than B on some other criteria, and so certain types of work are required to test or develop each of the rival theories. This is the kind of situation envisaged by Lakatos when he developed his methodology of scientific research programmes.

The forced choice cannot comfortably handle the situation of Maxwell who continued to work on his theories even though he knew they had been found wanting in tests. Maxwell hoped that his theory would come good in the end, despite a persisting run of unfavourable results. Yet another situation is even harder to comprehend in Bayesian terms. Consider a scientist at work on an important and well established theory which that scientist believes (and indeed hopes) to be false. The scientist is working on the theory with the specific aim of refuting it, thus achieving the fame assigned to those who in some small way change the course of scientific history. The scientist is really betting on the falsehood of that theory. These comments reinforce the value of detaching the idea of working on a theory from the need to have belief in it, as noted in the chapter on the Popperians.


What do the cases do for our appraisal of Bayesian subjectivism? The Dorling example is very impressive on both aspects of the Lakatos scheme – swallowing an anomaly and thriving on a confirmation. The case for Bayesianism (and Lakatos) is reinforced by the fact that Dorling set out to criticise Lakatos, not to praise him. And he remained critical of any attempt to sidestep refutations because he did not accept that his findings provided any justification for ignoring refutations, along the lines of ‘anything goes’.

Finally, let me emphasise that this paper is intended to attack, not to defend, the position of Lakatos, Feyerabend and some of Kuhn’s disciples with respect to its cavalier attitude to ‘refutations’. I find this attitude rationally justified only under certain stringent conditions: p(T) must be substantially greater than 1/2, the anomalous result must not be readily explainable by any plausible rival theory to T…(Dorling, 1979, 187).

In this passage Dorling possibly gives the game away. There must not be a significant rival theory that could account for the aberrant evidence E’. In the absence of a potential rival to the main theory the battle between a previously successful and wide-ranging theory in one corner (in this case Newton) and a more or less isolated hypothesis and some awkward evidence in another corner is very uneven.

For this reason, it can be argued that the Bayesian scheme lets us down when we most need help – that is, in a choice between major rival systems, a time of ‘crisis’ with clashing paradigms, or a major challenge as when general relativity emerged as a serious alternative to Newtonian mechanics. Presumably the major theories (say Newton and Einstein) would have their prior probabilities lowered by the existence of the other, and the supposed aim of the Bayesian calculus in this situation should be to swing support one way or the other on the basis of the most recent evidence. The problem would be to determine which particular piece of evidence should be applied to make the calculations. Each theory is bound to have a great deal of evidence in support and if there is recourse to a new piece of evidence which appears to favour one rather than the other (the situation with the so-called ‘crucial experiment’) then the Duhem-Quine problem arises to challenge the interpretation of the evidence, whichever way it appears to go.

A rather different approach can be used in this situation. It derives from a method of analysis of decision making which was referred to by Popper as ‘the logic of the situation’ but was replaced by talk of ‘situational analysis’ to take the emphasis off logic. So far as the Duhem-Quine problem is concerned we can hardly appeal to the logic of the situation for a resolution because it is precisely the logic of the situation that is the problem. But we can appeal to an appraisal of the situation where choices have to be made from a limited range of options.

Scientists need to work in a framework of theory. Prior to the rise of Einstein, what theory could scientists use for some hundreds of years apart from that of Newton and his followers? In the absence of a rival of comparable scope or at least significant potential there was little alternative to further elaboration of the Newtonian scheme, even if anomalies persisted or accumulated. Awkward pieces of evidence create a challenge to a ruling theory but they do not by themselves provide an alternative. The same applies to the auxiliary hypothesis on tidal friction (mentioned in the first case study above), unless this happens to derive from some non-Newtonian theoretical assumptions that can be extended to rival the Newtonian scheme.

The approach by situational analysis is not hostage to any theory of probability (objective or subjective), or likelihood, or certainty or inductive proof. Nor does it need to speculate about the truth of the ruling theory, in the way that Howson and Urbach speculate about the likelihood that a theory might be true.

This brings us to the Prout example which is not nearly as impressive as the Dorling case. Howson and Urbach concluded that the Duhem-Quine problem in that instance was resolved in favour of the theory against the evidence on the basis of a high subjective probability assigned to Prout’s law by contemporary chemists. In the early stages of its career Prout’s law may have achieved wide acceptance by the scientific community, at least in England, and for this reason Howson and Urbach assigned a very high subjective probability to Prout’s hypothesis (0.9). However Continental chemists were always skeptical and by mid-century Stas (and quite likely his Continental colleagues) had concluded that the law was an illusion (Howson and Urbach, 1989, 98). This potentially damning testimony was not invoked by Howson and Urbach to reduce P(t), but it could have been (and probably should have been). Stas may well have given Prout the benefit of the doubt for some time over the experimental methodology, but as methods improved the fit with Prout should have improved as well. Obviously the fit did not improve and under these circumstances Prout should have become less plausible, as indeed was the case outside England. If the view of Stas was widespread, then a much lower prior probability should have been used for Prout’s theory.

Another point can be made about the high prior probability assigned to the hypothesis. The calculations show that the subjective probability of the auxiliary assumption a (that the measurements were accurate) sank from 0.6 to 0.073 and this turned the case in favour of the theory. But there is a flaw of logic there: presumably the whole-number atomic weights were measured using the same experimental equipment and the same or similar techniques that were used to estimate the atomic weight of chlorine. And the high prior probability for Prout’s hypothesis was based on confidence in the experimental results that were used to pose the whole-number hypothesis in the first place. The evidence that was good enough to back the Prout conjecture should have been good enough to refute it, or at least dramatically lower its probability.

In the event, Prout turned out to be wrong, even if he was on the right track in seeking fundamental building blocks. The anomalies were due to isotopes which could not be separated or detected by chemical methods. So Prout’s hypothesis may have provided a framework for ongoing work until the fundamental flaw was revealed by a major theoretical advance. As was the case with Newtonian mechanics in the light of the evidence on the acceleration of the moon, a simple-minded, pragmatic approach might have provided the same outcome without need of Bayesian calculations.

Consequently it is not true to claim, with Howson and Urbach that “…the Bayesian model is essentially correct. By contrast, non-probabilistic theories seem to lack entirely the resources that could deal with Duhem’s problem” (Howson and Urbach, 1989, 101).


It appears that the Bayesian scheme has revealed a great deal of power in the Dorling example but is quite unimpressive in the Prout example. The requirement that there should not be a major rival theory on the scene is a great disadvantage because at other times there is little option but to keep working on the theory under challenge, even if some anomalies persist. Where the serious option exists it appears that the Bayesians do not help us to make a choice.

Furthermore, internal disagreements call for solutions before the Bayesians can hope to command wider assent; perhaps the most important of these is the difference between the ‘betting’ and the ‘belief’ schools of thought in the allocation of subjective probabilities. There is also the worrying aspect of betting behaviour which is adduced as a possible way of allocating priors but, as we have seen, there is no real equivalent of betting in scientific practice. One of the shortcomings of the Bayesian approach appears to be an excessive reliance on a particular piece of evidence (the latest) whereas the Popperians and especially Lakatos make allowance for time to turn up a great deal of evidence so that preferences may slowly emerge.

This brings us to the point of considering just how evidence does emerge, a topic which has not yet been mentioned but is an essential part of the situation. The next chapter will examine a mode of thought dubbed the ‘New Experimentalism’ to take account of the dynamics of experimental programs.
