In preparation for the release of the sequel to Dr. Richard Carrier’s book on the so-called “historical Jesus” and historical methods, I thought it would be good to address the foundations of the sequel as presented in its forerunner, Proving History: namely, Dr. Carrier’s arguments about “Bayes’ Theorem” or BT (as presented in his book, anyway). There are several reasons for this. First, Dr. Carrier argues that BT should be THE method for all historical research. Second, the entire point of Proving History was to argue that the historical methodology we will find in the sequel is the best. The third and most important reason, however, is that virtually every claim Dr. Carrier makes about BT is fundamentally inaccurate. Of course, Dr. Carrier clearly simplifies for his readers, and clearly knows more about the subjects and issues he deals with (or touches upon) than he includes in his book. As I intend to demonstrate, though, whether or not Dr. Carrier is simplifying doesn’t matter: his methods are not sound. Hence the title: using these methods, one is just as likely to “prove” some historical account as one is fantastical idiocy, all depending upon how one misuses Bayesian inference.
We can begin by looking at Dr. Carrier’s “proof” about BT (beginning on p. 106), which starts as follows:
“P[remise] 1. BT is a logically proven theorem.
P2. No argument is valid that contradicts a logically proven theorem.
C[onclusion] 1. Therefore, no argument is valid that contradicts BT.”
Granting the truth of this (it’s false, but this isn’t the time to get into the difference between a sound and a valid argument), then the only historian I know of who actually gives arguments that contradict BT is Dr. Carrier. To understand why, let’s look at Dr. Carrier’s own sources.
On p. 50 Dr. Carrier refers the reader via an endnote (no. 9) to “several highly commendable texts” on BT. The one he states gives “a complete proof of the formal validity of BT” is Papoulis, A. (1986). Probability, Random Variables, and Stochastic Processes. (2nd Ed.). I don’t have the 2nd edition, but I do have the 3rd, and as this proof is trivial I could really use any intro probability textbook. Papoulis begins his “complete proof of the formal validity” (as opposed to proof of informal validity? or incomplete proof?) by defining a set and probability function for which the axioms of probability hold. A key axiom is that the probabilities of all possible outcomes must sum or integrate to 1 (simplistically, for those who haven’t taken any calculus, integration is a kind of summation). For example, imagine an individual named “Anna” is drawing cards from a pack. Let’s imagine that
“It wasn’t the Jack of Diamonds
Nor the Joker she drew at first
It wasn’t the King or the Queen of Hearts
But the Ace of Spades reversed”
The probability of drawing the card she did is 1/53 (the deck includes the joker). This is true for the other 52 cards as well. The probability that she would draw a card from the deck that was in the deck is 53/53 or 1. For a “regular” deck, it would be 52/52 or 1 (no joker). This is intuitive and obvious, but the important point is that it also follows from the fact that a normal deck has 52 cards and the probability of drawing any one of them is 1/52; hence the probability of drawing a card is given by the sum of the probabilities of drawing each individual card, or 1/52 summed 52 times. In his appendix (p. 284), Dr. Carrier says of something related to probabilities that they “must sum to 1”, just like the possible outcomes of drawing cards from a deck do (52/52=1). What he apparently doesn’t understand is why they “must” do so or what this entails. It means that in order to use BT to evaluate how probable some outcome, result, historical event, etc., is, one must consider every single possible outcome.
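For anyone who would rather see the card arithmetic run than take it on faith, here is a minimal sketch in Python. The function name `uniform_deck` is mine, invented for illustration, and the only assumption is that every card is equally likely to be drawn.

```python
from fractions import Fraction

def uniform_deck(n_cards):
    """Map each card (numbered 0..n_cards-1) to its probability in a fair draw."""
    return {card: Fraction(1, n_cards) for card in range(n_cards)}

regular = uniform_deck(52)     # no joker
with_joker = uniform_deck(53)  # Anna's deck, joker included

# Summing the probability of every individual card gives exactly 1.
total_regular = sum(regular.values())
total_joker = sum(with_joker.values())
print(total_regular, total_joker)  # 1 1
```

Exact fractions are used instead of floats so that the totals come out as exactly 1 and not as a rounding approximation of it.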
Dr. Carrier wishes to use BT to evaluate the probability that particular events occurred ~2,000 years ago. For example, on pp. 240-42 he considers the possibility that Jesus was a “legendary rabbi” in terms of the “class” of legendary rabbis and information we have on such a class. Do we know how many such rabbis existed and who they were (the way I know how many cards there are in a deck as well as the “name” of each, e.g., “ace of spades”)? No. Ergo, Bayes’ Theorem is unusable.
There is another basic property of BT Dr. Carrier seems to have missed. As Papoulis clearly states, BT is only valid for events/outcomes that are mutually exclusive. A simple example of mutual exclusivity is the coin toss. If I toss a coin, the probability that I will get heads and the probability that I will get tails are “exclusive” because I cannot get BOTH heads AND tails given one toss. Let’s say, however, that I’m picking students from the total population of juniors at Tunafish Technical Institute for the Lavatory Sciences, and I know that of the 100 juniors, 15 are taking Math 205 and 35 are taking Engineering 211. However, the total number of students in either course is only 40, because 10 students are in both classes. The two groups are NOT mutually exclusive.
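The Tunafish Tech numbers can be checked with a few lines of Python. This is only a sketch of the counting in the paragraph above; the variable names are my own.

```python
from fractions import Fraction

juniors = 100
math_205 = 15
engineering_211 = 35
both = 10  # students enrolled in both courses

# Inclusion-exclusion: subtract the overlap so nobody is counted twice.
either = math_205 + engineering_211 - both
p_either = Fraction(either, juniors)

# Adding the two probabilities directly is only valid for exclusive events.
naive = Fraction(math_205, juniors) + Fraction(engineering_211, juniors)

print(either, p_either, naive)  # 40 2/5 1/2
```

The naive sum overstates the probability (1/2 instead of 2/5) precisely because the two groups overlap.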
Often, both of these requirements (“must sum to 1” and mutual exclusivity) are given together: the set of outcomes must be collectively exhaustive and mutually exclusive, or BT can only be used if
1) all possible outcomes are known
2) one and only one outcome of this set of all possible outcomes can occur.
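To see what these two conditions look like in practice, here is a minimal sketch of BT applied to a set of hypotheses that is exhaustive and exclusive by construction. Everything in it is hypothetical: the function `posterior` is my own illustration, and the two-deck setup (a fair coin decides which deck Anna draws from) is invented so that both conditions hold.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' Theorem over mutually exclusive, collectively exhaustive hypotheses.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E | H)
    """
    # Collectively exhaustive: the priors have to sum to exactly 1.
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1")
    # Law of total probability: P(E) is the sum over H of P(H) * P(E | H).
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    if evidence == 0:
        raise ValueError("the evidence has probability 0 under every hypothesis")
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Hypothetical setup: the card came from one of two decks, chosen by a fair coin.
priors = {"regular": Fraction(1, 2), "with_joker": Fraction(1, 2)}
# Probability of drawing the ace of spades from each deck.
likelihoods = {"regular": Fraction(1, 52), "with_joker": Fraction(1, 53)}

result = posterior(priors, likelihoods)
print(result["regular"], result["with_joker"])  # 53/105 52/105
```

The calculation only goes through because the list of hypotheses is complete and known in advance, which is exactly what the deck gives us.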
These two requirements make BT useless for most purposes, including historiography. However, Dr. Carrier isn’t really using BT. As his references show (as well as his description of BT throughout his book), he is actually using something called Bayesian inference/Bayesian analysis. This renders almost completely irrelevant every conclusion in his “proof” about BT, because he isn’t using it. Thus, by misusing and inaccurately describing what BT is, he comes about as close to contradicting it as a historian can. And because he conflates BT with Bayesian inference/analysis, it doesn’t matter whether “BT is a logically proven theorem”. Finally, there is no “complete proof of formal validity” for some Bayesian inference/analysis “theorem” Dr. Carrier could use in place of the first premise in his proof.
Ok, so we can’t use BT, but that doesn’t mean we can’t use Bayesian methods. However, in order to use Bayesian methods historians would have to understand Bayesian statistics (and statistics in general), and it doesn’t appear that Dr. Carrier understands enough to do so. We can see this when Dr. Carrier addresses the “frequentist vs. Bayesian” debate. To keep things simple, let’s just say that this is an ongoing debate arguably going back to Thomas Bayes but which is definitely over a century old. Dr. Carrier is apparently so confident in his mathematical acuity that he “resolves” the dispute in a few pages, with almost no reference to math or the literature: “The whole debate between frequentists and Bayesians, therefore, has merely been about what a probability is a frequency of, and that is a rather pointless disagreement, since a frequency is a frequency, the rules are the same for either…” (p. 266). Hm. Amazing that generations of the best statistical minds missed this. Oh wait. They didn’t.
Let’s look at how Dr. Carrier describes the dispute: “The debate between the so-called ‘frequentists’ and ‘Bayesians’ can be summarized thus: frequentists describe probabilities as a measure of the frequency of occurrence of particular kinds of event within a given set of events, while Bayesians often describe probabilities as measuring degrees of belief or uncertainty.” (p. 265). We must grant Dr. Carrier the requisite lenience given his necessary simplifications, because were this really his view it would be laughably wrong:
“Frequentist statistical procedures are mainly distinguished by two related features; (i) they regard the information provided by the data x as the sole quantifiable form of relevant probabilistic information and (ii) they use, as a basis for both the construction and the assessment of statistical procedures, long-run frequency behaviour under hypothetical repetition of similar circumstances.”
Bernardo, J. M. & Smith, A. F. (1994). Bayesian Theory. Wiley.
“Undoubtedly, the most critical and most criticized point of Bayesian analysis deals with the choice of the prior distribution, since, once this prior distribution is known, inference can be led in an almost mechanic way by minimizing posterior losses, computing higher posterior density regions, or integrating out parameters to find the predictive distribution. The prior distribution is the key to Bayesian inference and its determination is therefore the most important step in drawing this inference. To some extent, it is also the most difficult. Indeed, in practice, it seldom occurs that the available prior information is precise enough to lead to an exact determination of the prior distribution, in the sense that many probability distributions are compatible with this information…Most often, it is then necessary to make a (partly) arbitrary choice of the prior distribution, which can drastically alter the subsequent inference.”
Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation (Springer Texts in Statistics). (2nd Ed.). Springer.
The “frequency” part of “frequentist” does have to do with kinds of events, but frequencies are the measure of probability, not the reverse. To illustrate, consider the “bell curve” (the graph of the normal distribution). It’s a probability distribution. Now imagine a standardized test like the SATs which is designed such that scores will be normally distributed and have this bell curve graph. The bell curve is the graph of a probability function (technically, of a probability density function or pdf), and it is formed by the frequency of particular scores. We know that it is very improbable for a person’s score to be perfect or near perfect, and the graph shows this because the right-hand end is nearly flat, indicating scores close to perfect are very infrequent outcomes.
What does this mean for “frequentist” methods? Well, Kaplan, The Princeton Review, and other test prep companies try to show their methods work by using this normal distribution. They claim that people who take their classes aren’t distributed the way the population is, because too frequently students taking their class obtain scores above average (i.e., those who take the classes have test scores that aren’t distributed the way the population is). They use the frequency of higher-than-average scores to argue that their class must improve scores.
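A bare-bones version of the kind of calculation a test prep company might run is a one-sided z-test, sketched below. Every number here is made up for illustration (a population mean of 500, a standard deviation of 100, and a class of 100 students averaging 520), and `one_sided_z_test` is a hypothetical helper, not anything a test prep company has published.

```python
import math

def one_sided_z_test(sample_mean, n, pop_mean, pop_sd):
    """Return (z, p): how surprising is a sample mean this high or higher
    if the class were just a random draw from the normal population?"""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Upper-tail area of the standard normal, via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical figures, for illustration only.
z, p = one_sided_z_test(sample_mean=520, n=100, pop_mean=500, pop_sd=100)
print(round(z, 2), round(p, 4))  # 2.0 0.0228
```

Note that the normal distribution enters only at the very end, to decide whether the observed average is too frequent an outcome to attribute to chance.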
What’s key is that the data are obtained and analyzed, but the distribution is only used to determine whether the values the analysis yielded are “statistically significant”. Bayesian inference reverses this, creating fundamental differences. The process starts with a probability distribution. The prior distributions represent uncertainty and make predictions about the data that will be obtained. Once the new data are obtained, the model is adjusted to better fit them. This is usually done many, many times as more and more information is tested against an increasingly accurate model. The key differences are
1) the iterative process
2) the use of models which make predictions
3) the use of distributions to represent unknowns and (in part) the way the model will “learn” or adapt given new input.
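The simplest textbook illustration of this loop is the beta-binomial model, sketched below. I am assuming a flat Beta(1, 1) prior on an unknown proportion and three invented batches of success/failure data; the helper names `update` and `mean` are hypothetical. Real models are far more involved, but the shape of the process is the same.

```python
from fractions import Fraction

def update(belief, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives
    a Beta(a + successes, b + failures) posterior."""
    a, b = belief
    return (a + successes, b + failures)

def mean(belief):
    """Mean of a Beta(a, b) distribution."""
    a, b = belief
    return Fraction(a, a + b)

belief = (1, 1)                     # flat prior: every proportion equally plausible
batches = [(3, 1), (2, 2), (4, 0)]  # invented (successes, failures) per batch

history = [belief]
for successes, failures in batches:
    # Yesterday's posterior becomes today's prior.
    belief = update(belief, successes, failures)
    history.append(belief)

for a, b in history:
    print(f"Beta({a}, {b}) mean = {mean((a, b))}")
```

Each pass through the loop uses a full distribution, not a single number, to carry what has been learned so far into the next round.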
So why don’t we find any of this in Dr. Carrier’s description of Bayesian methods? Why do we always find ad hoc descriptions of “priors”? Because Dr. Carrier wants to use Bayesian analysis but apparently doesn’t understand what “priors” actually are or how complicated they can be in even simple models:
“In many situations, however, the selection of the prior distribution is quite delicate in the absence of reliable prior information, and generic solutions must be chosen instead. Since the choice of the prior distribution has a considerable influence on the resulting inference, this choice must be conducted with the utmost care.”
Marin, J. M., & Robert, C. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. (Springer Texts in Statistics). Springer.
“While the axiomatic development of Bayesian inference may appear to provide a solid foundation on which to build a theory of inference, it is not without its problems. Suppose, for example, a stubborn and ill-informed Bayesian puts a prior on a population proportion p that is clearly terrible (to all but the Bayesian himself). The Bayesian will be acting perfectly logically (under squared error loss) by proposing his posterior mean, based on a modest size sample, as the appropriate estimate of p. This is no doubt the greatest worry that the frequentist (as well as the world at large) would have about Bayesian inference — that the use of a “bad prior” will lead to poor posterior inference. This concern is perfectly justifiable and is a fact of life with which Bayesians must contend…We have discussed other issues, such as the occasional inadmissibility of the traditional or favored frequentist method and the fact that frequentist methods don’t have any real, compelling logical foundation. We have noted that the specification of a prior distribution, be it through introspection or elicitation, is a difficult and imprecise process, especially in multiparameter problems, and in any statistical problem, suffers from the potential of yielding poor inferences as a result of poor prior modeling.”
Samaniego, F. J. (2010). A Comparison of the Bayesian and Frequentist Approaches to Estimation. (Springer Texts in Statistics). Springer.
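Samaniego’s worry is easy to reproduce. The sketch below assumes a beta prior on a population proportion p and an invented sample of 7 successes in 10 trials; the priors Beta(1, 1) and Beta(1, 99) are my own choices, the second standing in for the “stubborn and ill-informed Bayesian” who is sure p is about 1%.

```python
from fractions import Fraction

def posterior_mean(a, b, successes, trials):
    """Posterior mean of p under a Beta(a, b) prior and binomial data.
    This is the Bayes estimate under squared error loss."""
    return Fraction(a + successes, a + b + trials)

successes, trials = 7, 10  # invented modest-size sample

sample_proportion = Fraction(successes, trials)
flat = posterior_mean(1, 1, successes, trials)       # open-minded prior
stubborn = posterior_mean(1, 99, successes, trials)  # prior mean of 1%

print(float(sample_proportion), round(float(flat), 3), round(float(stubborn), 3))
# 0.7 0.667 0.073
```

Same data, same formula, and the two estimates differ by nearly a factor of ten; the only thing that changed is the prior.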
The “stubborn and ill-informed Bayesian” is in a much better position than Dr. Carrier. Dr. Carrier has conflated BT with Bayesian analysis and mischaracterized the distinctions between the Bayesian and frequentist approaches. Instead of prior distributions his “priors” are best guesses. Instead of real belief functions we find “here’s what I believe”. No consideration is given to the nature of the data (categorical, nominal, and in general non-numerical data require specific models and tests, Bayesian or not). Whether all this is due to simplification for his audience or not, it cannot serve as a foundation for any sound methods.
So instead of the universally valid historical method Dr. Carrier argues BT provides, he’s described a method impossible for any historian to use as a means to come to any sound conclusions about anything. What he offers is only a mathematical formula they can plug values into and get a result as logically sound as numerology, although it apparently seems impressive if one has no clue what one is talking about. Perhaps the sequel, which (one presumes) must rely on sound uses of any method, will offer more than Proving History does.