Now we get to the good stuff: Bayes’ Theorem, Bayesian inference, and probability. A key issue that runs through both On the Historicity of Jesus and Dr. Carrier’s previous book Proving History is the confusion between Bayes’ Theorem (BT) and Bayesian inference/probability/etc. The Bayesian vs. Frequentist debate is highly interesting, quite important, and far too complicated to touch on here. Luckily, both camps agree that Bayes’ Theorem “proper” is mathematically sound, and is derivable from probability axioms both sides accept. It’s actually rather mundane. To introduce it, I need to say a word about conditional probability.
More than a word on conditional probability
There are different ways that we evaluate “combined” probabilities. For example, there is independence: if I flip a fair coin and get heads, this tells me nothing about what I will get on the next flip. However, if I draw an ace from a deck of cards and don’t put it back (don’t replace it) before drawing again, I’ve changed the odds that I will get an ace (there are fewer cards in the deck and fewer aces). Conditional probability concerns outcomes or events that are…well…conditional. For example, imagine I pick a student at random from the pool of graduate students currently enrolled at Columbia University. It turns out the student I picked is almost finished with her doctoral work in ancient history. I find out that her undergraduate degree was in mathematics. This is so interesting that I start sampling students from Columbia University’s graduate history program to answer a conditional question: given that a student is a graduate history student, what is the probability that the student majored in math as an undergraduate?
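To put numbers on the card example, here is a minimal Python sketch. It assumes a standard 52-card deck with 4 aces, and the function name is mine, purely for illustration. It computes the probability that the second card is an ace given that the first one was, with and without replacement.

```python
from fractions import Fraction

def p_second_ace_given_first_ace(replace: bool, deck_size: int = 52, aces: int = 4) -> Fraction:
    """P(second card is an ace | first card was an ace)."""
    if replace:
        # The ace goes back in the deck, so the second draw is independent of the first.
        return Fraction(aces, deck_size)
    # Without replacement there is one fewer ace and one fewer card.
    return Fraction(aces - 1, deck_size - 1)

with_replacement = p_second_ace_given_first_ace(replace=True)
without_replacement = p_second_ace_given_first_ace(replace=False)

print(with_replacement)     # 1/13
print(without_replacement)  # 1/17
```

With replacement the first draw tells us nothing (independence); without it, the condition "the first card was an ace" changes the answer.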
A cooler example: suppose the Man in Black challenges me to a battle of wits to the death for the princess. I pour us wine, and he takes out a small tube of powder saying “Inhale this but do not touch”. I smell nothing. He tells me “What you do not smell is called iocaine powder; it is tasteless, odorless, dissolves instantly in liquid, and is among the more deadly poisons known to man.” Next he takes the glasses, turns his back to do something with them so that I can’t see, and then puts them back on the table. I have to determine where the poison is. When his back is turned I switch the glasses, and then I drink from the one that had been in front of him. I start laughing at him, thinking I’ve outwitted him, but then die. Buttercup (the Princess Bride) has been sitting with us the whole time. Knowing that I switched the glasses and yet drank the poison, she reasons that it must be because all this time it was the Man in Black’s cup that was poisoned. True, but she still needs a probability theory refresher. Given the fact that I died from being poisoned, we know the cup I drank from was poisoned. But what if I hadn’t switched the glasses but still drank the glass in front of me? It turns out that, given this, I still would have died. The point is that Buttercup can infer from the fact that I died from drinking the cup I did that it was poisoned, but not that it was the only cup that was poisoned: both were.
From conditional probability to BT
BT is merely an extension of this kind of probability. Let A be a conditional outcome that must happen with one and only one of the outcomes B1, B2, B3,…B25. BT allows me to calculate, given that A happened, the probability that e.g., B1 happened. It’s pretty useless. More importantly, because it is derived from the standard probability axioms, we can’t use it unless we also know the exact probability of each element in the set B AND the probability of A GIVEN each element in this set.
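As a rough sketch of what BT proper computes, the Python below takes the probability of each outcome B1, B2,… and the probability of A given each of them, and returns the probability of each outcome given A. The function name and the three-outcome numbers are hypothetical, chosen only so the arithmetic is easy to check. The sketch uses exact fractions so that the "must sum to 1" check is not thrown off by floating-point rounding.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_i | A)] for outcomes B_i that partition the sample space.

    priors[i]      = P(B_i), which must sum to 1 (mutually exclusive, exhaustive)
    likelihoods[i] = P(A | B_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per outcome")
    if sum(priors) != 1:
        raise ValueError("priors must sum to 1: the B_i must partition the sample space")
    # Law of total probability: P(A) = sum over i of P(A | B_i) * P(B_i)
    p_a = sum(l * p for l, p in zip(likelihoods, priors))
    if p_a == 0:
        raise ValueError("A has probability zero, so conditioning on A is undefined")
    # Bayes' Theorem: P(B_i | A) = P(A | B_i) * P(B_i) / P(A)
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

# Hypothetical inputs: three outcomes instead of twenty-five.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]

posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # [Fraction(3, 22), Fraction(5, 11), Fraction(9, 22)]
```

Note that the function refuses to run unless the inputs form a proper partition with known probabilities, which is exactly the requirement at issue.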
From BT to Bayesianism
This is not something that we can do with Dr. Carrier’s hypotheses, which are not mutually exclusive or collectively exhaustive, do not correspond to known independent (not to mention conditional) probabilities, and are not even single events. Of course, Dr. Carrier isn’t actually using BT. Bayesian inference and Bayesian statistics are not new, but thanks largely to Fisher’s distaste for inverse probability and BT, until fairly recently few used Bayesian methods; everybody relied on the frequentist interpretation of probability (a fair coin has a probability 1/2 of landing on heads because, if you flipped it an infinite number of times, the proportion of heads would be .5 and the proportion of tails .5). It is possible to recognize the efficacy of Bayesian statistics without having a Bayesian (or subjective) interpretation of probability. In fact, it’s hard not to. But this is for reasons that are unnecessarily complicated, almost entirely irrelevant, and involve the ways in which such inferences are used where frequentist probability isn’t particularly suited (like machine learning). Even the most ardent Bayesian wouldn’t agree that Carrier even COULD use Bayesian inference/probability to evaluate the probability that Jesus existed as anything other than a formal expression of his opinion (his subjective evaluation of the evidence, expressed in Bayesian formalism), regardless of whether this outcome is one of a mutually exclusive and collectively exhaustive set:
“Mathematical results of Cox (1946, 1961) and Savage (1954, 1972) prove that if p(θ) and p(y|θ) represent a rational person’s beliefs, then Bayes’ rule is an optimal method of updating this person’s beliefs about θ given new information y. These results give a strong theoretical justification for the use of Bayes’ rule as a method of quantitative learning. However, in practical data analysis situations it can be hard to precisely mathematically formulate what our prior beliefs are, and so p(θ) is often chosen in a somewhat ad hoc manner or for reasons of computational convenience.”
Hoff, P. D. (2010). A first course in Bayesian statistical methods (Springer Texts in Statistics). Springer.
I quote this introductory text for two reasons. First, it is basically impossible to prove that any method for updating a “rational person’s beliefs” (whatever those are) given new evidence is optimal, which means the author is particularly optimistic about the power of Bayesian methods. Second, despite such an optimistic outlook (for which, by the way, no proof is to be found in Cox (1961) or Savage (1954, 1972); I don’t have Cox’s 1946 article), even here we are told that in practical data analysis prior beliefs are hard to formulate precisely and are often chosen ad hoc, which can make using Bayesian methods pointless or worse.
On the importance of reading and understanding your own sources
Dr. Carrier cites several books on Bayes’ Theorem, particularly in his Proving History. It does not appear that he has understood what he has read. He gives perhaps the most complete bibliography in one place in that book, pp. 300-301. In it, he refers the reader to a proof of Bayes’ Theorem from a basic probability textbook, Probability, Random Variables, and Stochastic Processes (2nd Ed.). Alas, I have only the 3rd & 4th editions, but I can’t imagine Papoulis was so much less clear about what Bayes’ Theorem is that Carrier’s edition fails to note the assumptions made. And even if it were this unclear, his other references are quite clear about the difference between Bayes’ Theorem and Bayesianism of the type Carrier uses.
His first reference is to Jaynes’ excellent text Probability Theory: The Logic of Science. Jaynes dedicates a fair portion of chapter five in particular to the ways in which two people whose views differ initially, who both receive the same new information, and who both use Bayesian reasoning will not necessarily come to agree. In fact, of the expectation that “whatever the new information D, it should tend to bring different people into closer agreement with each other,” Jaynes writes that “[a]lthough this can be verified in special cases, it is not true in general” (p. 127). In the section of his bibliography containing sources Carrier identifies as for some reason “the most technical and advance”, we find Peter Lee’s Bayesian Statistics, in which Lee writes:
“It should be clearly understood that there is nothing controversial about Bayes’ Theorem as such. It is frequently used by probabilists and statisticians, whether or not they are Bayesians. The distinctive feature of Bayesian statistics is the application of the theorem in a wider range of circumstances than is usual in classical statistics. In particular, Bayesian statisticians are always willing to talk of the probability of a hypothesis, both unconditionally (prior probability) and given some evidence (its posterior probability) whereas other statisticians will only talk of the probability of a hypothesis in restricted circumstances.”
Lee addresses the difference in the preliminaries section of his book, and instead of determining what the “wider range of circumstances” are that Bayesian statisticians apply Bayes’ rule to, Carrier points the reader to a proof of Bayes’ Theorem that cannot be extended to these circumstances. In another reference (Berger’s Statistical Decision Theory and Bayesian Analysis (2nd Ed.); p. 129) we not only find BT proper but are told explicitly how we must replace all the elements in the theorem’s formula with a variable and parameter in order to use it as in Bayesian statistics (the parameter part is absolutely fundamental to Bayesian methods, but is entirely lacking from Carrier’s work). Finally, for those who have read Carrier’s On the Historicity of Jesus and wish to compare his descriptions and so forth with what one last source he lists on p. 301 of his earlier book says, namely Hartigan’s Bayes Theory, I’ve scanned and (as best I could) cropped the book’s formulation and derivation of BT:
Subjective Probability & Historiography: Using Bayesian methods to prove that you believe your own opinions
I think, though, that Jaynes’ critique overrides all: even if Carrier had the knowledge and information necessary, the best he could do would be to show that, given his opinion, new evidence would change it in a particular way were he to apply Bayesian reasoning. They don’t call it subjective probability for nothing.
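For anyone who wants to see the point about subjectivity in miniature, here is a toy Python sketch. Every number in it is invented for illustration; none comes from Carrier or from Jaynes, and the function name is mine. Two readers start with different priors for a hypothesis H and also disagree about how likely a piece of evidence E is if H is true or false. Both apply the same updating rule to the same evidence.

```python
from fractions import Fraction

def update(prior_h: Fraction, p_e_given_h: Fraction, p_e_given_not_h: Fraction) -> Fraction:
    """Posterior P(H | E) when the only alternatives are H and not-H."""
    numerator = p_e_given_h * prior_h
    # P(E) by the law of total probability over H and not-H.
    p_e = numerator + p_e_given_not_h * (1 - prior_h)
    return numerator / p_e

# Reader A (hypothetical): leans toward H and thinks E is what H predicts.
prior_a = Fraction(3, 5)
posterior_a = update(prior_a, Fraction(4, 5), Fraction(2, 5))

# Reader B (hypothetical): leans against H and thinks E fits not-H better.
prior_b = Fraction(2, 5)
posterior_b = update(prior_b, Fraction(3, 10), Fraction(3, 5))

print(posterior_a, posterior_b)                        # 3/4 1/4
print(prior_a - prior_b, posterior_a - posterior_b)    # 1/5 1/2
```

Same evidence, same rule, and the two readers end up further apart than they started. The arithmetic is impeccable in both cases; what it reports is each reader's opinion.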