Humans are amazingly smart. We not only have the most powerful brains in the known universe (which is a stupid statement because “known” here must implicitly refer to what we know and therefore the statement is basically “so far as we know, we are better able to know than anything we know”), we’ve managed to construct systems which are able to learn the way many living systems do, only with the advantage of unparalleled precision and “memory”. On the other hand, we’re morons who continually manage to make inferences we believe to be supported by logic, mathematical reasoning, rationality, etc., that are actually wholly illogical. What gives?

It turns out that logic and probability (which is required for valid inferences involving chance, among many, many other everyday applications) do not wholly come naturally to anybody, and generally don’t come naturally at all to most. Here I intend to give some “fun” (or hopefully at least mildly interesting) illustrations as to why we can be so dumb when we think we are being so rational.

The first method I’ll use is to demonstrate the counterintuitive nature of probability and logic by some classic examples:

1) The Full Monty (Hall). Imagine that I run some booth at a carnival which consists of a table and 3 boxes. You are intrigued, not because my booth looks like it promises anything remotely resembling entertainment, but because you wonder how I could possibly make a dime with such an unappealing display. So you ask me to explain what the “game” is and what you can win. I tell you that within one of the three boxes is 100 grand (I don’t tell you I mean the candy bar). The other two are empty. But this isn’t simply a game of guessing which box holds the prize; I sweeten the pot: you pick a box, and without telling you what’s in it, I open one of the other two boxes (it will never be the box with the prize). Now I give you the chance to change your choice.

You, being a rational person with no small amount of brain power, see right through this “sweetened pot”. The chances that you chose correctly initially were 1 in 3, because there were three boxes and one prize. Now, I’ve shown that one of the boxes doesn’t have the prize, and your chances are better (1 in 2), but I’ve tried to trick you into thinking there is any good reason to change your mind. After all, there are now two boxes, you’ve chosen one already, the chances are 1 in 2, so what would be the point of switching choices?

A lot. Consider a similar situation, only this time there are a billion boxes. The chances that you select the correct one with your initial choice are 1 in a billion. This time, after you make your initial choice, I again reduce your options to 2 boxes by opening all the boxes that don’t have prizes other than the one you picked and one other box. You know the prize must be in one of the remaining boxes, but how on earth did you manage to choose a box such that your chances went from 1 in a billion to 1 in 2 (apparently) and there’s no point in switching your initial choice? The answer is that you almost certainly chose the wrong box. The probability was 1 in a billion, but even though there are only 2 boxes left the chances aren’t 1 in 2. As soon as you select a box, whether there are initially 3 or initially a billion, in order to reduce your options to 2 boxes I have to open up a non-prize box or non-prize boxes. Either you initially selected the prize box, in which case I can open any other box or boxes I wish so long as I leave 1 unopened, or you didn’t, in which case I must open boxes (or a box in the case of 3) until only the prize box and the box you selected are left. Even in the case of 3 boxes (and obviously in the case of a billion), the chances are you chose wrong. Even in the case of 3 boxes, if I open one more I have to do so on the condition that it isn’t the prize box, which means the probability is conditional (it is contingent upon your initial choice). So even in the case of 3 boxes, you should switch your choice.
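The argument above is easy to check empirically. Here’s a minimal Python sketch (the function names and trial counts are my own choices) that exploits the key fact just described: the host’s constraint means switching wins exactly when your first pick was wrong.

```python
import random

def play(n_boxes: int, switch: bool) -> bool:
    """Simulate one round: the prize lands in a random box, the player
    picks box 0 (by symmetry, as good as any), and the host opens every
    other non-prize box, leaving the pick plus one unopened box."""
    prize = random.randrange(n_boxes)
    pick = 0
    # The unopened "other" box must be the prize box unless the pick
    # itself holds the prize (then the host leaves an arbitrary empty box).
    if switch:
        return pick != prize  # switching wins iff the first pick was wrong
    return pick == prize

def win_rate(n_boxes: int, switch: bool, trials: int = 20_000) -> float:
    """Fraction of rounds won over many independent trials."""
    return sum(play(n_boxes, switch) for _ in range(trials)) / trials
```

With 3 boxes, switching wins about 2/3 of the time and staying about 1/3; with 1,000 boxes, switching wins about 999 times in 1,000.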

2) Smoking doesn’t cause cancer. Imagine that I am the little guy fighting the big, bad Tobacco Industry in some Hollywood-like dramatization (I’m working with a guy who looks a lot like Al Pacino, and I bear a striking resemblance to Russell Crowe). I end up in court, but with the results of years of research. I can show that the incidence of cancer among the smoking population is a billion times higher than average. My research designs are flawless, my use of statistics perfect, and the finding (the difference in cancer incidence between smokers and non-smokers) is accurate. I also lucked out because the jury happens to consist of several eminent logicians, a few professional statisticians, and two mathematicians. My lawyer and I agree the jury will inevitably rule in my favor. Only they don’t. Why?

Mostly because they are using rules of logic and analytic reasoning. In this example, sticking to formal reasoning yields the wrong answer, because the jury consists of individuals who know that correlation doesn’t equal causation and have forgotten that the reason we frequently think it does is sound: correlation makes causation more likely, and the greater the correlation, the greater the chances (in general). Also, the statisticians unfortunately all work in the social and behavioral sciences, and are prone to consider my results in terms of the kinds of arguments marshalled to show that marijuana is a gateway drug: people addicted to heroin, cocaine, etc., are much more likely than the average person to have smoked marijuana and, in addition, they almost always start drug use with marijuana. Ergo, gateway drug. Only, 1) almost all alcoholics started by drinking milk, but milk isn’t a gateway drink, and 2) the “gateway drug” fallacy involves comparing the wrong populations. These problems are related, and the “gateway drug/drink” story is a classic example of the “correlation equals causation” fallacy. If I run several studies on samples from the population of alcoholics, I will find a very high correlation between drinking milk and being an alcoholic. That’s because I started with alcoholics, so anything that holds true of people in general is going to be highly correlated with alcoholism. If I compare the incidence of milk-drinking in the population in general to that in the population of alcoholics, I’ll find that I get the same “correlation” with non-alcoholics. This isn’t quite true of marijuana vs. heroin, crack, etc. Here, the population in general has a much lower average rate and amount of marijuana usage than the population of addicts. But I’m still ignoring a key population: people who use or have used marijuana. When I *start* with this population, and try to use marijuana to *predict* addiction to crack, heroin, etc., I will fail. In fact, my predictive power won’t be much better than if I used milk for my predictive model.
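The population mix-up above can be made concrete with a toy simulation (all the rates here are made-up numbers, chosen only to illustrate the shape of the fallacy): milk-drinking looks overwhelmingly “correlated” with alcoholism if you start with alcoholics, but predicts almost nothing if you start with milk-drinkers.

```python
import random

random.seed(0)
N = 50_000

# Made-up, independent rates: nearly everyone drinks milk; 5% are alcoholics.
people = [(random.random() < 0.95, random.random() < 0.05) for _ in range(N)]

alcoholics = [milk for milk, alc in people if alc]        # start with alcoholics
milk_drinkers = [alc for milk, alc in people if milk]     # start with milk-drinkers

p_milk_given_alcoholic = sum(alcoholics) / len(alcoholics)        # ~0.95: "huge correlation!"
p_alcoholic_given_milk = sum(milk_drinkers) / len(milk_drinkers)  # ~0.05: just the base rate
```

Starting from the addicted population makes any near-universal trait look like a cause; starting from the trait’s own population shows it predicts nothing beyond the base rate.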

But we all know that smoking *does* cause cancer, right? Not exactly. In most uses of the word “cause”, the answer is yes. But scientifically, we tend to restrict claims that x causes y to what are called “necessary and sufficient conditions”. That is, to claim x causes y, we require that whenever x happens, y happens, and that y cannot happen unless x does. These might seem to be equivalent statements (logic again!), but they aren’t. Consider billionaires. In order to be a billionaire, you have to be a millionaire, because you have to have millions to have billions. Thus being a millionaire is a *necessary* condition for being a billionaire. But not all millionaires are billionaires. So being a millionaire is not a *sufficient* condition for being a billionaire. The reason the very rational, highly educated jury came to the wrong conclusion is that they correctly realized that my evidence didn’t demonstrate that smoking causes cancer, but they failed to realize that my evidence made it incredibly likely that smoking causes cancer. Unlike the “gateway drug/drink” fallacy, I (being the diligent, Russell Crowe-looking researcher I am) *didn’t* test only smokers or only those with cancer but compared the incidence of cancer among BOTH the population of smokers and the population of non-smokers. Now, it could be that my results are due to a third variable, such as some gene that tends both to cause cancer and to make people inclined to smoke. But if so, the Tobacco Industry lawyers would have offered such an explanation, and the fact that, with all their money and resources, they didn’t is actually evidence that the jury should have considered (absence of evidence IS evidence of absence; it just is often not very good evidence).

3) One-eyed flying purple people-eaters. This example is quick, neat, and dirty. Consider the statement “all numbers are even.” This is obviously wrong, but why? Well, because all it takes to show it’s wrong is a single counter-example, like 3. This sounds reasonable: to show that a property X doesn’t hold for all Y, one need only find a single Y for which X doesn’t hold. But consider the statement “all one-eyed flying people-eaters are purple.” I argue (as logicians, mathematicians, etc., do) that this is true. Moreover, that it is clearly and obviously true, and that nobody familiar with logic or analysis could possibly think otherwise (I’m wrong about this, but not for the obvious reasons). You, being a rational, intelligent individual, assert that I’m clearly and obviously wrong (and a moron to boot). Well, with the statement “all numbers are even”, we proved this false merely by offering the counter-example 3 (there are infinitely many other counter-examples we could have offered, but we only need one). To show that the statement “all one-eyed flying people-eaters are purple” is false, there must exist at least a single one-eyed flying people-eater who *isn’t* purple. When you can show me this person, then you can tell me I’m wrong.
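Python’s built-in `all` behaves exactly the way logicians say it should, which makes the point easy to see (the lists here are, of course, hypothetical):

```python
# The set of documented one-eyed flying people-eaters (hypothetical, and empty):
people_eaters = []

# "All one-eyed flying people-eaters are purple": vacuously true,
# because no member exists to serve as a counter-example.
claim = all(creature == "purple" for creature in people_eaters)  # True

# The same test on a non-empty collection can fail:
numbers = [1, 2, 3]
all_even = all(n % 2 == 0 for n in numbers)  # False: 1 and 3 are counter-examples
```

A universal claim over an empty collection is automatically true; only an actual counter-example can make it false.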

4) To prove it, assume it’s true. When taking courses in logic, set theory, argumentation, or other topics which involve formal, logical proofs, students often find the proofs for one kind of statement illogical/irrational. Consider the statement “If x, then y”. How might you prove this in, e.g., a mathematical/formal logic course? Let’s make this less abstract. We’ll start with the example “if x is a swan, then x is white.” It seems pretty intuitive that for this statement to be true of anything x, there must exist no black swans. But what about a statement like “if I’m the king of the universe, then the moon is made of green cheese”? This statement is in fact true. We’re back to the one-eyed flying people-eater problem. That’s because the truth of any statement “if x, then y” (whether involving swans or me ruling the universe) doesn’t depend (at least directly) on reality. To see why, imagine how you’d determine whether or not I was right if I told you “if you go outside, you’ll get wet from the rain.” The easiest way would be to go outside, because if you went outside and didn’t get wet, you could say I was wrong. If you didn’t go outside, though, you can’t possibly say that what I said was logically false, because I only stated that *on the condition* that you went outside, *then* you would get wet from the rain. Remember when I mentioned conditional probability in the first example? Well, this is the very related issue of “conditionals”. When I say something like “if x, then y”, what I have logically claimed is that “on the condition that x is true, y must be true.” If x is false, then necessarily the conditional is true. The reason “if I’m the king of the universe, then the moon is made of green cheese” is true is that I’m not the king of the universe (yet…). Conditionals make a conditional assertion (the “if x” part), and to be wrong, the condition has to be met and the “then y” part has to turn out false. By making the “if x” part false, I make it impossible for the condition to be met and therefore impossible for the conditional statement to be false. Similarly, when I say “all one-eyed flying people-eaters are purple”, I am saying that a property holds true of every member of a group that doesn’t exist. You can’t prove it false because it is vacuously true.
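A minimal sketch of the material conditional in Python (the helper name `implies` is my own; logicians usually write p → q) shows there is exactly one way for “if x, then y” to be false:

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material conditional: 'if p then q' is false only when
    p is true and q is false; it is (not p) or q."""
    return (not p) or q

# The full truth table over every assignment of truth values:
table = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

# Exactly one row makes the conditional false: condition met, consequent failed.
false_rows = [(p, q) for p, q, r in table if not r]
```

When the condition is false, the conditional is true no matter what the consequent says, which is the “king of the universe” case above.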

5) Vacuously true and the null/empty set. There’s a famous paradox called “The Barber’s Paradox.” It’s sort of interesting: there’s a town in which a barber shaves all and only the beards of men who don’t shave themselves. Who shaves the barber? He can’t shave himself, because he only shaves those who don’t shave themselves. But if he doesn’t shave himself, then he must shave himself because he shaves every man who doesn’t shave himself.

Turns out this kinda stupid, kinda interesting example is more important than you might think. The man who founded formal/symbolic logic, Gottlob Frege, did so in order to achieve something even more incredible in mathematics (which I won’t go into). The problem is that he allowed logical statements of the form “X is the set that contains only and all sets that don’t contain themselves.” Although we substitute barbers and beards for a symbol X and sets, the logic is the same (it’s called Russell’s paradox, because Bertrand Russell wrote a letter to Frege in which he demonstrated that Frege’s formal system made possible statements which couldn’t be formally evaluated; unfortunately for Russell, an equally simple paradox was to unravel his and Whitehead’s monumental 3-volume work *Principia Mathematica*). As a result, logicians and mathematicians realized they had to be very careful about defining sets. In particular, there must exist one and only one “empty” or “null” set (the set with no members/elements), and this set must be a subset of every set. The fact that it is a subset of every set is actually intuitive. Consider the set of all people and the subset of people who are purple, have one eye, and fly. How many members belong to this subset? None. The more important issue is that there is only *one* “empty” set. Imagine this isn’t true, and consider again the set of people and the subset of people who are purple, have one eye, and fly, but this time also consider the set of people who are dragons. If these sets are different, then one of them must have some element/member that the other doesn’t. But neither has any members, making this impossible.
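Python’s `set` type mirrors the set theory here, so both claims — that there is only one empty set, and that the empty set is a subset of everything — can be checked directly (the example sets are hypothetical):

```python
# Two "different" empty sets: nothing satisfies either description.
purple_flying_people_eaters = {name for name in ["alice", "bob"] if False}
people_who_are_dragons = {name for name in ["alice", "bob"] if False}

# There is only one empty set: neither has an element the other lacks,
# so Python (and set theory) calls them equal.
same_set = purple_flying_people_eaters == people_who_are_dragons  # True

# And the empty set is a subset of every set, including itself:
everyone = {"alice", "bob", "carol"}
empty_is_subset = set() <= everyone and set() <= set()  # True
```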

Conclusion example) Why the most logical intelligent systems are worse at recognizing faces than babies. I hope that these examples and the discussion of them have revealed something about the counter-intuitive nature of logic and probability (it’s a bit difficult to demonstrate that something is hard to understand while making it understandable, so if the examples were either too difficult to be understood or too easy to be counter-intuitive, you’ll just have to trust me here). So why do we do things like illogically infer that correlation is causation and find perfectly logical arguments or statements illogical? Because we live in the real world. Consider facial recognition software. Chances are, you’ve never actually written a facial recognition program or any program that allows a computer to “learn” to recognize/classify things (whether distinguishing letters from scribbles or passing CAPTCHAs). But you have probably used a calculator. Computers are called computers because that’s what they do: compute. Computing is another word for calculating, and your simple calculator is really the same as the world’s best supercomputer (just slower). Getting a computer to recognize faces or letters is not that different from trying to get your calculator to do this. Computers understand nothing. To get them to do anything, you have to reduce it to mere logical rules (because every computer program ultimately works only because of “logic gates”, which are physical realizations of a few simple, logical “operations” from formal logic). To understand what I mean, we will finally see the *real* version of logical statements. In formal logic, a statement like “there is only one Pope” must be rendered into symbolic form such as ∃x[Px & ∀y(Py → y=x)] (in words, “there exists an x such that x is the Pope, and for all y, if y is the Pope then y is x”).
The point is to be able to take something inherently meaningful, like language, and reduce it to meaningless symbols such that one can decide whether or not it is true without having any idea what it means. In fact, in logic courses one learns to do “proofs” or “derivations” in which one uses rules of logic to show that e.g., ~(A v B) ≡ (~A & ~B) without needing to know what “A” or “B” mean. That’s because students learn that certain symbols are “operators” that require a particular, specific operation and others are things that must be operated on. Another word for “operated on” is “computed” or “calculated”, which is why, in order to get computers to recognize faces, we have to reduce faces and the differences between them to pure, meaningless formal logic.
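This mechanical character is easy to demonstrate: a few lines of Python can verify ~(A v B) ≡ (~A & ~B) by brute force over every truth-value assignment, never once asking what A or B mean.

```python
from itertools import product

def de_morgan_holds() -> bool:
    """Check ~(A v B) == (~A & ~B) for every assignment of truth
    values to A and B -- purely mechanical, no meaning required."""
    return all(
        (not (a or b)) == ((not a) and (not b))
        for a, b in product([True, False], repeat=2)
    )
```

The check never consults reality; it grinds through the four possible rows of the truth table, exactly as a derivation (or a logic gate) would.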

Humans don’t think logically or rationally much of the time because it is far more important to be able to recognize causes and patterns: to see the relationship between lots of smokers with cancer and smoking as a cause of cancer, or to recognize trees as trees despite the fact that no two trees are exactly alike (as a computer’s strict rules would demand). Kurt Gödel was a brilliant mathematician and probably the greatest logician who ever lived. Towards the end of his life, he came to believe that people were trying to poison him, and he trusted only his wife to prepare his meals. Unfortunately, she grew ill at one point and had to spend an extended amount of time in a hospital.

Gödel, however, was nothing if not logical. He had two premises:

P1) People are trying to kill me by putting poison in any food I eat.

P2) The only food that I can believe isn’t poisoned is food provided by my wife.

From these premises follow this conditional proposition:

C) If my food isn’t provided by my wife, I can’t believe that it isn’t poisoned.

From this logical inference he concluded (validly) that the best course of action would be not to eat anything until his wife returned: either she would return before he starved to death, or he would die of starvation, which meant the same outcome as eating food not prepared by his wife (death). The problem is that this highly logical, valid inference was ridiculous because it rested on the wildly improbable idea that there was some mass conspiracy to poison him. Turns out, common sense may not be logical, but it is far more useful most of the time. It turns out that our intuition that tossing a fair coin 100 times and getting all heads is less probable than getting some particular mixed sequence of heads and tails (which is false; every specific sequence is equally likely) is useful, because it allows us to look past the specific sequence and recognize the more important truth: it’s more probable that tossing a fair coin 100 times will result in *some* mixture of heads and tails than in all heads. It turns out that because we don’t analyze language according to logical rules, we don’t interpret “If you’re hungry, there’s food on the table” to be equivalent to “if there’s no food on the table, you’re not hungry”. Turns out being a little illogical goes a long way to being right more often than not (particularly in the tens of thousands of years humans existed before things like number systems, formal logic, or writing even existed).
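The coin-sequence point can be checked with exact arithmetic (a small sketch using Python’s `fractions`, so no rounding is involved):

```python
from fractions import Fraction

n = 100
half = Fraction(1, 2)

p_all_heads = half ** n         # the all-heads sequence
p_one_mixed = half ** n         # any other *specific* sequence: identical probability
p_some_mix = 1 - 2 * half ** n  # "some mixture of heads and tails"
                                # (i.e., neither all heads nor all tails)
```

Any single sequence, all-heads included, has probability 2^-100; the useful intuition is really about the event “some mixture”, which is overwhelmingly likely.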

If someone were to program a robotic arm to flip a coin, would you believe fair coin flips are possible? That is, possible without contradicting what we think of as a fair coin flip? This raises the difference between a fair coin flip and an accurate or precise coin flip.

This is actually anything but a trivial question, and it is deeply philosophical, addressing the nature of probability as it has been debated for centuries. For example, one central interpretation of probability has us determine whether a coin is fair based upon 1) our subjective (probabilistic) assumption that a coin flip will yield heads and tails with approximately equal likelihood and 2) the definition of a fair coin as one whose toss results have a limiting frequency distributed 50/50.

Yeah, it brings up the question of fairness … “How fair is a fair coin-flip?”

And… “Is it possible to get a fair toss with unbiased results?”

Sounds like a DIY science experiment! Because there are so many different ways you could go about setting it up and recording the results. As in: should the coin start at a 45° angle or a 67.5° angle, the amount of force it will be hit with every time, which side is facing up, whether it is an orthodox flip or an unorthodox flip, the mechanism doing the flipping, etc.

In the frequentist interpretation of probability, the reason a coin toss is 50/50 is a simple limit: the probability of heads (or tails) is defined as the limit of the relative frequency of heads as the number of tosses approaches infinity. In other words, we are dealing with pretty elementary calculus, and a fair coin has probability 50/50 because, given infinite tosses, exactly half would come up heads (or tails). The problem is neither we nor a robot can even simulate infinite tosses (and that’s without factoring in air resistance and the like). In other interpretations of probability, such as subjective interpretations or Bayesian interpretations (the two are not exactly the same, although they overlap), we can start with the assumption that the coin is fair, or adopt a prior probability distribution in which tosses yield heads with probability 1/2 (a symmetric Bernoulli distribution, whose toss totals are approximately normal). Then we start flipping the coin. Once we’ve flipped the coin a million times, these kinds of probability distributions not only tell us whether we are very likely to have a biased coin but can tell us the probability that the coin is biased, given both the assumption that it is fair AND the actual results (there are more nuanced and complicated methods, but the idea is that we don’t say a coin has a 50/50 probability, we assume it and retain this assumption as long as there exists a “rational” reason to, which is to say that we in some way “bound” how willing we are to accept that the coin is fair by deviations from the 50/50 average split of heads/tails in some sequence of tosses).
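A minimal sketch of the Bayesian idea (assuming a Beta prior, the standard conjugate choice for coin flips; the flat Beta(1,1) default and the toss counts below are my own illustrative choices):

```python
def posterior_heads(heads: int, tails: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Posterior mean and standard deviation for P(heads) under a
    Beta(prior_a, prior_b) prior after observing the given tosses."""
    a, b = prior_a + heads, prior_b + tails
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# A fair-looking start: 50 heads in 100 tosses keeps the estimate at 1/2.
mean_fair, sd_fair = posterior_heads(50, 50)

# A million tosses with 600,000 heads: the posterior concentrates far from 1/2.
mean_biased, sd_biased = posterior_heads(600_000, 400_000)
```

After 600,000 heads in a million tosses, the posterior piles up tightly around 0.6, so retaining the “fair coin” assumption would no longer be rational.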

Fair coins are, of course, idealizations, as they require an idealized toss. But even were we to create a specialized room in which we controlled for atmospheric composition and other such variables, and used (per your suggestion) a robot rather than a human to flip the coin so as to control for mechanical variations, (frequentist) probability itself tells us there is a very high probability that we will not get an exact 50/50 heads/tails split, and it is quite likely that the empirical distribution will be somewhat skewed rather than that nice, approximately normal shape.

Luckily, however, we CAN rather easily simulate the distribution of 1,000 coin tosses, or 10,000, to see how often such simulations come close enough to a 50/50 split that the difference is too minute to bother with. It’s a bit like calculating the de Broglie matter-wavelength for, e.g., a baseball. Given its momentum, we can calculate this, but the difference between it and what we would get by assuming classical mechanics is too small for us to measure.
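Such a simulation takes only a few lines (a sketch; the toss and run counts are arbitrary choices of mine):

```python
import random

def head_proportions(tosses: int, runs: int) -> list:
    """Proportion of heads in each of `runs` simulated sequences of fair tosses."""
    return [
        sum(random.random() < 0.5 for _ in range(tosses)) / tosses
        for _ in range(runs)
    ]

# How often do 10,000 tosses land "close enough" to 50/50 to ignore the gap?
props = head_proportions(10_000, runs=200)
near_even = sum(abs(p - 0.5) < 0.01 for p in props) / len(props)
```

With 10,000 tosses per run, roughly 95% of runs land within one percentage point of an even split, which is exactly the “too minute to bother with” behavior described above, even though an exact 50/50 split is itself unlikely.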