A majority of research papers mention statistical significance. The title of this post suggests that I am going to address what that is. I’m not (mainly because the answer is almost inevitably “nothing important”, but also because I am an ass who enjoys suggesting I’m going to inform others about X when really I’m going to speak about Y).
Statistical significance, as reported in the social, psychological, cognitive, behavioral, medical, and other sciences, is mostly about p values. Put simply, the idea is that if you have a really, really low p value (below a cutoff like 0.1, 0.05, or 0.01), then your results were “statistically significant.”
One thing that isn’t often communicated in popular science is which statistical methods determine whether some result/finding is statistically significant. In fact, one might get the impression that there is a single statistical/mathematical approach that enables one to determine whether a finding is statistically significant.
There isn’t. Moreover, every single statistical test for significance rests on a slew of assumptions (which are usually violated in practice). On top of that, almost every “statistically significant” finding is “significant” only given the assumption that a “variable” (stomach pain, political views, self-esteem, etc.) can not only be captured with infinite accuracy but is also the same kind of thing as the “variables” of physical systems, such as the energy a system gains or loses via photons (per the mass-energy equivalence principle), or more simply the positions of classical particles undergoing Brownian motion.
In short, we pretend that, e.g., responses of the form “strongly agree” or “very often” are like physical measurements instead of attempts to gain information about human constructs. While physicists, engineers, chemists, etc., learn statistical measures/tests in order to account for measurement error in a “variable” (e.g., heat, mass, conductivity), most researchers outside those fields assume that their measurements are basically perfect even though they don’t really know what “variables” they are measuring (if you don’t believe me, google “latent variable”).
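To make the “strongly agree” problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the two groups, their responses, and the three number codings are hypothetical, not data from any study. All three codings keep the answers in the same order, which is all the questionnaire itself guarantees. They differ only in how far apart they pretend the answers are.

```python
from statistics import mean

# Hypothetical responses from two made-up groups of four people each.
group_a = ["agree", "agree", "agree", "agree"]
group_b = ["strongly agree", "strongly agree", "neutral", "neutral"]

# Three order-preserving ways of turning labels into numbers (all assumptions).
equal_steps = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
               "agree": 4, "strongly agree": 5}
stretched_top = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
                 "agree": 4, "strongly agree": 8}
stretched_middle = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
                    "agree": 7, "strongly agree": 8}

def mean_score(responses, coding):
    """Average the numbers a given coding assigns to the responses."""
    return mean(coding[r] for r in responses)

def gap(coding):
    """Group A's mean minus group B's mean under the given coding."""
    return mean_score(group_a, coding) - mean_score(group_b, coding)

for name, coding in [("equal steps", equal_steps),
                     ("stretched top", stretched_top),
                     ("stretched middle", stretched_middle)]:
    print(name, gap(coding))
```

With these invented numbers the groups tie under one coding, group B comes out ahead under another, and group A comes out ahead under the third. Nothing about the answers changed, only the pretense about how far “agree” is from “strongly agree.” A thermometer doesn’t have this problem.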
Consider the prototypical significance testing example: a clinical trial with a placebo pill and the “real” pill. Let’s say the pill is an anti-depressant. I have my control group of people diagnosed with depression (the placebo group) and my treatment group (these are also called the control and treatment “conditions”). I need to determine whether the treatment group and the control group differ “statistically significantly” in terms of the efficacy of the sugar pills vs. the anti-depressants. Which means I need to somehow quantify how much less depressed one group became over some period of time. How do I do this? I pretend that participant responses on questionnaires measure each individual’s level of “depression”, pretend that each individual is depressed in exactly the same way (remember, I’m measuring differences among participants along a single variable, just as if I were measuring differences in weight), and then pretend that I actually measured the variable “depression” as it exists in my population of interest (those people whom I will claim will benefit from the pill), despite having not done any of this.
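For the curious, here is roughly what the number-crunching at the end of such a trial can look like, as a hedged Python sketch. The score drops are invented, and the permutation test is simply one test I picked because it fits in a few lines; it is not what any particular trial uses, and a t-test or something else could easily give a different p value from the same data. The function name and its parameters are my own.

```python
import random

def permutation_p_value(treatment, control, n_shuffles=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Repeatedly shuffles the group labels and counts how often the shuffled
    difference is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    n_treat = len(treatment)
    n_ctrl = len(control)
    observed = sum(treatment) / n_treat - sum(control) / n_ctrl
    pooled = list(treatment) + list(control)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_treat]) / n_treat - sum(pooled[n_treat:]) / n_ctrl
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_shuffles + 1)

# Made-up drops in questionnaire score over the trial (bigger = "less depressed").
pill_drops = [9, 7, 8, 6, 10, 7, 9, 8]
placebo_drops = [5, 6, 4, 7, 5, 6, 3, 5]

p = permutation_p_value(pill_drops, placebo_drops)
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

Notice what the code never asks: whether a drop of 9 on the questionnaire means the same thing for two different people, or whether the questionnaire measures “depression” at all. It takes the numbers as given, which is exactly the pretending described above.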