I might have to edit this or add another post once I am able to describe the specific study that inspired this post. However, were the problem mainly about that study, I wouldn’t be posting anything. In other words, the generality (or generalizability) is what counts here.
No, this isn’t the “real” version of The Scientific Method.
I’ve said before that there is no scientific method, at least not in the sense of a set of steps whereby researchers form a hypothesis, design an experiment to test it, and, if the experiments support the hypothesis rather than falsify it, watch it transform into a theory (or a pumpkin; I’m always confusing Disney fairy tales with the fantasy of The Scientific Method that teachers with no science background describe to children in science class, and which those children will continue to think characterizes scientific research unless they end up doing it).
The “real scientific method” isn’t a “real” version of The Scientific Method. It’s a common practice across the sciences that is surprisingly similar to scientific fraud yet is somehow at best frowned upon and at worst accepted as part of good research practice. Unlike the “science experiments” that pre-college students perform in science class, actual studies take a long time. In the social, psychological/behavioral, cognitive, and related sciences, researchers frequently conduct a number of pre-trials, run survey studies designed to help determine experimental design, and use other preparatory methods that are in some sense a kind of experiment. Once they’ve worked out all the kinks of the main experiment or experiments they intend to use for their study, they obtain participants and run the experiment or experiments. However, often enough they don’t get the results that they wanted. For example, perhaps they hypothesize that belief in fatalism (we have no control over any aspect of our lives and everything we do is “fated”) is negatively correlated with the belief that criminals should be punished and positively correlated with the belief that the correctional system should be about rehabilitation. So the researchers do all their preparatory work, run their experiments, and don’t get the results they wanted.
A hypothetical example (that isn’t as hypothetical as you may think)
Naturally, you might suspect that they would conclude that they were wrong. Often enough, this isn’t what happens. Instead, they change the experiment in different ways. To give a hypothetical example, let’s imagine that the researchers tested their hypothesis about fatalism by recruiting 100 undergraduates to be participants and having them answer a questionnaire that the researchers had developed (using pre-trials, cognitive interviews, etc.) and that is supposed to measure the degree to which someone is a fatalist. Then they have participants read about a (fictional) man named “David” who was found guilty of murdering his wife, teenage son, 9-year-old daughter, and their newborn infant. The participants are not told that this is a fictional case (we can even imagine that, to make it more “real”, the researchers included pictures of the now dead wife/mother, the infant, a courtroom that is supposed to be where the trial took place, etc.). Finally, participants are asked to rate on a scale of 1-7, with 1 being “strongly disagree” and 7 being “strongly agree”, the extent to which they believe the following statements are true: David is responsible for his behavior; he should be punished; he has a chemical imbalance in his brain; and finally, he had a choice in the matter. The researchers take all these data and correlate them with scores on the questionnaire, imagining that the more fatalistic a participant is, the more likely they are to say that David is not responsible, that he shouldn’t be punished, that his actions are attributable to his neurobiology, etc.
Instead, it turns out that everybody tends to respond that David is responsible and should be punished, and the other statements are simply unrelated to (i.e., very poorly correlated with) the score on the “fatalism” questionnaire. What do the researchers do? They change the scenario. They get more participants and use the same questionnaire, but this time they have each participant read a short account of a man named “David” who cheated on last year’s taxes to get a bigger refund even though he wasn’t entitled to it by law (usually, these sorts of experiments are broken down such that, e.g., for some participants the person is male and for others female, and similar methods are used to control for possible factors that may bias responses). Also, instead of pictures of victims, the account is loaded with words that are suggestively “causal.” For example, perhaps they include in the account that “David cheated on his taxes because the year before, he had accidentally filled out his taxes incorrectly and ended up getting a much bigger tax refund. This year, he made the same errors so that the IRS would give him a bigger refund again”. They then use the same 1-7 scale, but ask about agreement with the following statements: David’s actions were immoral; David’s actions were determined by basic human desires; David’s behavior was a result of a large number of complex factors out of his control; and David should be put in prison as a punishment for his crimes.
This time, the researchers DO find predicted relationships between scores on the “fatalism” questionnaire and many of the questions. In particular, participants whose scores reflected fatalistic beliefs tended to respond that David’s actions were determined by basic human desires, his behavior was a result of a large number of complex factors out of his control, and that his actions were less immoral compared to “low fatalistic” participants’ ratings of the “David’s actions were immoral” statement.
“Those FRAUDS!” or “Wow those are good scientists!”?
Now that they have the correlations they hypothesized, they publish their research. The justification is that for some reason the first experiment was problematic, while the second one was not. What is their basis for such a conclusion? The fact that one experiment supported their hypothesis, while the other didn’t. Alternatively, they could come up with reasons why the first experiment failed (e.g., the horrific nature of the action evoked emotional responses that did not reflect the participants’ “real” views of the relationship between fatalistic views and the extent to which people should be held responsible for their actions, including being punished). However, if they performed the second experiment first and it failed to get the “right” results, the researchers could just as easily explain why THAT experiment failed.
Why hypothetical? Because nobody likes a tattle-tale, just the whistle-blowers
I’ve given a hypothetical example because usually the only way one becomes aware of specific instances of this happening is through direct connections either to the study or to one or more of the researchers. In other words, it’s not something you can get from reading the study as published. I learned of the study that prompted this post through a course in moral psychology that my sister is taking as part of her master’s in nursing to become a nurse practitioner. The professor not only had the students read various studies on different issues, but in at least two cases the students were able to talk directly with the lead author of one of the studies they read. In the second of these two instances, the head researcher was asked why she didn’t use a method much more obviously connected to the variable of interest, and her unabashed response was that they had, but this had not given them the right results.
In the first graduate seminar I attended, the professor, a researcher with some ~40 years of experience and director of a lab at one of the most prestigious institutes in the world, stated essentially that if a particular theory he believed to be wrong were finally supported via experiments that he couldn’t find any flaws with, then clearly the methods used in his field were problematic (why? Because “if you have really good reason to believe you should find” x, then “if you don’t find it”, your methods are flawed).
So, too often the “real” scientific method is to keep changing the experiment until you get the results you want (and thankfully, the shocking lack of mathematical literacy and statistical expertise allows one to run various statistical tests until one of them outputs “results are statistically significant”, and these are the only results reported).
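To put a rough number on why this matters, here is a small simulation sketch of the hypothetical “David” study. Everything in it is an assumption made for illustration: the function names are made up, the “fatalism” scores and 1-7 ratings are pure random noise with no relationship to one another, and significance is judged with a Fisher z normal approximation rather than whatever test a real study would use. It simply counts how often at least one “significant” correlation shows up when the experiment is redesigned and rerun several times.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (an approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_null_experiment(rng, n_participants=100, n_statements=4):
    """One hypothetical experiment in which fatalism is unrelated to every rating."""
    fatalism = [rng.gauss(0, 1) for _ in range(n_participants)]
    p_values = []
    for _ in range(n_statements):
        # Ratings on the 1-7 agreement scale, generated independently of fatalism.
        ratings = [rng.randint(1, 7) for _ in range(n_participants)]
        r = pearson_r(fatalism, ratings)
        p_values.append(approx_p_value(r, n_participants))
    return p_values


def chance_of_publishable_result(n_redesigns, n_simulations=1000, alpha=0.05, seed=0):
    """Fraction of simulated research programs that find at least one p < alpha
    somewhere across n_redesigns versions of the experiment."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_simulations):
        if any(min(run_null_experiment(rng)) < alpha for _ in range(n_redesigns)):
            hits += 1
    return hits / n_simulations


if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        print(k, "version(s) of the experiment:", chance_of_publishable_result(k))
```

If the tests were fully independent, the chance of at least one false positive among k tests at the 0.05 level would be 1 - 0.95^k: roughly 19% for the four statements of a single experiment, and roughly 46% once the experiment has been redesigned twice (twelve tests in all). The simulated fractions should land in that neighborhood, even though by construction there is nothing to find.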