
Inside the Dispute Over a High-Profile Psychedelic Study

A magic mushroom in a test tube.

On April 11, Nature Medicine published a paper about what happened in the brains of people with depression who were treated with either an antidepressant called escitalopram or psilocybin, the active ingredient in magic mushrooms. The findings, the paper’s authors wrote, “suggest an antidepressant mechanism for psilocybin therapy,” or a potential way that psilocybin might reduce the symptoms of depression. 

This is not the first study to offer promising evidence that psilocybin might be an effective treatment for depression. Exactly how it does so is still elusive, though there are theories. The authors of this paper—several of whom are well-known names in psychedelic research, like UCSF’s Robin Carhart-Harris, formerly of Imperial College London, and neuropsychopharmacologist David Nutt—offered a suggestion, writing that psilocybin might lead to a decrease in “brain network modularity,” or an increase in different parts of the brain functionally connecting with each other.


Given the prominence of the journal, and the caliber of the authors, the study immediately garnered attention in the field, as well as media coverage. The BBC wrote that “Psilocybin, a drug found in magic mushrooms, appears to free up the brains of people with severe depression in a way that other antidepressants do not.” Articles reported that psilocybin “may help rewire the brain,” language that emulated the study’s press release.

Paraphrasing Nutt, the BBC wrote, “the brain can get stuck in a rut and locked into a particular negative way of thinking,” and that after taking psilocybin, “people’s brains opened up and became ‘more flexible and fluid.’” Nutt tweeted, “Really pleased with this proof that psychedelics work differently from SSRIs.” 

On social media, though, other scientists began responding to the study and asking questions: about the switching of depression-measurement scales, how statistically sound it was to compare the antidepressant and psilocybin groups, and whether the data provided enough heft for anyone to really say that there was “proof” as to how psilocybin works differently compared to an SSRI. 

Three scientists—Johns Hopkins’ Manoj Doss and Fred Barrett, and Yale’s Phil Corlett—were concerned enough about the study that they submitted a letter to the editor to Nature Medicine. It was rejected. When reached for comment, Nature Medicine responded that it couldn’t discuss the specifics of the decision for confidentiality reasons, but that the journal thoroughly reviews all concerns. 

Doss, Barrett, and Corlett then uploaded their comments—which from here on out will be referred to as “the Doss comment”—to PsyArXiv, a preprint server for papers that haven’t been peer reviewed, outlining what they wrote were “inconsistencies, statistical flaws, alternative interpretations, [and] conspicuous omissions.” 

Last week, Carhart-Harris, Nutt, and Richard Daws, first author of the Nature Medicine paper, posted a reply to the Doss comment. It emphatically rebutted each point; questioned the motives of Doss, Barrett, and Corlett; and referenced Carhart-Harris and Nutt’s long publishing history in the field. “Pointing an accusatory finger at scientists who have done much to advance the scientific credibility of psychedelic research, is unfair, to say the least,” they wrote. 

This led to yet another surge of responses from those watching this exchange between some of psychedelics’ most visible researchers play out in public. Eiko Fried, an associate professor in clinical psychology at Leiden University, tweeted, “Although the authors admit to several issues in the rebuttal (i.e., sufficient reasons to invite a critical commentary), they question the motivation for the commentary & propose it is some sort of personal revenge, rather than scientific motivation. This is unprrofessional [sic].” 

Stuart Ritchie, a lecturer at King’s College London, tweeted, “Hilarious response by psychedelic researchers to scientific criticism of their work. ‘Don’t you know who I AM?!’” Doss tweeted an expanding brain meme that quoted directly from Carhart-Harris, Nutt, and Daws’ reply. 

Others more quietly reevaluated things: When Motherboard reached out to one of the paper’s reviewers, he shared that when he originally reviewed the paper he had not noticed the “one-tailed test,” a statistical hypothesis test used in the paper that the Doss comment took issue with. “I did not pick up on the fact that there were one-tailed tests in there in reviewing the paper, or I would have asked them not to do that,” said Jared van Snellenberg, an assistant professor of psychiatry at the Renaissance School of Medicine at Stony Brook University. 

This whole situation is in step with a recent shift in mood in the so-called psychedelic renaissance: After decades of stigma and prohibition, there had been, as of late, incredible enthusiasm. When landmark studies came out on the use of psychedelics for end-of-life anxiety, treatment-resistant depression, smoking, and more, they understandably garnered immense excitement and attention from the media and public, as well as companies and investors. While these drugs are still federally scheduled as substances with “no medical value,” their reputations have improved greatly, and there is a broad consensus that they are worthy of research for a number of indications. 

As psychedelics have gained more respectability, though, the pendulum has been swinging the other way again: not back toward stigma or calls for prohibition, but away from uncritical hype and toward a desire for a more circumspect approach to the research, its processes, and the claims that can be made about it. Journalists from Psymposia, New York, STAT, the CBC, and more have called attention to MAPS’ MDMA clinical trial designs and safety concerns, and how corporatization will affect the rollout of psychedelic medicine. Researchers, meanwhile, are publishing on some important limitations of psychedelic study design.

To be clear, this is a good thing. No longer automatically celebrating any published research about psychedelics, but instead scrutinizing that research to ensure it meets the stringent standards applied to more quotidian drug research, could be interpreted as a mark of a maturing field. This is also a pivotal moment to reconsider how psychedelics are talked about and communicated to the public, and whether it’s a good thing for media articles, press releases, and informal interviews to describe psychedelics as “liberating the mind” or “rewiring the brain”—or if that kind of language does more harm than good. 

There are urgent reasons to be cautious about declaring that definitive biological mechanisms have already been found, especially in comparison to other medications. This issue plagued, and continues to haunt, antidepressants and other psychiatric medications. They were once lauded as definitive cures for complex conditions like depression, and communicated about through over-simplified biological mechanisms, like the chemical imbalance myth. Too much focus on tentative positive findings can obscure safety concerns and negative side effects. And in the growing for-profit psychedelic market, we’ve already seen results from small studies being used to make claims in advertising for psychedelic products and services, as Russell Hausfeld from Psymposia has reported.

This Nature Medicine paper was released into this changing ecosystem, and the back-and-forth observed over the past couple of weeks was the result. But what might have evolved into an instructive discussion of how to interpret complex brain data was derailed by discussion about the original authors’ tone in their reply. 

The Nature Medicine paper presented intriguing results and suggested avenues for future research; scientists should be re-analyzing data from previous studies and testing their hypotheses about brain mechanisms, and we should look forward to more of this research in the future. But the independent statisticians Motherboard consulted said there were valid reasons to be cautious, and not to over-interpret the data or make claims from this study alone: The lack of a big enough difference over time in brain modularity between the groups makes it hard to assert that psilocybin has unique effects compared to the antidepressant, and our understanding of depression symptoms depends on the scales and tools we use to measure them, they said. 

Comparing between groups and medications, evaluating the efficacy of depression scales, and the other concerns brought up about this paper are all aspects that almost certainly will come up in future psychedelic research. If these details are hashed out now, and others are made aware of them, the open deliberation of a paper like this one could be used to make research even stronger—if those in the psychedelic community are willing to communicate and grow together. 

One of the quirks of the Nature Medicine paper is that while it contains new brain-imaging analysis, the people in the study had already been the subjects of previous published work: an open-label trial from 2017, and a double-blind randomized controlled trial (DB-RCT) from 2021 that was published in the New England Journal of Medicine. The Nature Medicine paper includes brain imaging from 16 people who took psilocybin in the open-label trial, as well as 22 people who did psilocybin-assisted therapy, and 21 in an antidepressant group from the DB-RCT. (This group took a low dose of psilocybin too—1 mg, compared to 25 mg—but it was presumed to be inactive.) 

The new study looked specifically for something called brain modularity. Brain activity tends to organize into distinct networks, or patterns of connection; modularity measures how strongly activity stays segregated within those networks. The paper found that brain modularity decreased in people who did psilocybin-assisted therapy in both studies, meaning that their brains showed increased connectivity between regions that typically don’t connect. This decrease in modularity was also shown to be correlated with lower scores on the Beck Depression Inventory (BDI), a scale that measures depression.
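To make the term a little more concrete, here is a minimal sketch of how modularity is computed for a toy network, using the Python library networkx. The graph, the community-detection step, and the numbers it prints are purely illustrative assumptions; real fMRI pipelines compute an analogous quantity over functional-connectivity networks, not over a hand-built graph like this one.

```python
# A toy illustration of network modularity (invented graph, not fMRI data).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Two tight clusters of nodes joined by a single edge: a highly "modular" network.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (0, 2),   # cluster A
                  (3, 4), (4, 5), (3, 5),   # cluster B
                  (2, 3)])                  # the lone bridge between them

communities = greedy_modularity_communities(G)
print("modular network:", round(modularity(G, communities), 3))

# Add several cross-cluster edges and modularity drops: the clusters are now
# "talking" to each other, loosely analogous to the decreased modularity the
# paper reports after psilocybin therapy.
G.add_edges_from([(0, 4), (1, 5), (2, 5), (0, 3), (1, 3)])
communities = greedy_modularity_communities(G)
print("less modular network:", round(modularity(G, communities), 3))
```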

Intriguingly, when the authors of the Nature paper looked at the group of people who took the antidepressant, escitalopram, they didn’t find a significant decrease in modularity, and there was no correlation between modularity and BDI scores.

Because of this, the authors suggested in the paper that psilocybin might have antidepressant effects because of this “liberating effect,” freeing up the brain from its usual modularity. “This ‘liberating’ action of psilocybin is paralleled by subjective reports of ‘emotional release’ as well as subacute increases in behavioral optimism, cognitive flexibility and psychological flexibility after taking a psychedelic drug,” they wrote. 

So what’s the problem? The Doss comment proposed several issues with how the study was conducted, and the conclusions that were drawn from it. In response to a request for comment, a spokesperson for Nature Medicine said:

“Nature Medicine welcomes scientific discussion and debate regarding papers published in the journal and recognizes the importance of post-publication commentary as necessary to advancing scientific discourse. For confidentiality reasons, we are unable to discuss the specifics of the editorial history or review process of submissions that have been made to the journal. However, we would like to make clear that whenever concerns are raised about any paper we have published, we look into them carefully, following an established process, consulting the authors and, where appropriate, seeking advice from peer reviewers and other external experts. Once such processes have concluded and we have the necessary information to make an informed decision, we will follow up with the response that is most appropriate (where necessary) and that provides clarity for our readers as to the outcome.”

You can read the full Doss comment here, but there are a few main points worth summarizing because these are matters that will arise in future research on psychedelics, and it’s helpful for the public to fully understand the complaints. This will focus on the parts of the Nature paper that looked at the people in the trial that compared the antidepressant to psilocybin, since making claims about psilocybin’s mechanism of action compared to existing treatments is what the authors of the Doss comment found to be most troubling. 

In Carhart-Harris, Nutt, and Daws’ reply, from here on called the “Carhart-Harris reply,” the authors respond to each of the below points, and more. Carhart-Harris also responded to questions from Motherboard before posting their reply online, which we are linking to here in full for transparency. It overlaps with, but is not identical to, the Carhart-Harris reply published on PsyArXiv. 

An interaction between groups

Of all the points raised in the Doss comment, the lack of a significant interaction between groups was the one that Kevin McConway, emeritus professor of applied statistics at The Open University, and Nicole Lazar, a professor of statistics at Penn State—two outside statisticians I consulted with—said was of the most concern. 

This is also one of the trickier points raised, since the word “interaction” means something different in statistics jargon than in everyday language. “People get confused about this because it’s called ‘interaction’ and it’s one of these confusing words in statistics where it sounds as if it means one thing and it doesn’t,” McConway said.  

This plunges us straight into the heart of how hard it is to do and interpret science: balancing multiple variables, understanding how they affect one another, and working out what can be explained by chance and what reflects an actual interaction. An interaction, basically, is when you have an outcome that depends on something else, McConway said. To show a significant interaction when comparing two groups, the difference between the two groups’ changes has to be big enough that it can’t plausibly be explained by chance. 

This quickly can become statistics word salad, so here’s an analogy (that McConway vetted for accuracy): Let’s say you want to test how well people obey traffic laws, based on whether they do classroom-only driving school or take road-driving lessons. To say that one of these interventions really has a significant effect on how often people run red lights compared to the other, you’d have to show there’s a difference between the effects of the two interventions on this outcome over time that’s big enough so that it can’t plausibly be explained just by chance variation.

Start with a group of people with different levels of driving skills, and put some in classroom-only driving school and others in road-driving lessons. Over six weeks, the classroom-only driving school group goes from an average of running 10 red lights to three. The road-driving group goes from running nine red lights to four. The classroom-only driving school group may have had a statistically significant decrease in running red lights (from 10 to three) while the road-driving group did not (from nine to four), but the difference in change between the two groups isn’t big enough to be considered significant. The first group’s red-light running decreased by seven and the second’s by five, meaning there are just two red lights of difference between the groups’ improvements over the six weeks.  

At the end of this study, you can’t say, based on the data, that the syllabus of the classroom-only driving school is making people obey traffic laws more than road lessons, because there wasn’t a significant interaction effect of one variable (types of school) with another variable (time) on the outcome: running red lights.
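For readers who like to see the arithmetic, here is a minimal sketch, in Python, of that same “difference in differences” logic, using invented per-person red-light counts loosely modeled on the analogy above. It is a toy illustration of the statistical point, not a reconstruction of how the paper’s analysis was run.

```python
# The "difference in differences" logic behind an interaction test.
# All counts are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated red-light counts per person, before and after six weeks of lessons.
classroom_before = rng.poisson(10, size=30)  # classroom-only group, averages ~10
classroom_after = rng.poisson(3, size=30)    # ...and ~3 after six weeks
road_before = rng.poisson(9, size=30)        # road-lessons group, averages ~9
road_after = rng.poisson(4, size=30)         # ...and ~4 after six weeks

# Within-group tests: did each group improve on its own?
# (With these made-up counts both probably do; that is beside the point.)
print(stats.ttest_rel(classroom_before, classroom_after))
print(stats.ttest_rel(road_before, road_after))

# The interaction question: is the CHANGE in one group different enough from
# the CHANGE in the other? With two time points, comparing the change scores
# is equivalent to testing a group-by-time interaction.
classroom_change = classroom_before - classroom_after
road_change = road_before - road_after
print(stats.ttest_ind(classroom_change, road_change))
# Only this last test can support a claim that one intervention worked
# differently from the other.
```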

The authors of the Doss comment wrote, and McConway agreed, that in the Nature Medicine paper, there wasn’t enough of a difference in the change in modularity between the psilocybin group and the antidepressant group over time to say that the psilocybin had a unique effect on brain functioning, let alone how that mechanism might affect depression.

The authors of the paper were upfront about a lack of interaction in the text. They wrote, “There was no significant interaction between treatment arm and scanning session on network modularity,” but they further wrote that “there was evidence that the reduction in network modularity and its relationship to depression severity may be specific to the psilocybin arm,” based on the correlation analysis between modularity and depression scores that they ran later. 

Corlett and Doss told Motherboard this was wrong. To say that psilocybin has an effect on brain modularity different from escitalopram’s, there would need to be a significant interaction showing that the change in modularity before and after treatment differed enough between the two drugs.

“If you don’t see a significant difference between the groups, you can’t say that the groups are significantly different,” Corlett said. “I realize that sounds simplistic, but if the change in modularity, the change in your brain measure, isn’t differently different across two time points, then you can’t claim that the mechanism is specific to that psilocybin.”

On its own, the psilocybin group did show a significant decrease in modularity from their first scans to the second, and on its own the escitalopram group did not show a significant difference in modularity from the first scan to the second. But the difference in modularity change between the two groups over time was not significant. 

Jacob Aday, an experimental psychologist at University of California San Francisco who has published on psychedelic study design, said that he too found the interaction criticism to be the most notable of all of the ones raised in the Doss comment. “There is a lot of compelling evidence presented in the paper suggesting that modularity is related to the antidepressant effects of psilocybin,” he said. “But we just can’t technically conclude that it is specific to psilocybin because of the interaction.” 

In the Carhart-Harris reply, and in the response to Motherboard, the authors said that the lack of interaction effect the Doss comment referred to was not originally tested for in their study, since they were focusing on testing their hypothesis on how modularity changes after psilocybin, not necessarily the interaction between the groups. 

The test for a significant interaction was asked for by one of the paper’s reviewers. “Had we found significance on the interaction, this would have enabled us to draw stronger inferences on the findings than we did in the paper,” they wrote. “The interaction was directionally consistent with our true prior hypothesis on modularity change post psilocybin. We held no such hypothesis regarding modularity change post escitalopram.”

They wrote that Corlett’s comments were correct that no interaction means the claims about psilocybin versus SSRIs should be tempered. “However, we saw no change in modularity with escitalopram nor a relationship with symptom change in that arm,” they wrote. “Based on this and broader knowledge of the action of SSRIs versus psilocybin therapy, we believe it is reasonable to suggest a different therapeutic action with psilocybin than SSRIs, something we have written on extensively in the past, backed by a wealth of supportive evidence.” 

They added that in the paper, they don’t claim that there is a difference, but only “suggest it may exist.” 

A one-tailed test 

When the new paper looked for a correlation between brain modularity and depression scores, it used a “one-tailed test.” As opposed to a two-tailed test, a one-tailed test is when a researcher only tests one hypothesized outcome, not its opposite. For example, if I were testing the correlation between drinking alcohol and having a hangover with a one-tailed test, I would only look for a positive relationship (more alcohol, worse hangover) and forgo the opposite (more alcohol, milder hangover). 

In this case, the study only tested whether there was a positive relationship (greater decreases in modularity going along with greater decreases in depression), but didn’t test for the opposite: a decrease in modularity accompanied by an increase in depression. One-tailed tests are usually only done if you have strong prior evidence that a relationship will go in a particular direction, and even then they are extremely rare, said Lazar. 

“You really need to have it specified ahead of time and you really need to be sure that the effect in the other direction wouldn’t be interesting to you,” she said. “In all of my career, there’s really only one situation where I’ve come across that it made sense to only be thinking about a one-tailed test.” 

One-tailed tests lower the statistical threshold for a conclusion: because only one direction is tested, the p value (the probability of seeing a result at least this extreme by chance, typically required to be less than .05) is effectively halved compared to a two-tailed test. In the new study, the one-tailed test for the correlation was significant, with a p value of .025. Had the authors run a two-tailed test, the p value would have doubled to .05, right at the threshold of significance. 
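As a rough illustration of that halving, here is a minimal sketch in Python using entirely made-up numbers; the variable names echo the study’s measures, but nothing here is drawn from it.

```python
# Why a one-tailed test halves the p value when the effect lands in the
# hypothesized direction. The data are random numbers, not the study's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20
modularity_change = rng.normal(size=n)
# A fake depression-score change, loosely tied to the fake modularity change.
depression_change = 0.45 * modularity_change + rng.normal(size=n)

r, p_two_tailed = stats.pearsonr(modularity_change, depression_change)

# For a positive correlation, the one-tailed p value ("is r greater than 0?")
# is half the two-tailed value; had r come out negative, it would instead be
# 1 minus half the two-tailed value, i.e. nowhere near significance.
p_one_tailed = p_two_tailed / 2 if r > 0 else 1 - p_two_tailed / 2

print(f"r = {r:.2f}, two-tailed p = {p_two_tailed:.3f}, one-tailed p = {p_one_tailed:.3f}")
# A correlation that clears .05 only in its one-tailed version is exactly the
# situation the critics flagged.
```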

“Is that why they used a one-tailed test?” McConway said. “The trouble is, I suspect it isn’t. But it makes it look suspicious and you have to be clear and clean on these things. If they are going to do good research on this, they need to be more careful with this. That’s the way I’d put it.”

Corlett said that in his opinion, looking at the data, seeing what direction it’s going in, and then using a one-tailed test is an example of “HARKing,” or “hypothesizing after the results are known,” when you make a hypothesis after seeing some results. “As we outline in the letter, the data already published would not warrant the one-tailed test, they are ambiguous at best,” he said. 

Doss said that a 2021 paper he co-authored, which wasn’t cited in the Nature Medicine paper, might have given clues that a two-tailed test was needed. It similarly examined the effects of psilocybin on cognitive and neural flexibility, and found some evidence of a more complicated scenario: More neural flexibility could sometimes mean smaller gains in cognitive flexibility.

“What if psilocybin therapy pushed some participants’ modularity over the edge of greatest therapeutic efficacy?” Doss said. “That is, their brains became such a mess that they actually got less improvement in their depression…A one-tailed test doesn’t examine that possibility.” 

It could be possible for the brain to become too dynamic, Doss said. “You might see this inverse relationship where too much dynamics in the brain might result in less improvement on this task of cognitive flexibility.” It’s another reason why Doss thinks a one-tailed test was unwarranted; he said we don’t know enough yet about what increased connectivity of the brain means to only look at one side of a correlation. 

In the Carhart-Harris reply, the authors wrote that “Doss et al. fail [to] appreciate the logical flow of our analyses and the compelling evidence-based logic justifying our decision” to do the one-tailed test. They did this “as the direction is so strongly implied,” they wrote, based on the observation that decreases in brain modularity were positively correlated with decreased depression symptoms in the open-label trial. 

In regards to not citing Doss’s work, Carhart-Harris, Nutt, and Daws wrote to Motherboard that, “We regret not citing Doss’s paper as the work is relevant. However, this oversight was certainly not intentional. We simply had not read Doss’s paper and it was not flagged for our attention by our co-authors or reviewers of our Nature Medicine paper. In hindsight, we wish we had seen it, and we apologize to Doss and his fellow co-authors for the oversight.”

The depression scales used 

In clinical trials, researchers decide which scales they’re going to use to measure outcomes before they start a study. They do this so there’s no bias in choosing a scale that makes the data look the best; the decision is made public ahead of time. For the randomized trial that compared psilocybin to an antidepressant, the researchers said they would use the Quick Inventory of Depressive Symptoms (QIDS) to measure people’s depression levels. 

By the end of the study, according to this scale, there was not a significant difference in QIDS score between the psilocybin and antidepressant groups. Much has been written about this (read some expert commentary here), but to sum up—it didn’t mean that psilocybin didn’t help at all. It meant that, based on this scale, psilocybin didn’t meet the statistical threshold for being better than the antidepressant. On other scales, like the Beck Depression Inventory, or BDI, psilocybin was significantly better than the SSRI.

In the new Nature Medicine paper, the authors chose to use the BDI when evaluating the same people from the NEJM study, even though in the original study, the QIDS was the scale they used for their primary measurement. The authors wrote in the paper that the BDI was the “secondary outcome measure” for the initial study, and that they switched because “this measure proved to be an especially sensitive index of post-psilocybin reductions in symptom severity across the trials.” This move raised questions, both in the Doss comment and elsewhere.

The Carhart-Harris reply pushed back on those taking issue with their switch in depression scales. The authors wrote that while the QIDS was the pre-registered scale, they had used the BDI in the NEJM study too, and chose it for this new paper because it was more sensitive to psilocybin’s antidepressant effects. “Can the use of the BDI as the main depression outcomes measure used in the Nature Medicine paper to correlate with the core brain modularity data be justified? Yes, it can,” they wrote, emphasis in original. 

They added to Motherboard that they are now looking into why the QIDS didn’t show statistical significance when psilocybin was compared to the SSRI, while other scales did. “The results are quite interesting and will be written up for publication,” they wrote. “In brief, more than other rating scales, the QIDS-SR-16 strays beyond a core dimension of depression that relates to depressed mood, and it also uses an atypical approach to scoring appetite and weight dimensions, that may affect its reliability.” 

Regression to the mean, and measuring people who weren’t doing anything 

The Doss comment also posed a question: Are there other explanations for the decrease in modularity, like regression to the mean? (This is a statistical phenomenon in which unusually high or low measurements tend to come out closer to the average when they are measured again.) 

If we look at the graphs that show decreases in modularity from the paper, we can see that the mean modularity in the antidepressant group started out a bit lower than the psilocybin group. The fact that the psilocybin group showed more of a decrease might be because they were more extreme to begin with, for a number of reasons, including chance. 
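To see how regression to the mean can produce a bigger drop in the more extreme group all on its own, here is a minimal simulation in Python with no treatment effect at all. The numbers are invented and are not meant to model the actual trial; they only show the statistical mechanism.

```python
# Regression to the mean: two arms, no treatment effect, measured twice with
# noise. The arm that happens to start out more extreme shows the bigger
# "improvement" purely by chance. All numbers are invented.
import numpy as np

rng = np.random.default_rng(2)
n = 30
true_level = rng.normal(0.35, 0.05, size=2 * n)  # everyone's stable underlying value

# Two noisy scans of the same underlying values, with nothing happening in between.
scan1 = true_level + rng.normal(0, 0.05, size=2 * n)
scan2 = true_level + rng.normal(0, 0.05, size=2 * n)

# Split people into two "arms" by their first scan, so one arm starts higher
# purely by chance -- a stand-in for the baseline imbalance between groups.
order = np.argsort(scan1)
arms = {"starts lower": order[:n], "starts higher": order[n:]}

for name, idx in arms.items():
    change = scan2[idx].mean() - scan1[idx].mean()
    print(f"{name}: scan1 = {scan1[idx].mean():.3f}, "
          f"scan2 = {scan2[idx].mean():.3f}, change = {change:+.3f}")
# The "starts higher" arm drifts down and the "starts lower" arm drifts up,
# even though nothing was done to anyone between the two scans.
```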

Because there was no placebo group, another group of people that got no intervention at all, it’s hard to understand what exactly is driving or not driving this change. The Carhart-Harris reply stated, “We cannot discount [regression to the mean] playing a role here but neither do we see any strong justification to suspect that it caused the main effect seen with psilocybin.” 

Doss also told Motherboard that in the studies, people weren’t asked to do anything while their brains were being scanned—this is often called “resting state” fMRI. It can be tricky to interpret so-called “resting state” scanning data, Doss said. 

Modularity is a very non-specific measurement, and increases or decreases could be a reflection of many different things, Doss explained. For instance, before doing psilocybin therapy, a person may have had a more calm or sleepy mind, whereas after they could have had racing thoughts. 

“If you really wanted to test being stuck, you would give someone a task where they have to make decisions and they get stuck on their decisions,” Corlett said. “They do resting state imaging. And then they make these weird psychological inferences about the results that actually bear no relationship to the psychological processes that they’re invoking at all.”

In the Carhart-Harris reply, the authors responded, “It has become fashionable to critique analysis of spontaneous, task-free brain activity,” and that they don’t find this critique compelling. Further, they wrote that there can be other issues around drawing inferences from specific tasks, like the one Corlett mentioned. 

Motherboard reached out to two of the Nature Medicine paper’s reviewers, Jared van Snellenberg and David Hellerstein, a professor of clinical psychiatry at Columbia University. Hellerstein did not comment on specifics of the paper and its review process, but van Snellenberg shared some of his thoughts.

“From a scientific perspective, I liked the paper quite a lot,” van Snellenberg said. “I thought it was methodologically sound, and to my mind, what this paper shows is a potential neural mechanism for the effects of psilocybin on depression or even just acutely on the brain. As always with these things, I would need to see this probably replicated twice by different labs to really be like, ‘Yes, this is happening.’”

Van Snellenberg agreed that there wasn’t anything in the data that definitively showed the mechanism by which psilocybin has antidepressant effects. “There’s no direct link there,” he said. “There’s an indirect link.”

“This is the kind of thing that you have to let be part of the scientific literature,” he said. “It’s an important finding, but it’s important because it’s something that the next person who does an fMRI study with psilocybin and for depression should be looking at to see if it’s still there.”

In his initial review of the paper, van Snellenberg said his main comment was that the authors needed to be clearer in their explanation of the rationale for switching the depression measurement used. But ultimately he said he was satisfied by the reason given: that the BDI was more responsive at capturing the treatment effects compared to the QIDS.

“For the purposes of the Nature Medicine paper, that’s a perfectly valid reason to switch and use it,” van Snellenberg said. It wouldn’t be okay for a randomized clinical trial, he added. But for a piece of “discovery science,” he thinks it’s okay. 

When asked about the use of the one-tailed test, however, van Snellenberg initially told Motherboard that there weren’t any one-tailed tests used in the paper. After going over the paper together, he saw the one-tailed tests used to test the correlation between brain modularity and depression outcomes. 

He said he agreed with the Doss comment that a two-tailed test would have been better. 

“They should probably not have done that,” van Snellenberg said. “I wish I’d picked up on that in the review. It should have been done two-tailed, and you just let the data be what they are.”

I asked McConway about his thoughts on other elements of the Carhart-Harris reply, like only doing the interaction test because a reviewer asked for it. “The trouble is, it should have been in their original analysis plan because it’s an essential part of the analysis they want to do,” McConway said. Because the paper does make suggestions about a potential mechanism of one compound compared to another, it’s not possible to just look at them separately. 

“You can’t conclude that if a finding in one group of patients is significant and the other one isn’t, that those groups differ,” McConway said. “It’s just a standard thing that many people get wrong. It is wrong, there’s no other way about it. The critics are right. I’m leaning over backwards so as not to be inappropriately rude to them, but the original authors didn’t understand what they were doing in relation to this fairly basic statistical thing.” 

Where does this leave us, if not with a statistics-induced headache? Hopefully with some lessons about study design, science communication, and, crucially, how to talk about complex findings that can be both intriguing and still preliminary. 

In their statement to Motherboard, Carhart-Harris, Nutt, and Daws wrote that they agreed the phrasing of Nutt’s “proof” tweet was “over-stating the result within the limitations of a tweet.” 

“David made this tweet based on the current finding supporting considerable other evidence of different brain actions of SSRIs compared with psychedelics,” they wrote. “We agree the specific paper referenced did not find ‘proof’ that psilocybin therapy decreases brain modularity to a significantly greater extent than 6 weeks of escitalopram.” 

They also said they feel a cultural change occurring, where the public and other experts are beginning to question aspects of psychedelic medicine. “We welcome this correction to a prior trajectory of excessive hype,” they said. They added later, “Understanding these risks should help us mitigate them, so, again, I welcome the cultural correction, but hope it does not overshoot in the direction of excessive disillusionment.” 

Corlett said he supports psychedelic research, but he just doesn’t want findings to be overstated in the field’s early stages. He wants the work to be done rigorously, and to be open to criticism. Doss added that scientists play an important role in this when they communicate their findings and talk about their work.

Some psychedelic scientists have found themselves in the strange position of doing research, and simultaneously advocating for the legitimacy of the field—that it is worth federal funding or demands policy changes that improve access, for instance. Carhart-Harris, Nutt, and Daws told Motherboard, “We have a responsibility to venture out of our metaphorical labs and into the public domain, to communicate with the public at large so that they can understand what we are doing and finding. Personally, when I do this (RCH speaking), I try to present a balanced picture that is as close as possible to the science and doesn’t gloss over important nuance.” 

When asked about how outsiders should feel watching scientists so vehemently disagree, they added, “Squabbling online is never a good look—especially over a topic as intrinsically uncertain as statistics. P values are just likelihood estimates, not exact facts. A P value of <0.05 doesn’t make a finding true nor one of p<0.055 make a finding false. All these values do is to predict the likelihood of the findings being by chance.” 

But what ended up happening is exactly that: A squabble that took place on Twitter and on the preprint server that wasn’t particularly edifying for a lay observer. These commentaries and replies should be visible to the public, but ideally in a manner where there can be good-faith deliberations around study design, and efforts made to help the public understand what’s being discussed. Drama and squabbling are entertaining, but don’t amount to much in the end.

Ultimately, over the past week, it wasn’t an explanation of what an interaction effect is but the tone of the Carhart-Harris reply that prompted many reactions. Corlett said that “usually in these sorts of exchanges when the criticized respond, I feel assuaged and we can maybe come closer to a consilient understanding,” but he didn’t feel that way after reading the Carhart-Harris reply. “The ad hominem attacks and appeals to their authority undermine their credibility even further,” Corlett said. 

For instance, the Carhart-Harris reply wrote: “We question whether Doss et al. are being consistent with their own prior work regarding refraining from highlighting inter-group differences in the absence of interaction effects. It would not be hard for us to identify cases of sub-optimal rigor in their own work, but such tit-for-tat would be petty, and could endure, and cause distraction, so we will refrain from engaging in it. Instead, we repeat that Doss et al. present an air of superior scientific rigor that is unjustified and misleading. We invite you to reflect, what is Doss et al.’s ‘real’ motivation in seeking to portray weaknesses in the analytical methods reported in our Nature Medicine paper?”

Psychedelics touch on some of the most difficult areas of science: the interior states of the mind, mental disorders, interpreting brain imaging, and more. In such early days, science should be iterative and use what’s come before it to strengthen hypotheses and further refine them. It may very well be that decreased modularity or neural flexibility plays a role in how psilocybin acts as an antidepressant. This can be studied, without making definitive claims yet. “There’s a lot of subtlety to this which goes beyond whether this stuff works or it doesn’t,” McConway said. 

The other reviewer of the Nature Medicine paper, Hellerstein, spoke more generally about psychedelic research and its reception by other scientists and the public. “I think of it as the rubber hitting the road kind of moment,” he said. “[Psychedelics] have stoked this kind of almost frenzy that just has to be disappointing. There’s no way that these drugs can meet the fantasized expectations that people have.”

Hellerstein sees the new wave of critical thinking as a positive move for the field, and thinks that all the new work coming out has value, in that we can build on it for the future. “I view these small studies using neuroimaging and other biological measures as hypothesis generating,” he said—useful, as long as we can view and talk about them in that way. 

Carhart-Harris tweeted later in the day their reply went up, “One of my least favorite things: professional spats. I also dislike online trolling and echo-chambering via social media driving polarization. I like pluralism and wisdom teaching & am keen to step back from this forum for a while and focus more on family, mindfulness & metta…if I haven’t, I’ll try to keep learning. Peace, love and science.”

Follow Shayla Love on Twitter.