Irreproducible Results (2011)

(kk.org)

55 points | by fsagx 8 months ago

31 comments

  • rcxdude 8 months ago

    Biology experiments are notoriously sensitive: even fairly standard protocols can be wildly unreliable or unpredictable. I've heard of at least one instance where a lab worked out that for one protocol, the path they took when carrying the sample from one room to another mattered (one stairwell meant it didn't work, the other meant it did). Even in much simpler systems you get strange effects like disappearing polymorphs (https://en.wikipedia.org/wiki/Disappearing_polymorph)

    • geysersam 8 months ago

      Did they figure out what the mechanism was for the difference? Or might that also prove to be a spurious correlation?

  • NeuroCoder 8 months ago

    I had a neuroscience professor in undergrad who did a bunch of experiments where the only variables were things like the material of the cage, bedding, feeder, etc. He systematically tested variations in each separately. Outcomes varied in mice no matter what was changed. I would love to tell you what outcomes he measured, but it convinced me not to go into mice research so it's all just a distant memory.

    On the other hand, I've worked with people since then who have their own mice studies going on. We are always learning new ways to improve the situation. It just doesn't make for a very impressive front page, so it goes unnoticed by those not into mouse research methods.

    • stonethrowaway 8 months ago

      Funny, considering the majority of trials posted on the front page end up being studies done on mice.

    • tomcam 8 months ago

      The implications of the work done by your former professor are so profound I can hardly get my arms around them.

    • stogot 8 months ago

      Are you able to find out whether your professor published any of that information?

      • NeuroCoder 8 months ago

        I wish I could, but in addition to this happening over a decade ago, he changed his lab's focus afterwards. He went into neurotransmitter research in skin since it has some overlapping embryological origins with the brain.

  • ChadNauseam 8 months ago

    I like a suggestion I read from Eliezer Yudkowsky: journals should accept or reject papers based on the experiment's preregistration, not on the results.

  • nextos 8 months ago

    You can see this is a problem if you mine the distribution of p-values from published articles.

    Andrew Gelman had a great post on this topic I can't find now.

    Pre-registration could be a great solution. Negative results are also important.
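
    As a rough illustration (not Gelman's analysis), here is a small Python sketch that simulates a literature where mostly "significant" results get written up and then histograms the published p-values; the effect sizes, sample sizes, and the 0.05 publication filter are all assumptions, chosen only to show the shape such a mined distribution can take:

        # Illustrative sketch only: simulate a literature where mostly
        # "significant" results get published, then look at the histogram
        # of published p-values. Effect sizes, sample sizes, and the 0.05
        # publication filter are assumptions, not anything from the post.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)

        def one_study(true_effect, n=30):
            """Two-group study; returns the two-sided t-test p-value."""
            control = rng.normal(0.0, 1.0, n)
            treated = rng.normal(true_effect, 1.0, n)
            return stats.ttest_ind(treated, control).pvalue

        # A mix of true nulls and modest real effects.
        effects = rng.choice([0.0, 0.3], size=5000, p=[0.7, 0.3])
        pvals = [one_study(e) for e in effects]

        # Publication bias: null results rarely make it into print.
        published = [p for p in pvals if p < 0.05 or rng.random() < 0.1]

        counts, edges = np.histogram(published, bins=20, range=(0.0, 1.0))
        for count, left in zip(counts, edges):
            print(f"{left:4.2f}-{left + 0.05:4.2f} {'#' * (count // 25)}")
        # A flat-ish histogram suggests a healthy literature; a pile-up in
        # the lowest bins with a cliff at 0.05 is the warning sign.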

  • krisoft 8 months ago

    I don't understand what is so disturbing about the Crabbe test. They injected mice with cocaine and observed that the mice moved more than normal. The labs differed in how much more. But why would they expect the extra movement to be constant and consistent?

    Now if one set of mice moved more while another started blowing orange soap bubbles from their ears, that would be disturbing. But just that the averages differed? Maybe I should read the paper in question.

    • casualrandomcom 8 months ago

      At first I thought you were not getting it, but, thinking it through, I now think the real problem is that the article gave us the averages (600, 701, 5000) without giving the standard deviations and nobody is outraged!

      The combined result of the three experiments can be either surprising or absolutely obvious: if the standard deviation of each of the three experiments was around 1 cm, it would be troubling; if it was 100 cm, it would still be troubling; but if it was around 5000 cm, there would be nothing wrong with what happened.
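
      To make that concrete, here is a minimal Python sketch of the same point; the per-lab sample size and the candidate standard deviations are hypothetical, since the article reports neither:

          # How surprising is 600 cm vs 5000 cm? It depends entirely on the
          # spread. The sample size and standard deviations below are invented.
          import math

          portland, edmonton = 600.0, 5000.0  # reported mean extra distances (cm)
          n = 20  # hypothetical mice per lab

          def z_for_difference(m1, m2, sd):
              """Approximate z-statistic for the difference of two lab means,
              assuming the same per-mouse standard deviation in both labs."""
              se = sd * math.sqrt(2.0 / n)
              return abs(m1 - m2) / se

          for sd in (1.0, 100.0, 5000.0):
              z = z_for_difference(portland, edmonton, sd)
              verdict = "wildly inconsistent" if z > 3 else "plausibly just noise"
              print(f"sd = {sd:6.0f} cm  ->  z = {z:8.1f}  ({verdict})")
          # With sd around 1 cm the labs disagree by thousands of standard
          # errors; with sd around 5000 cm the same means are unremarkable.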

      • krisoft 8 months ago

        > without giving the standard deviations and nobody is outraged!

        I agree with that! I was just swallowing my outrage :D

        > if the standard deviation of each of the three experiments was around 1 cm, it would be troubling

        That would be very curious though! These are animals, not robots :D The only way I could imagine them averaging that small a standard deviation in distance moved is if we paralysed (or almost paralysed) them.

  • smitty1e 8 months ago

  • necovek 8 months ago

    This is extremely interesting.

    On top of keeping and publishing "negative outcomes", could we also move to actually requiring verification and validation by another "lab" (or really, an experiment done in different conditions)?

    • tomcam 8 months ago

      I love that idea, but it would never work in practice. Some thoughts:

      * Funding for any experiment would have to include 100% extra because presumably every experiment done would also have to duplicate another, randomly chosen experiment. The situation would become something akin to lawyers being required to do pro bono work. It would also mean that the randomly chosen experiment to be duplicated would require a different set of skills than the primary experiment.

      * Assuming the above, there would be an extremely high impedance in communications between any two of these experiments, because no one could really describe their experiment in a way that would allow independent recreation of it.

      * Smaller institutions would struggle to re-create experiments from better funded institutions.

      * Getting the second experiment funded would always be difficult because you probably wouldn’t be able to go to the same sources.

      • naasking 8 months ago

        > Funding for any experiment would have to include 100% extra because presumably every experiment done would also have to duplicate another, randomly chosen experiment.

        If this were a universal policy then we'd be no worse off because everyone would face the same challenges.

        > Smaller institutions would struggle to re-create experiments from better funded institutions.

        That's already the case.

        > Getting the second experiment funded would always be difficult because you probably wouldn’t be able to go to the same sources.

        I thought we were discussing how the original experiment's funding already included funding for the replication?

    • analog31 8 months ago

      My experiment was built in a small accelerator lab, plus about $400k in 1990 dollars. Technology was evolving rapidly (lasers, computers), and some of the critical gear was obsolete by the time I finished. Had I needed to secure guaranteed funding for replication, I would not have started the experiment. How it was eventually replicated could not have been known at that point.

      The thing that comes to mind in this thread is that rules (and rule makers) for small biological and behavioral studies might not make sense for a physics research program, and vice versa.

  • begueradj 8 months ago

    With that in mind, how could something like medication even exist, then?

  • pazimzadeh 8 months ago

    >> [John Crabbe] performed a series of experiments on mouse behavior in three different science labs: in Albany, New York; Edmonton, Alberta; and Portland, Oregon. Before he conducted the experiments, he tried to standardize every variable he could think of. The same strains of mice were used in each lab, shipped on the same day from the same supplier. The animals were raised in the same kind of enclosure, with the same brand of sawdust bedding. They had been exposed to the same amount of incandescent light, were living with the same number of littermates, and were fed the exact same type of chow pellets. When the mice were handled, it was with the same kind of surgical glove, and when they were tested it was on the same equipment, at the same time in the morning.

    >> The premise of this test of replicability, of course, is that each of the labs should have generated the same pattern of results. “If any set of experiments should have passed the test, it should have been ours,” Crabbe says. “But that’s not the way it turned out.” In one experiment, Crabbe injected a particular strain of mouse with cocaine. In Portland the mice given the drug moved, on average, six hundred centimetres more than they normally did; in Albany they moved seven hundred and one additional centimetres. But in the Edmonton lab they moved more than five thousand additional centimetres. Similar deviations were observed in a test of anxiety. Furthermore, these inconsistencies didn’t follow any detectable pattern. In Portland one strain of mouse proved most anxious, while in Albany another strain won that distinction.

    >> The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise.

    This wasn't established when the post was written, but mice are sensitive and can align themselves to magnetic fields, so if the output is movement the result is not thaaaat surprising. There are a lot of things that can affect mouse behavior, including possibly pheromones/smell of the experimenter. I am guessing that behavior patterns such as anxiety behavior can be socially reinforced as well, which could affect results. I could come up with another dozen factors if I had to. Were mice tested one at a time? How many mice were tested? Time of day? Gut microbiota? If the effect isn't reproducible without the sun and moon lining up, then it could just be a 'weak' effect that can be masked or enhanced by other factors. That doesn't mean it's not real, but that the underlying mechanism is unclear. Their experiment reminds me of the rat park experiment, which apparently did not always reproduce, though that doesn't mean the effect isn't real in some conditions: https://en.wikipedia.org/wiki/Rat_Park.

    I think the idea of publishing negative results is a great one. There are already "journals of negative results". However, for each negative result you could also make the case that some small but important experimental detail is the reason why the result is negative. So negative results have to be repeatable too. Otherwise, no one would have time to read all of the negative results that are being generated. And it would probably be a bad idea to not try an experiment just because someone else tried it before and got a negative result once.

    Either way, researchers aren't incentivized to do that. You don't get more points on your grant submission for publishing negative results, unless you also found some neat positive results in the process.

    • lmm 8 months ago

      > There are a lot of things that can affect mouse behavior, including possibly pheromones/smell of the experimenter. I am guessing that behavior patterns such as anxiety behavior can be socially reinforced as well, which could affect results. I could come up with another dozen factors if I had to. Were mice tested one at a time? How many mice were tested? Time of day? Gut microbiota? If the effect isn't reproducible without the sun and moon lining up, then it could just be a 'weak' effect that can be masked or enhanced by other factors. That doesn't mean it's not real, but that the underlying mechanism is unclear.

      I think it does mean the claimed causal link is not real, or at least not proven. Certainly if the error bars from two "reproductions" of the same experiment do not overlap, you can't and mustn't really say that the experiment found anything.
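
      A quick way to see the non-overlap point, using the article's Portland and Edmonton means but invented standard deviations and sample sizes (a sketch, not the study's actual data):

          # Normal-approximation 95% confidence intervals for two labs'
          # means; the sd and n values are made up for illustration.
          import math

          def ci95(mean, sd, n):
              half_width = 1.96 * sd / math.sqrt(n)
              return mean - half_width, mean + half_width

          portland = ci95(mean=600.0, sd=400.0, n=20)   # hypothetical sd, n
          edmonton = ci95(mean=5000.0, sd=400.0, n=20)  # hypothetical sd, n

          overlap = portland[1] >= edmonton[0] and edmonton[1] >= portland[0]
          print("Portland 95% CI:", [round(x, 1) for x in portland])
          print("Edmonton 95% CI:", [round(x, 1) for x in edmonton])
          print("overlap" if overlap else "no overlap: the two 'reproductions' disagree")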

  • 11101010001100 8 months ago

    Ironically, some of Jonah Lehrer's work is fabricated.

  • emmelaich 8 months ago

    (2011)

    • dang 8 months ago

      Added. Thanks!

  • tonetheman 8 months ago

    [dead]

  • stonethrowaway 8 months ago

    [flagged]

    • dang 8 months ago

      Can you please not break the site guidelines like this? We want curious conversation here.

      https://news.ycombinator.com/newsguidelines.html

    • emmelaich 8 months ago

      It's not clear to me what this 'current' refers to. The old or the new?

      Agreed that nothing has changed though. Un-reproduced experiments have always been dubious.