Stanford University
A search for “fMRI” across FY2020 NIH grants finds:
1,218 matching grants
$552M in total costs
What is the brain dysfunction in major depression?
Meta-analysis of 99 published studies
Müller et al., 2017, JAMA Psychiatry
We seem to have created quite a mess.
How can we fix it?
“Sunlight is said to be the best disinfectant”
(Louis Brandeis)
Positive Predictive Value (PPV): The probability that a positive result is true
Winner’s Curse: overestimation of effect sizes for significant results
Button et al., 2013
Schönbrodt & Perugini, 2013
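Both problems are easy to see in a minimal simulation (my own sketch, not from the slides): with a small true effect and n = 20, the studies that happen to reach significance substantially overestimate the effect.

# Winner's curse, by simulation: condition on p < .05 and the effect inflates
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, n_sims = 0.3, 20, 5000      # small true effect, small samples
sig_effects = []

for _ in range(n_sims):
    sample = rng.normal(true_d, 1, n)  # one-sample design, true effect = 0.3 SD
    result = stats.ttest_1samp(sample, 0)
    if result.pvalue < 0.05 and result.statistic > 0:
        sig_effects.append(sample.mean())   # estimate, conditional on significance

print(f"true effect: {true_d}")
print(f"mean effect among significant results: {np.mean(sig_effects):.2f}")
# the conditional estimate comes out far above the true 0.3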
Marek et al., 2022
Jason Stein et al., for the ENIGMA Consortium
“In general, previously identified polymorphisms associated with hippocampal volume showed little association in our meta-analysis (BDNF, TOMM40, CLU, PICALM, ZNF804A, COMT, DISC1, NRG1, DTNBP1), nor did SNPs previously associated with schizophrenia or bipolar disorder”
Updated from Poldrack et al., 2017
[Figure: unbiased effect size estimates]
Poldrack et al., 2017, Nature Reviews Neuroscience
“Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data-collection justification. This requirement offers extra protection for the first requirement. Samples smaller than 20 per cell are simply not powerful enough to detect most effects, and so there is usually no good reason to decide in advance to collect such a small number of observations. Smaller samples, it follows, are much more likely to reflect interim data analysis and a flexible termination rule.” (Simmons et al., 2011)
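A quick power calculation makes the rationale concrete (a sketch using statsmodels; the exact threshold is approximate): with 20 observations per cell, a between-groups comparison has 80% power only for very large effects.

# Smallest effect detectable with 80% power at alpha = .05 and n = 20 per cell
from statsmodels.stats.power import TTestIndPower

d = TTestIndPower().solve_power(nobs1=20, alpha=0.05, power=0.8, ratio=1.0)
print(f"detectable effect size: d = {d:.2f}")   # roughly d = 0.9

Most effects of interest are considerably smaller than that.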
Varoquaux, 2018
Poldrack et al., 2017
“data collection and analysis methods were highly flexible across studies, with nearly as many unique analysis pipelines as there were studies in the sample [241].”
“In this article, we use Support Vector Machine (SVM) classifiers, and genetic algorithms to demonstrate the ease by which overfitting can occur, despite the use of cross validation. We demonstrate that comparable and non-generalizable results can be obtained on informative and non-informative (i.e. random) data by iteratively modifying hyperparameters in seemingly innocuous ways.”
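The mechanism is easy to reproduce. Here is a minimal sketch (my own illustration under simplified assumptions, not the authors' code): generate pure-noise data, then keep whichever hyperparameter setting maximizes the cross-validation score.

# Hyperparameter tuning against the CV score overfits, even on random data
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))   # pure noise: 40 samples, 1000 features
y = np.tile([0, 1], 20)           # arbitrary labels, no real signal

best = 0.0
for C in [0.01, 0.1, 1, 10, 100]:
    for gamma in [1e-4, 1e-3, 1e-2, 1e-1]:
        acc = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()
        best = max(best, acc)     # keep the most flattering configuration

print(f"best CV accuracy on random data: {best:.2f}")
# optimistically biased above the 0.50 chance level

The reported score is biased because the same cross-validation estimate is used both to select the configuration and to evaluate it; a genuinely held-out test set would fall back to chance.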
The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments.
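That failure mode can also be checked by simulation. A minimal sketch (mine; the measures are independent here for simplicity, unlike real ERP data): on null data, pick the time-window/electrode combination with the largest condition difference and test only that one.

# Selecting the largest-looking effect, then testing it, inflates false positives
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subj, n_measures, n_exp = 20, 25, 2000   # 25 window/electrode combinations
false_pos = 0

for _ in range(n_exp):
    diffs = rng.normal(0, 1, size=(n_subj, n_measures))   # null condition differences
    best = np.abs(diffs.mean(axis=0)).argmax()            # pick the biggest effect...
    p = stats.ttest_1samp(diffs[:, best], 0).pvalue       # ...then test only that one
    false_pos += p < 0.05

print(f"false-positive rate: {false_pos / n_exp:.2f}")    # far beyond the nominal .05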
http://www.russpoldrack.org/2016/09/why-preregistration-no-longer-makes-me.html
Kaplan & Irvin, 2015
Pre-registration prevents p-hacking but does not eliminate analytic variability
How variable are neuroimaging analysis workflows in the wild?
What is the effect on scientific inferences?
Botvinik-Nezer et al., 2020, Nature
Proportion of teams with activity in each voxel
Maximum overlap for all hypotheses: 76%
Botvinik-Nezer et al., 2020, Nature
How many of you have written computer code in the course of your research?
How many of you have been trained in software engineering?
How many of you have ever written a test for your code?
# 23-class classification problem
import numpy as N
from sklearn.cross_validation import StratifiedKFold   # pre-0.18 scikit-learn API
from sklearn.svm import LinearSVC

skf = StratifiedKFold(labels, 8)                # 8 stratified folds
if trainsvm:
    pred = N.zeros(len(labels))                 # cross-validated predictions
    for train, test in skf:
        clf = LinearSVC()
        clf.fit(data[train], labels[train])     # train on the training folds
        pred[test] = clf.predict(data[test])    # predict the held-out fold
Results: 93% accuracy
http://www.russpoldrack.org/2013/02/anatomy-of-coding-error.html
The bug: data[train] and data[test] indexed the wrong axis of the data matrix, in which observations were stored as columns. The corrected indexing:

data[:,train]
data[:,test]

Results: 53% accuracy
http://reproducibility.stanford.edu/coding-error-postmortem/
http://www.russpoldrack.org/2016/08/the-principle-of-assumed-error.html
https://software-carpentry.org
https://github.com/poldrack/pytest_tutorial
The dataset included two notable age outliers (reported ages 5 and 32757).
Specifically, the statement on page 9 “age turned out not to correlate with any of the indicator variables” is incorrect. It should read instead “age correlated significantly with 3 latent indicator variables (Vaccinations: .219, p < .0001; Conservatism: .169, p < .001; Conspiracist ideation: -.140, maximum likelihood p < .0001, bootstrapped p = .004), and straddled significance for a fourth (Free Market: .08, p ≈ .05).”
In [1]: age=32757
In [2]: assert age>12 and age<120
------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-2-37de876b5fda> in <module>()
----> 1 assert age>12 and age<120
AssertionError:
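The same check belongs in an automated test suite rather than an interactive session. A minimal pytest sketch (the file and function names are illustrative):

# test_sanity.py -- run with: pytest test_sanity.py
import numpy as np
import pytest

def validate_ages(ages, low=12, high=120):
    """Raise if any reported age falls outside the plausible range."""
    ages = np.asarray(ages)
    bad = (ages <= low) | (ages >= high)
    if bad.any():
        raise ValueError(f"implausible ages: {ages[bad]}")

def test_validate_ages_catches_outliers():
    with pytest.raises(ValueError):
        validate_ages([25, 43, 5, 32757])   # the two outliers from the example above

def test_validate_ages_passes_clean_data():
    validate_ages([25, 43, 57])             # should not raise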
https://www.browserstack.com/guide/tdd-vs-bdd-vs-atdd
https://poldrack.github.io/talks-Neurohackademy/