Exercise 13: Resampling methods
This homework assignment is designed to give you practice with bootstrapping and permutation tests.
You will need to download the unrestricted_trimmed_1_7_2020_10_50_44.csv file from the Homework/hcp_data folder in the class GitHub repository.
This data is a portion of the Human Connectome Project database. It provides measures of cognitive tasks and brain morphology measurements from 1206 participants. The full description of each variable is provided in the HCP_S1200_DataDictionary_April_20_2018.csv file in the Homework/hcp_data folder in the class GitHub repository.
1. Loading & Visualizing the Data (1 point)
Use the setwd and read.csv functions to load data from the unrestricted_trimmed_1_7_2020_10_50_44.csv file.
(a) Using the tidyverse tools, create a new data frame d1 that includes only the subject ID (Subject), gender (Gender, self-reported at time of data collection), Flanker Task performance (Flanker_Unadj), total intracranial volume (FS_IntraCranial_Vol), total white matter volume (FS_Tot_WM_Vol), and total grey matter volume (FS_Total_GM_Vol) variables, and remove all rows with NA values.
Use the head function to look at the first few rows of each data frame.
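As a rough illustration of the select-then-drop-missing pattern, here is a minimal sketch on a small made-up data frame. The column names (id, group, score, extra) are hypothetical and are not HCP variables, and the sketch assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Toy data with a few missing values (hypothetical columns, not HCP data)
toy <- data.frame(
  id    = 1:5,
  group = c("A", "B", NA, "A", "B"),
  score = c(10, NA, 7, 8, 9),
  extra = letters[1:5]
)

# Keep only the columns of interest, then drop any row containing an NA
clean <- toy %>%
  select(id, group, score) %>%
  drop_na()

# Inspect the first few rows of the result
head(clean)
```

Note that drop_na() with no arguments removes a row if any of the remaining columns is missing, so selecting columns first avoids discarding rows because of NAs in variables you do not need.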
# WRITE YOUR CODE HERE
(b) Plot grey matter volume (x-axis) against intracranial volume (y-axis), with Gender as the point color.
# WRITE YOUR CODE HERE
What patterns do you observe in the scatter plot?
Write your response here
2. Logistic classifier (2 points)
We want to try predicting gender using the neural data you have loaded.
(a) Run a logistic regression model to predict gender from total white matter volume, total grey matter volume, and intracranial volume.
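For reference, the general shape of a logistic regression in base R can be sketched on simulated data. Everything here (the toy data frame and the variables x1, x2, and group) is made up for illustration and is not the HCP data.

```r
# Simulated toy data; variable names are hypothetical, not HCP columns
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Two-level outcome whose log-odds depend on x1 only
toy$group <- factor(ifelse(rbinom(n, 1, plogis(1.5 * toy$x1)) == 1, "B", "A"))

# family = binomial requests a logistic regression; with a factor outcome,
# glm models the probability of the second level ("B" here)
fit <- glm(group ~ x1 + x2, data = toy, family = binomial)

# Coefficient table: estimates, standard errors, z values, and p-values
coef_table <- summary(fit)$coefficients
print(coef_table)
```

When the outcome is a factor, it is worth checking levels() first, because the sign of every coefficient depends on which level is treated as the "success" category.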
# WRITE YOUR CODE HERE
Which factors are significantly associated with gender?
Write your response here
(b) Estimate the prediction accuracy of your model (Note: this is the training set accuracy). Set your prediction threshold to 0.5.
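One way to compute training-set accuracy at a 0.5 threshold is sketched below on simulated data. The toy data frame and the variables x1 and y are hypothetical stand-ins, not the HCP variables.

```r
# Simulated toy data with a 0/1 outcome (hypothetical names)
set.seed(2)
n <- 200
toy <- data.frame(x1 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x1))

fit <- glm(y ~ x1, data = toy, family = binomial)

# type = "response" returns fitted probabilities rather than log-odds
p_hat <- predict(fit, type = "response")

# Classify as 1 when the fitted probability exceeds the 0.5 threshold
pred <- ifelse(p_hat > 0.5, 1, 0)

# Accuracy is the proportion of predictions that match the observed labels
accuracy <- mean(pred == toy$y)
print(accuracy)
```

Because the same rows are used for fitting and for scoring, this number tends to be optimistic compared with accuracy on held-out data.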
# WRITE YOUR CODE HERE
What is the prediction accuracy for gender from the full model?
Write your response here
3. Bootstrapped accuracy (3 points)
Use bootstrapping to estimate the confidence interval for your model’s prediction accuracy. Plot a histogram of the bootstrapped accuracies, and estimate the confidence interval using the standard deviation of the bootstrap distribution.
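The general bootstrap loop can be sketched as follows on simulated data. The helper train_accuracy, the toy data, and the choice of 500 resamples are assumptions made for illustration; the interval shown is a normal approximation (estimate plus or minus 1.96 bootstrap standard deviations), which is one common way to build an interval from the bootstrap SD.

```r
# Simulated toy data with a 0/1 outcome (hypothetical names)
set.seed(3)
n <- 200
toy <- data.frame(x1 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x1))

# Hypothetical helper: fit the model on d and return its training accuracy
train_accuracy <- function(d) {
  fit <- glm(y ~ x1, data = d, family = binomial)
  mean((predict(fit, type = "response") > 0.5) == d$y)
}

n_boot <- 500
boot_acc <- numeric(n_boot)
for (i in seq_len(n_boot)) {
  idx <- sample(n, replace = TRUE)            # resample rows with replacement
  boot_acc[i] <- train_accuracy(toy[idx, , drop = FALSE])
}

# Distribution of accuracies across bootstrap resamples
hist(boot_acc, main = "Bootstrapped accuracy", xlab = "Accuracy")

# Normal-approximation 95% interval around the observed accuracy
obs_acc <- train_accuracy(toy)
ci <- obs_acc + c(-1, 1) * 1.96 * sd(boot_acc)
print(ci)
```

Each resample has the same number of rows as the original data, and the model is refit on every resample so that the spread of boot_acc reflects sampling variability in both the fit and the score.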
# WRITE YOUR CODE HERE
How robust is the prediction accuracy of the full model?
Write your response here
4. Permutation test for grey matter effects (3 points)
Now run a permutation test, with 1000 iterations, to evaluate how much grey matter volume contributes to the prediction accuracy. Compare the prediction accuracy of the full (unpermuted) model with the distribution of accuracies obtained when the grey matter volume variable is permuted, using a histogram (Hint: use the abline function to show the original accuracy on the histogram).
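A permutation test for a single predictor can be sketched on simulated data as shown here. The toy variables (x1, x2, y) and the helper train_accuracy are hypothetical, and the sketch assumes the model is refit after each shuffle; scoring the original fitted model on shuffled data is another variant you may encounter.

```r
# Simulated toy data: y depends strongly on x1 and weakly on x2
set.seed(4)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(2 * toy$x1 + 0.5 * toy$x2))

# Hypothetical helper: fit the model on d and return its training accuracy
train_accuracy <- function(d) {
  fit <- glm(y ~ x1 + x2, data = d, family = binomial)
  mean((predict(fit, type = "response") > 0.5) == d$y)
}

obs_acc <- train_accuracy(toy)                 # accuracy with intact data

n_perm <- 1000
perm_acc <- numeric(n_perm)
for (i in seq_len(n_perm)) {
  shuffled <- toy
  shuffled$x1 <- sample(shuffled$x1)           # break the link between x1 and y
  perm_acc[i] <- train_accuracy(shuffled)
}

# Null distribution of accuracy, with the observed accuracy marked
hist(perm_acc, xlim = range(c(perm_acc, obs_acc)),
     main = "Accuracy with x1 permuted", xlab = "Accuracy")
abline(v = obs_acc, col = "red", lwd = 2)

# One-sided p-value with the usual +1 correction
p_value <- (1 + sum(perm_acc >= obs_acc)) / (1 + n_perm)
print(p_value)
```

Only the predictor of interest is shuffled; the outcome and the other predictor stay in place, so the drop in accuracy isolates what that one variable contributes.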
# WRITE YOUR CODE HERE
How much does the grey matter volume influence the prediction accuracy of the model?
Write your response here
5. Reflection (1 point)
Differentiate the bootstrap from a permutation test. Describe each and explain when each is appropriate.
Write your response here
DUE: 11:59pm EST, March 24, 2026
IMPORTANT: Did you collaborate with anyone on this assignment? If so, list their names here.
Someone’s Name
GenAI Utilization: Did you utilize any generative AI tools on this assignment? If so, please list each tool and paste the respective prompts you used.