Exercise 7: Linear models#
This homework assignment is designed to give you practice with linear models and the bias-variance tradeoff.
You will need to download the unrestricted_trimmed_1_7_2020_10_50_44.csv file from the Homework/hcp_data folder in the class GitHub repository.
These data are a portion of the Human Connectome Project database. They provide cognitive task measures and brain morphology measurements from 1206 participants. The full description of each variable is provided in the HCP_S1200_DataDictionary_April_20_2018.csv file in the Homework datasets/hcp_data folder in the class GitHub repository.
1. Loading the Data (1 point)#
Use the setwd and read.csv functions to load data from the unrestricted_trimmed_1_7_2020_10_50_44.csv file.
Using the tidyverse tools, make a new data frame d1 that only includes the subject ID (Subject), gender (Gender), Flanker Task performance (Flanker_Unadj), total white matter volume (FS_Tot_WM_Vol), and total grey matter volume (FS_Total_GM_Vol) variables, and remove all NA values.
Use the head function to look at the first few rows of each data frame.
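As a rough illustration of the select-and-filter step, here is a minimal sketch run on a small made-up data frame rather than the HCP file. The toy values, the names toy and d1_toy, and the Extra_Var column are hypothetical; only the five column names come from the assignment. It assumes the dplyr and tidyr packages from the tidyverse are installed.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the data frame returned by read.csv(); all values are invented.
toy <- data.frame(
  Subject         = c(1, 2, 3, 4),
  Gender          = c("F", "M", "F", "M"),
  Flanker_Unadj   = c(110.2, NA, 105.7, 98.4),
  FS_Tot_WM_Vol   = c(450000, 470000, NA, 430000),
  FS_Total_GM_Vol = c(650000, 700000, 640000, 620000),
  Extra_Var       = c(1, 2, 3, 4)  # hypothetical column that should be dropped
)

# Keep only the columns of interest, then drop any row containing an NA.
d1_toy <- toy %>%
  select(Subject, Gender, Flanker_Unadj, FS_Tot_WM_Vol, FS_Total_GM_Vol) %>%
  drop_na()

# Look at the first few rows of the result.
head(d1_toy)
```

Rows with a missing value in any of the selected columns are removed, so the result can have fewer rows than the input.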
# If you are running this on your local computer, set your working directory to
# the location of the data file on your hard drive. Uncomment this line
# and change the location to where it is on your computer.
#setwd("~/Documents/PittCMU/G3/DSPN/DataSciencePsychNeuro/Homeworks/hcp_data")
# If you are running this on Colab, then use something like this.
# system("gdown --id 1hywRmGdvhbDYTrQRyl1_bLJsq-T3GJq2")
# INSERT CODE HERE
2. Initial data visualization (2 points)#
Use the pairs function to look at all the pairwise scatterplots of the variables in d1. Describe which variables seem positively correlated, negatively correlated, or not correlated at all.
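A minimal sketch of how pairs behaves, using simulated data instead of d1. The variables x and y and the data frame toy are hypothetical. One point worth knowing: pairs stops with an error on character columns, so a column such as Gender may need to be converted to a factor first (this assumes it was read in as character, which is the read.csv default in R 4.0 and later).

```r
set.seed(1)
n <- 50

# Simulated data: one categorical column and two related numeric columns.
toy <- data.frame(
  Gender = sample(c("F", "M"), n, replace = TRUE),
  x      = rnorm(n)
)
toy$y <- 2 * toy$x + rnorm(n, sd = 0.5)

# pairs() accepts numeric, logical, and factor columns, but not character ones.
toy$Gender <- factor(toy$Gender)

# Draw every pairwise scatterplot.
pairs(toy)

# A numeric check on what the x-versus-y panel shows.
cor_xy <- cor(toy$x, toy$y)
cor_xy
```

The scatterplot matrix gives the qualitative picture; cor gives a number to back up the description for any pair of numeric variables.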
#INSERT CODE HERE
Write your response here.
3. Linear regression (4 points)#
Use the lm (linear model) function to determine the association between Flanker Task performance and total grey matter volume from the d1 data frame.
Show the results using the summary function, and report the mean coefficient values for \(\beta_0\) and \(\beta_1\) (coef function) and their 95% confidence intervals (confint function). Is grey matter volume significantly associated with Flanker Task performance?
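The pattern of calls can be sketched on simulated data as follows. The variables x and y, the data frame toy, and the true coefficients (intercept 1, slope 0.5) are made up for illustration and have nothing to do with the HCP data.

```r
set.seed(42)
n <- 100

# Simulated predictor and outcome with a known linear relationship plus noise.
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
toy <- data.frame(x = x, y = y)

# Fit outcome ~ predictor.
fit <- lm(y ~ x, data = toy)

# Full table: estimates, standard errors, t values, p values, R-squared.
summary(fit)

# Point estimates: "(Intercept)" is beta_0 and "x" is beta_1.
b <- coef(fit)
b

# 95% confidence intervals, one row per coefficient.
ci <- confint(fit, level = 0.95)
ci
```

For a single coefficient, a 95% confidence interval that excludes zero corresponds to a p value below 0.05 in the summary table, so the two outputs should tell the same story.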
If you use Generative AI (optional)#
If you use code-assisting generative AI (e.g., Copilot, GPT, Claude; anything more than simple Google searching), follow the steps below. If you do not use generative AI, skip this section and go directly to the coding part.
Step 1: Write pseudocode#
Write pseudocode for the lm model (and any other values you have to print) below.
# Write out your pseudo code here
Step 2: Generate code using generative AI#
Using your pseudocode, prompt a generative AI to create or fix the R function. Then paste only the generated function below.
# Paste generated function here
Step 3: Compare and verify the code#
Question 1. What is different between your pseudocode and the generated code? (e.g., it handled NAs differently, used a different formula arrangement, etc.)
Question 2. What are the input parameters and the output of the function? (Answer to show you understand what the function takes in and returns. Be concise.)
Question 3. Does anything have to be changed? (Yes / No)
Write your response here.
Q1.
Q2.
Q3. Yes / No
If you answered yes to Question 3, revise the code. Don’t forget to paste your final version in the code block below.
#INSERT CODE HERE
Write your response here.
4. Plotting (2 points)#
Use ggplot to plot the FS_Total_GM_Vol variable (x axis) against the Flanker_Unadj variable (y axis), along with the regression line and its confidence interval. Qualitatively describe what you see.
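One common way to get a scatterplot with a fitted line and confidence band is geom_point plus geom_smooth(method = "lm"). The sketch below uses simulated data; the names toy, x, and y are hypothetical, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

set.seed(7)

# Simulated predictor and outcome.
toy <- data.frame(x = rnorm(80))
toy$y <- 3 + 1.5 * toy$x + rnorm(80)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = 0.6) +
  # Least-squares line; se = TRUE shades the 95% confidence band around it.
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95) +
  labs(x = "x (toy predictor)", y = "y (toy outcome)")

print(p)
```

The shaded band reflects uncertainty in the fitted line, not the spread of individual points, so it is usually much narrower than the scatter itself.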
#INSERT CODE HERE
Write your response here.
5. Reflection (1 point)#
What do you conclude based on the analyses above?
Write your response here.
DUE: 11:59pm EST, Feb 19, 2026
IMPORTANT: Did you collaborate with anyone on this assignment? If so, list their names here.
Someone’s Name