How well can I predict task performance from predictor variables?
I have the following research question: I would like to predict the performance in a response time experiment (participants have to respond as fast as possible to a target stimulus) from three neural measures: Amplitude of an EEG signal, speed of a saccade (eye movement), and activity in a specific brain area as measured with fMRI.
What I have is a matrix with 5 columns: participant ID, EEG, saccade, fMRI, response time. The first column is just to identify the participants, columns 2-4 are predictor variables and the fifth column is the to-be-predicted variable.
Here are the actual questions: What would be a good way of testing how well I can predict the task performance? A regression I assume? Which function in MATLAB would you recommend? Does it make sense to segment the participants before running the regression?
dpb on 7 May 2021
We have absolutely no way to answer your question having no knowledge of the experiment.
Rightfully, the analysis methods would have been picked first and then the experiment designed and executed so as to be able to estimate the parameters of the model.
See a G. E. Box white paper that outlines some of the possible problems here <Regression Analysis Applied to Happenstance Data>
Given you already have the data and likely can't repeat the experiment, one must do what one can to at least be aware of potential issues unless the data were taken under well-controlled circumstances.
As for the last question specifically: "maybe". Your independent variables lack information such as age, sex, health status, etc., any of which may be among the easily thought-of confounding variables of which Professor Box speaks, not to mention less obvious factors that may matter even more to the results, such as the amount and quality of sleep the night before.
Scott MacKenzie on 7 May 2021
Edited: Scott MacKenzie on 8 May 2021
If you want a prediction equation expressing RT as a linear function of "amplitude of EEG signal", "speed of saccade", and "fMRI brain activity" and you've already collected the data, this is doable. Of course, who's to say the relationship with each of these variables is linear? But that's another story (see dpb's comment).
The following code with fake data for 10 participants demonstrates the mechanics of building such a model. And it should get you thinking about your goals.
eeg = rand(10,1);
saccade = rand(10,1);
fmri = rand(10,1);
rt = rand(10,1);
data = [ones(size(eeg)) eeg saccade fmri];
[b, ~, ~, ~, stats] = regress(rt, data)
stats = 0.3729 1.1893 0.39018 0.12236
The prediction equation is
rt = 0.861 - 0.161 x eeg - 0.581 x saccade x 0.034 x fmri
with R^2 = 0.3729.
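For reference, a minimal sketch of how the outputs of regress map onto the prediction equation. It uses the same kind of fake data as above; the fixed seed and the names X, rtHat and R2 are my additions for illustration, and the numbers will differ from those quoted because the data are random.

```matlab
rng(1);                                  % fixed seed (an assumption) so the run is repeatable
n = 10;
eeg = rand(n,1); saccade = rand(n,1); fmri = rand(n,1);
rt = rand(n,1);                          % fake response times, unrelated to the predictors

X = [ones(n,1) eeg saccade fmri];        % the column of ones gives the intercept
[b, ~, ~, ~, stats] = regress(rt, X);

% b(1) is the intercept; b(2:4) multiply eeg, saccade and fmri
rtHat = X*b;                             % fitted (predicted) response times

% stats = [R^2, F statistic, p-value of F, estimate of error variance]
R2 = stats(1);
fprintf('rt = %.3f + (%.3f)*eeg + (%.3f)*saccade + (%.3f)*fmri, R^2 = %.4f\n', b, R2);
```

With random rt the R^2 only reflects chance fit, which is worth keeping in mind when reading the same number from real data with few participants.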
I suggest you read the documentation on the regress function and study the examples. Good luck.
dpb on 10 May 2021
Edited: dpb on 10 May 2021
I don't have time to do much right now; I am interested and will try to get back later -- just one observation to emphasize what was said before -- "R-sq isn't the tell-all, end-all" to evaluate a model.
"I also tried nonlinear approaches: lm = fitlm([EEG saccade fMRI],rt,'quadratic') and get an even better R^2 ..."
- A quadratic surface is still a linear model (linear in the coefficients); just higher order;
- Of course you get a higher R-sq: you've added six (6) additional terms (three interactions and three squares) and reduced the residual degrees of freedom by that many as well.
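A small sketch of that point, on made-up data (not the poster's): the response below is generated from a purely linear relationship plus noise, yet the quadratic fit still reports an ordinary R^2 at least as high, while paying for it in degrees of freedom. The seed, sample values and variable names are all assumptions for illustration.

```matlab
rng(2);                                   % fixed seed (an assumption)
n = 45;                                   % same number of observations as in the thread
X = randn(n,3);                           % three fake predictors
y = 2 + X(:,1) + 3*X(:,3) + randn(n,1);   % made-up linear response; column 2 has no effect

lin  = fitlm(X, y);                       % 4 coefficients: intercept + 3 linear terms
quad = fitlm(X, y, 'quadratic');          % 10 coefficients: adds 3 interactions + 3 squares

fprintf('linear:    R^2 = %.4f, adj R^2 = %.4f, error DOF = %d\n', ...
        lin.Rsquared.Ordinary,  lin.Rsquared.Adjusted,  lin.DFE);
fprintf('quadratic: R^2 = %.4f, adj R^2 = %.4f, error DOF = %d\n', ...
        quad.Rsquared.Ordinary, quad.Rsquared.Adjusted, quad.DFE);
```

Ordinary R^2 cannot go down when terms are added to a nested least-squares model, so the adjusted R^2 and the error DOF are the more informative columns to compare.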
You seemingly still haven't looked at the model nor the data itself, though...the "exploratory" part --
>> mdl=fitlm([EEG saccade, fMRI],rt)
Linear regression model:
y ~ 1 + x1 + x2 + x3
                   Estimate      SE        tStat       pValue
                   ________    ______    ________    __________
    (Intercept)     464.76     8.2759      56.158    2.0735e-40
    x1              18.336     2.3883      7.6777    1.8516e-09
    x2             0.11507     3.0848    0.037303       0.97042
    x3              37.777     3.1301      12.069    4.4618e-15
Number of observations: 45, Error degrees of freedom: 41
Root Mean Squared Error: 18.3
R-squared: 0.939, Adjusted R-Squared: 0.935
F-statistic vs. constant model: 211, p-value = 6.17e-25
NB: the coefficient x2 ~ saccade has an SE (standard error of the estimate) that is ~30X the magnitude of the coefficient itself -- IOW, it is meaningless, as that says the coefficient is ~0.1 +/- 3 -- or anywhere between [-2.9, 3.1] at one SE, a range that comfortably includes zero (hence the pValue of 0.97).
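To put a conventional interval on that, here is a short sketch that builds a 95% confidence interval for x2 from the numbers printed in the table above. The variable names are mine; tinv comes from the Statistics and Machine Learning Toolbox.

```matlab
est = 0.11507;        % x2 (saccade) estimate, copied from the table above
se  = 3.0848;         % its standard error, copied from the table above
dfe = 41;             % error degrees of freedom reported by fitlm

tcrit = tinv(0.975, dfe);              % two-sided 95% critical value of Student's t
ci95  = est + [-1 1]*tcrit*se;         % 95% confidence interval for the coefficient
fprintf('x2 = %.3f, 95%% CI [%.2f, %.2f]\n', est, ci95);
```

The interval comes out at roughly -6.1 to 6.3, so it straddles zero by a wide margin. With the fitted model still in the workspace, coefCI(mdl) should return intervals of this kind for every coefficient at once.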
So, to interpret this model more accurately, it's really the same thing as
>> fitlm([EEG, fMRI],rt)
Linear regression model:
y ~ 1 + x1 + x2
                   Estimate      SE       tStat       pValue
                   ________    ______    ______    __________
    (Intercept)     464.85     7.7969     59.62     3.197e-42
    x1              18.367     2.2196    8.2748    2.3207e-10
    x2              37.852     2.3729    15.951    1.9796e-19
Number of observations: 45, Error degrees of freedom: 42
Root Mean Squared Error: 18.1
R-squared: 0.939, Adjusted R-Squared: 0.936
F-statistic vs. constant model: 324, p-value = 3.02e-26
which actually is just slightly better with fewer terms -- RMSE 18.1 vs 18.3
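Those RMSE values are in-sample. Since the original question was how well RT can be predicted, one hedged way to compare the two models is k-fold cross-validation, sketched here on made-up data (not the poster's). The fold count, seed, fake coefficients and names such as cols and cvRMSE are assumptions for illustration.

```matlab
rng(3);                                   % fixed seed (an assumption)
n = 45;
X = randn(n,3);                           % fake stand-ins for EEG, saccade, fMRI
y = 2 + X(:,1) + 3*X(:,3) + randn(n,1);   % made-up response; column 2 has no effect

cv   = cvpartition(n, 'KFold', 5);        % 5 folds chosen arbitrarily
cols = {1:3, [1 3]};                      % model 1: all predictors; model 2: drop column 2
sse  = zeros(1,2);                        % held-out squared error, one entry per model

for k = 1:cv.NumTestSets
    tr = training(cv, k);                 % logical index of training rows
    te = test(cv, k);                     % logical index of held-out rows
    for m = 1:2
        mdl = fitlm(X(tr, cols{m}), y(tr));
        err = y(te) - predict(mdl, X(te, cols{m}));
        sse(m) = sse(m) + sum(err.^2);
    end
end

cvRMSE = sqrt(sse / n);                   % every row is held out exactly once
fprintf('CV RMSE, all three: %.3f   without column 2: %.3f\n', cvRMSE);
```

If dropping a predictor leaves the held-out RMSE about the same or lower, that supports the simpler model for prediction purposes.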
"Everything should be as simple as possible, but not simpler." -- Einstein
Goes for model-building as well as physics.
This doesn't even start on residuals analyses, etc., etc., etc., ...
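As a starting point for the residuals side, a minimal sketch on made-up data (not the poster's); the seed, fake coefficients, the cutoff of 3 and the names res and suspect are assumptions for illustration.

```matlab
rng(4);                                   % fixed seed (an assumption)
n = 45;
X = randn(n,2);                           % fake stand-ins for EEG and fMRI
y = 2 + X(:,1) + 3*X(:,2) + randn(n,1);   % made-up response
mdl = fitlm(X, y);

res = mdl.Residuals.Raw;                  % observed minus fitted values

figure; plotResiduals(mdl, 'fitted');       % look for curvature or a funnel shape
figure; plotResiduals(mdl, 'probability');  % roughly straight if residuals are near normal

% Flag observations with large studentized residuals for a closer look
suspect = find(abs(mdl.Residuals.Studentized) > 3);
fprintf('%d observation(s) flagged\n', numel(suspect));
```

Structure in the residuals-vs-fitted plot would suggest a missing term or non-constant variance, and a flagged observation is a prompt to inspect that participant's data, not a licence to delete it.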