How well can I predict task performance from predictor variables?

Question

Toby Feld on 7 May 2021


Direct link to this question:

https://es.mathworks.com/matlabcentral/answers/824450-how-well-can-i-predict-task-performance-from-predictor-variables

Edited: dpb on 10 May 2021

Hello,

I have the following research question: I would like to predict the performance in a response time experiment (participants have to respond as fast as possible to a target stimulus) from three neural measures: Amplitude of an EEG signal, speed of a saccade (eye movement), and activity in a specific brain area as measured with fMRI.

What I have is a matrix with 5 columns: participant ID, EEG, saccade, fMRI, response time. The first column is just to identify the participants, columns 2-4 are predictor variables and the fifth column is the to-be-predicted variable.
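As a minimal sketch of that layout, the predictors and the response can be pulled out by column index. This assumes the 5-column matrix is stored in a variable named `M` (a hypothetical name), filled here with random placeholder values rather than real data:

```matlab
% Hypothetical stand-in for the real data: 20 participants x 5 columns
% [participantID, EEG, saccade, fMRI, responseTime]
nParticipants = 20;
M = [(1:nParticipants)' rand(nParticipants,4)];

ids = M(:,1);   % column 1: participant ID (an identifier, not a predictor)
X   = M(:,2:4); % columns 2-4: EEG, saccade, fMRI (predictor variables)
y   = M(:,5);   % column 5: response time (the variable to be predicted)
```

Keeping the ID column out of `X` matters: regressing on an arbitrary label would only add a meaningless term.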

Here are the actual questions: What would be a good way of testing how well I can predict the task performance? A regression, I assume? Which function in MATLAB would you recommend? Does it make sense to segment the participants before running the regression?

Thanks,

Tim

1 comment

dpb on 7 May 2021

We have absolutely no way to answer your question having no knowledge of the experiment.

Ideally, the analysis methods would have been chosen first, and the experiment then designed and executed so as to be able to estimate the parameters of the model.

See the G. E. Box white paper "Regression Analysis Applied to Happenstance Data", which outlines some of the possible problems here.

Given that you already have the data and likely can't repeat the experiment, one must do what one can to at least be aware of potential issues, unless the data were taken under well-controlled circumstances.

As for the last question specifically: "maybe". Your independent variables are markedly lacking such information as age, sex, health status, etc., etc., etc., all of which may be the easily thought-of confounding variables Professor Box speaks of, not to mention less obvious but possibly even more important ones, such as the amount and quality of sleep the night before.
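If such variables had been recorded, one common way to account for them is to include them as additional terms in the model rather than (or as well as) splitting the sample. A minimal sketch with `fitlm` and a table, where `Age` and `Sex` are hypothetical covariates and all values are random placeholders:

```matlab
n = 45;                                     % placeholder data, not real measurements
EEG = rand(n,1); Saccade = rand(n,1); FMRI = rand(n,1); RT = rand(n,1);
Age = randi([18 40], n, 1);                 % hypothetical covariate
Sex = categorical(randi(2, n, 1), [1 2], {'F','M'});  % hypothetical covariate

tbl = table(EEG, Saccade, FMRI, Age, Sex, RT);
% Wilkinson formula: RT explained by the three neural measures plus the covariates.
% Sex is categorical, so fitlm adds one indicator (dummy) variable for it.
mdl = fitlm(tbl, 'RT ~ EEG + Saccade + FMRI + Age + Sex')
```

This only adjusts for confounders that were actually measured; it cannot help with the unmeasured ones.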


Answer 1

Scott MacKenzie on 7 May 2021


Direct link to this answer:

https://es.mathworks.com/matlabcentral/answers/824450-how-well-can-i-predict-task-performance-from-predictor-variables#answer_694750

Edited: Scott MacKenzie on 8 May 2021


If you want a prediction equation expressing RT as a linear function of "amplitude of EEG signal", "speed of saccade", and "fMRI brain activity", and you've already collected the data, this is doable. Of course, who's to say the relationship with each of these variables is linear? But that's another story (see dpb's comment).

The following code with fake data for 10 participants demonstrates the mechanics of building such a model. And it should get you thinking about your goals.

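eeg = rand(10,1);      % fake predictor data for 10 participants
saccade = rand(10,1);
fmri = rand(10,1);
rt = rand(10,1);       % fake response times
data = [ones(size(eeg)) eeg saccade fmri];  % leading column of ones gives the intercept
[b, ~, ~, ~, stats] = regress(rt, data)     % b = coefficients, stats(1) = R^2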

Output:

b =
      0.86189
     -0.16142
     -0.58097
     0.034387

stats =
       0.3729     1.1893    0.39018    0.12236

The prediction equation is

rt = 0.862 - 0.161 × eeg - 0.581 × saccade + 0.034 × fmri

with R^2 = 0.3729.
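The `stats` output of `regress` holds, in order, R^2, the F statistic, its p-value, and an estimate of the error variance. As a small check (regenerating fake data, so the numbers will differ from the run above), R^2 can be recomputed by hand from the residuals, which is the third output of `regress`:

```matlab
% Fake data, as in the example above (values change on every run)
eeg = rand(10,1);
saccade = rand(10,1);
fmri = rand(10,1);
rt = rand(10,1);
data = [ones(size(eeg)) eeg saccade fmri];   % intercept column + 3 predictors
[b, ~, r, ~, stats] = regress(rt, data);     % r = residuals

% stats = [R^2, F statistic, p-value of F, error variance estimate]
SSres = sum(r.^2);                 % residual sum of squares
SStot = sum((rt - mean(rt)).^2);   % total sum of squares about the mean
R2 = 1 - SSres/SStot               % coefficient of determination

rtHat = data*b;                    % fitted RTs from the prediction equation
```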

I suggest you read the documentation on the regress function and study the examples. Good luck.

12 comments

Toby Feld on 8 May 2021


You both made some excellent points. I indeed do not want to fish around for good model fits; I want to do cross-validation, for example something like this:

% Run regression on simulated data
iterations = 100;    % number of random train/test splits
subjectTotal = 1000; % number of simulated subjects
subjectTrain = 500;  % number of subjects used for training (the rest are used for testing)
% Create data
eeg = rand(subjectTotal,1);
saccade = rand(subjectTotal,1);
fmri = rand(subjectTotal,1);
rt = rand(subjectTotal,1);
% Combine data into a design matrix with a leading column of ones (intercept)
data = [ones(size(eeg)) eeg saccade fmri];
fit = zeros(iterations,1); % preallocate: one test-set MSE per iteration
for i = 1:iterations
    shuffledSubjects = randperm(subjectTotal); % shuffle subjects
    subjectsTrain = shuffledSubjects(1:subjectTrain); % select sample for training
    subjectsTest = shuffledSubjects(subjectTrain+1:end); % select remaining subjects for testing
    X1 = data(subjectsTrain,:); % predictor variables, training
    X2 = data(subjectsTest,:); % predictor variables, testing
    Y1 = rt(subjectsTrain,:); % response variable, training
    Y2 = rt(subjectsTest,:); % response variable, testing
    b = regress(Y1,X1); % fit the regression model on the training set only
    estimates = X2*b; % predict the test set using the training weights
    ResTest = Y2-estimates; % test-set residuals
    fit(i) = mean(ResTest.^2); % mean squared prediction error (squaring penalizes large deviations)
end
mean(fit) % average test MSE across iterations (smaller is better)

It's quite interesting how the model fit changes with different samples for training and testing. If I use 900 of the 1000 subjects for training, I get better fits than with 100 or so. Of course, that only applies to real data, since random data shouldn't allow any fit. Does this make sense? Thanks!
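Whether a test MSE is "good" is easier to judge against a baseline. A minimal sketch, swapping in 10-fold cross-validation via `cvpartition` (Statistics and Machine Learning Toolbox) instead of repeated random splits, compares the regression's test MSE with that of a model that always predicts the training-set mean RT. The data are random, as in the code above:

```matlab
n = 1000;                                   % simulated subjects, as above
eeg = rand(n,1); saccade = rand(n,1); fmri = rand(n,1); rt = rand(n,1);
data = [ones(n,1) eeg saccade fmri];        % intercept + 3 predictors

c = cvpartition(n, 'KFold', 10);            % 10 disjoint test folds
mseModel = zeros(c.NumTestSets,1);
mseBaseline = zeros(c.NumTestSets,1);
for k = 1:c.NumTestSets
    tr = training(c,k);                     % logical index of training subjects
    te = test(c,k);                         % logical index of test subjects
    b = regress(rt(tr), data(tr,:));        % fit on the training folds only
    mseModel(k) = mean((rt(te) - data(te,:)*b).^2);
    mseBaseline(k) = mean((rt(te) - mean(rt(tr))).^2); % predict the training mean
end
% A ratio near 1 (or above) means the predictors add nothing beyond the mean
ratio = mean(mseModel) / mean(mseBaseline)
```

With k folds each model is trained on roughly (k-1)/k of the subjects, so changing k plays a similar role to changing the training-set size in the loop above.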

dpb on 8 May 2021

" I do indeed not want to fish around for good model fits,..."

Again, without some basis for a model, "correlation does not imply causation", so that is really all you are doing, whether the model has one term or a hundred.

The idea in general is as noted before; one starts with some hypothesis and tries to design and execute an experiment to prove/disprove the hypothesis.

Simply collecting data and making some sort of empirical fit is only that. We don't even know how many subjects there were, besides the other potential issues I see with unmeasured variables in the sample population.

This is not a case where you can set the level of one of the candidate predictor variables and measure a response; all of the variables are responses. Without knowing something about the range each of them covers, it isn't even clear there's a reason to fit one or more of the variables.

IMO, there's just too much unknown to us here to be willing to make any recommendations whatever...

Have you done any visualization of the data?
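As a minimal first look (random placeholder data stand in for the real measurements), the range of each variable, the correlation matrix, and a scatter-plot matrix show how much each measure varies and how strongly the predictors are correlated with each other and with RT:

```matlab
n = 45;                                    % placeholder sample size
eeg = rand(n,1); saccade = rand(n,1); fmri = rand(n,1); rt = rand(n,1);
vars = [eeg saccade fmri rt];
names = {'EEG','saccade','fMRI','RT'};

% Range of each variable: a predictor that barely varies cannot explain much
ranges = [min(vars); max(vars)];

% Pairwise Pearson correlations (base MATLAB corrcoef)
R = corrcoef(vars);
disp(array2table(R, 'VariableNames', names, 'RowNames', names))

% Scatter-plot matrix of all pairs (off-screen figure so it also runs in batch mode)
figure('Visible','off');
plotmatrix(vars);
```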

Scott MacKenzie on 10 May 2021

I'm not sure. Perhaps dpb will have some ideas to offer.

dpb on 10 May 2021

Edited: dpb on 10 May 2021


I don't have time to do much right now; I am interested and will try to get back to this later. Just one observation to emphasize what was said before: "R-sq isn't the tell-all, end-all" for evaluating a model.

"I also tried nonlinear approaches: lm = fitlm([EEG saccade fMRI],rt,'quadratic') and get an even better R^2 ..."

A quadratic surface is still a linear model (linear in its coefficients), just of higher order.
Of course you get a higher R-sq; you've added six (6) additional terms and reduced the residual degrees of freedom by that many as well.
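The term and degree-of-freedom bookkeeping can be checked directly on the fitted `LinearModel` objects. A sketch on random placeholder data with 45 observations (the count in the output below); adjusted R^2 and AIC, unlike ordinary R^2, penalize the extra terms:

```matlab
n = 45;                                   % placeholder data, not the poster's
EEG = rand(n,1); saccade = rand(n,1); fMRI = rand(n,1); rt = rand(n,1);

mdlL = fitlm([EEG saccade fMRI], rt);               % 1 + x1 + x2 + x3
mdlQ = fitlm([EEG saccade fMRI], rt, 'quadratic');  % adds x1:x2, x1:x3, x2:x3, x1^2, x2^2, x3^2

extraTerms = mdlQ.NumCoefficients - mdlL.NumCoefficients;  % 10 - 4 = 6
lostDOF    = mdlL.DFE - mdlQ.DFE;                          % 41 - 35 = 6

% Ordinary R^2 can only go up when terms are added; adjusted R^2 and AIC penalize them
fprintf('R^2:          linear %.3f  quadratic %.3f\n', mdlL.Rsquared.Ordinary, mdlQ.Rsquared.Ordinary);
fprintf('Adjusted R^2: linear %.3f  quadratic %.3f\n', mdlL.Rsquared.Adjusted, mdlQ.Rsquared.Adjusted);
fprintf('AIC:          linear %.1f  quadratic %.1f\n', mdlL.ModelCriterion.AIC, mdlQ.ModelCriterion.AIC);
```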

You seemingly still haven't looked at the model or the data itself, though; that's the "exploratory" part:

>> mdl=fitlm([EEG saccade, fMRI],rt)
mdl = 
Linear regression model:
    y ~ 1 + x1 + x2 + x3
Estimated Coefficients:
                   Estimate      SE       tStat        pValue  
                   ________    ______    ________    __________
    (Intercept)     464.76     8.2759      56.158    2.0735e-40
    x1              18.336     2.3883      7.6777    1.8516e-09
    x2             0.11507     3.0848    0.037303       0.97042
    x3              37.777     3.1301      12.069    4.4618e-15
Number of observations: 45, Error degrees of freedom: 41
Root Mean Squared Error: 18.3
R-squared: 0.939,  Adjusted R-Squared: 0.935
F-statistic vs. constant model: 211, p-value = 6.17e-25
>> 

NB: the coefficient x2 (saccade) has an SE (standard error of the estimate) that is ~27X the magnitude of the coefficient. In other words, it is meaningless: it says the coefficient is ~0.1 +/- 3, or anywhere between about [-2.9, 3.1].
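The "+/- 3" above is a one-standard-error band. A small worked check using only the numbers printed in the output above (no new data) gives the t-statistic, the SE-to-estimate ratio, and the wider 95% confidence interval based on the t critical value with 41 error degrees of freedom. For a fitted model, `coefCI(mdl)` should return the 95% intervals for all coefficients at once:

```matlab
% Values copied from the fitlm output above (coefficient x2 = saccade)
est = 0.11507;   % Estimate
se  = 3.0848;    % SE
dfe = 41;        % Error degrees of freedom

tStat = est/se                             % matches the tStat column (~0.0373)
ratio = se/abs(est)                        % SE is roughly 27 times the estimate
oneSE = est + [-1 1]*se                    % the "+/- 3" band
ci95  = est + [-1 1]*tinv(0.975, dfe)*se   % 95% confidence interval, roughly twice as wide
```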

So, to interpret this model more accurately, it's really the same thing as

>> fitlm([EEG, fMRI],rt)
ans = 
Linear regression model:
    y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate      SE      tStat       pValue  
                   ________    ______    ______    __________
    (Intercept)     464.85     7.7969     59.62     3.197e-42
    x1              18.367     2.2196    8.2748    2.3207e-10
    x2              37.852     2.3729    15.951    1.9796e-19
Number of observations: 45, Error degrees of freedom: 42
Root Mean Squared Error: 18.1
R-squared: 0.939,  Adjusted R-Squared: 0.936
F-statistic vs. constant model: 324, p-value = 3.02e-26
>> 

which is actually slightly better with fewer terms: RMSE 18.1 vs. 18.3.

"Everything should be as simple as possible, but not simpler." (Einstein)

That goes for model building as well as for physics.
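The reduced model can also be obtained from the full one without retyping it. A minimal sketch on random placeholder data (so the numbers will not match the output above), using `coefTest` for an F-test on the saccade coefficient and `removeTerms` to refit without it; the comment gives the reason a model with fewer terms can have a lower RMSE:

```matlab
n = 45;                                    % placeholder data, not the poster's
EEG = rand(n,1); saccade = rand(n,1); fMRI = rand(n,1); rt = rand(n,1);

mdlFull = fitlm([EEG saccade fMRI], rt);   % y ~ 1 + x1 + x2 + x3
pSaccade = coefTest(mdlFull, [0 0 1 0])    % F-test of H0: coefficient on x2 = 0
mdlReduced = removeTerms(mdlFull, 'x2');   % refit as y ~ 1 + x1 + x3

% RMSE = sqrt(SSE/DFE): dropping a near-zero term barely raises SSE but adds 1 to DFE
fprintf('RMSE full %.4f (DFE %d), reduced %.4f (DFE %d)\n', ...
    mdlFull.RMSE, mdlFull.DFE, mdlReduced.RMSE, mdlReduced.DFE);
```

For a single coefficient this F-test p-value equals the t-test p-value in the coefficient table, since F = t^2.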

And this doesn't even start on residual analysis, etc., etc., etc.
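As one possible starting point for residual analysis (a sketch on random placeholder data, not the poster's), `plotResiduals` draws residuals against fitted values and a normal probability plot, and the `Diagnostics` table flags potentially influential observations. The 4/n cutoff for Cook's distance is one common rule of thumb, not a MATLAB default:

```matlab
n = 45;                                   % placeholder data, not the poster's
EEG = rand(n,1); fMRI = rand(n,1); rt = rand(n,1);
mdl = fitlm([EEG fMRI], rt);              % y ~ 1 + x1 + x2

res = mdl.Residuals.Raw;                  % observed minus fitted RT

% Residuals vs fitted values: look for curvature or a funnel shape (non-constant variance)
figure('Visible','off');
plotResiduals(mdl, 'fitted');
% Normal probability plot of the residuals
figure('Visible','off');
plotResiduals(mdl, 'probability');

% Potentially influential observations
cooks = mdl.Diagnostics.CooksDistance;
influential = find(cooks > 4/n)
```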
