Stack or Unstack Dataset Arrays
This example shows how to reformat dataset arrays using stack and unstack.
Load sample data.
Import the data from the comma-separated text file testScores.csv.
ds = dataset('File','testScores.csv','Delimiter',',')
ds = 

    LastName        School            Test1    Test2    Test3
    {'Jeong'   }    {'XYZ School'}    90       87       93
    {'Collins' }    {'XYZ School'}    87       85       83
    {'Torres'  }    {'XYZ School'}    86       85       88
    {'Phillips'}    {'ABC School'}    75       80       72
    {'Ling'    }    {'ABC School'}    89       86       87
    {'Ramirez' }    {'ABC School'}    96       92       98
    {'Lee'     }    {'XYZ School'}    78       75       77
    {'Walker'  }    {'ABC School'}    91       94       92
    {'Garcia'  }    {'ABC School'}    86       83       85
    {'Chang'   }    {'XYZ School'}    79       76       82
Each of the 10 students has 3 test scores.
Perform calculations on dataset array.
With the data in this format, you can, for example, calculate the average test score for each student. The test scores are in columns 3 to 5.
ds.TestAve = mean(double(ds(:,3:5)),2);
ds(:,{'LastName','School','TestAve'})
ans = 

    LastName        School            TestAve
    {'Jeong'   }    {'XYZ School'}        90
    {'Collins' }    {'XYZ School'}        85
    {'Torres'  }    {'XYZ School'}    86.333
    {'Phillips'}    {'ABC School'}    75.667
    {'Ling'    }    {'ABC School'}    87.333
    {'Ramirez' }    {'ABC School'}    95.333
    {'Lee'     }    {'XYZ School'}    76.667
    {'Walker'  }    {'ABC School'}    92.333
    {'Garcia'  }    {'ABC School'}    84.667
    {'Chang'   }    {'XYZ School'}        79
A new variable with average test scores is added to the dataset array, ds.
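Selecting the test variables by name instead of by column position gives the same averages and does not depend on the column order. The following is a minimal sketch on a three-student subset of the data above; the names small, byPosition, byName, and sameResult are illustrative and are not part of the original example.

```matlab
% Three-student subset of the sample data, built in the workspace
% (assumed stand-in for testScores.csv).
LastName = {'Jeong';'Collins';'Torres'};
Test1 = [90;87;86];
Test2 = [87;85;85];
Test3 = [93;83;88];
small = dataset(LastName,Test1,Test2,Test3);

% Average by column position (columns 2 to 4 in this subset).
byPosition = mean(double(small(:,2:4)),2);

% Average by variable name.
byName = mean(double(small(:,{'Test1','Test2','Test3'})),2);

% Both approaches select the same columns, so the results agree.
sameResult = isequal(byPosition,byName)
```

Indexing by name keeps the calculation correct if variables are later added or reordered.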
Reformat the dataset array.
Stack the test score variables into a new variable, Scores.
dsNew = stack(ds,{'Test1','Test2','Test3'},...
    'newDataVarName','Scores')
dsNew = 

    LastName        School            TestAve    Scores_Indicator    Scores
    {'Jeong'   }    {'XYZ School'}        90     Test1               90
    {'Jeong'   }    {'XYZ School'}        90     Test2               87
    {'Jeong'   }    {'XYZ School'}        90     Test3               93
    {'Collins' }    {'XYZ School'}        85     Test1               87
    {'Collins' }    {'XYZ School'}        85     Test2               85
    {'Collins' }    {'XYZ School'}        85     Test3               83
    {'Torres'  }    {'XYZ School'}    86.333     Test1               86
    {'Torres'  }    {'XYZ School'}    86.333     Test2               85
    {'Torres'  }    {'XYZ School'}    86.333     Test3               88
    {'Phillips'}    {'ABC School'}    75.667     Test1               75
    {'Phillips'}    {'ABC School'}    75.667     Test2               80
    {'Phillips'}    {'ABC School'}    75.667     Test3               72
    {'Ling'    }    {'ABC School'}    87.333     Test1               89
    {'Ling'    }    {'ABC School'}    87.333     Test2               86
    {'Ling'    }    {'ABC School'}    87.333     Test3               87
    {'Ramirez' }    {'ABC School'}    95.333     Test1               96
    {'Ramirez' }    {'ABC School'}    95.333     Test2               92
    {'Ramirez' }    {'ABC School'}    95.333     Test3               98
    {'Lee'     }    {'XYZ School'}    76.667     Test1               78
    {'Lee'     }    {'XYZ School'}    76.667     Test2               75
    {'Lee'     }    {'XYZ School'}    76.667     Test3               77
    {'Walker'  }    {'ABC School'}    92.333     Test1               91
    {'Walker'  }    {'ABC School'}    92.333     Test2               94
    {'Walker'  }    {'ABC School'}    92.333     Test3               92
    {'Garcia'  }    {'ABC School'}    84.667     Test1               86
    {'Garcia'  }    {'ABC School'}    84.667     Test2               83
    {'Garcia'  }    {'ABC School'}    84.667     Test3               85
    {'Chang'   }    {'XYZ School'}        79     Test1               79
    {'Chang'   }    {'XYZ School'}        79     Test2               76
    {'Chang'   }    {'XYZ School'}        79     Test3               82
The original test variable names, Test1, Test2, and Test3, appear as levels in the combined test scores indicator variable, Scores_Indicator.
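Stacking three variables turns each row of the wide dataset array into three rows of the tall one. The following is a minimal sketch on a three-student subset of the data above; the names small, stacked, nWide, and nTall are illustrative and are not part of the original example.

```matlab
% Three-student subset of the sample data, built in the workspace
% (assumed stand-in for testScores.csv).
LastName = {'Jeong';'Collins';'Torres'};
Test1 = [90;87;86];
Test2 = [87;85;85];
Test3 = [93;83;88];
small = dataset(LastName,Test1,Test2,Test3);

% Stack the three test variables into a single variable, Scores.
stacked = stack(small,{'Test1','Test2','Test3'},...
    'newDataVarName','Scores');

% Each student now occupies one row per stacked variable.
nWide = size(small,1);
nTall = size(stacked,1);
tripled = (nTall == 3*nWide)

% The first three rows hold the first student's three scores.
firstStudentScores = stacked.Scores(1:3)

% The distinct values of the indicator are the original variable names.
unique(stacked.Scores_Indicator)
```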
Plot data grouped by category.
With the data in this format, you can use Scores_Indicator as a grouping variable, and draw box plots of test scores grouped by test.
figure()
boxplot(dsNew.Scores,dsNew.Scores_Indicator)
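The same grouping variable can also drive numeric summaries. The following is a minimal sketch, assuming grpstats from the same toolbox and a three-student subset of the data above; the names small, stacked, and stats are illustrative and are not part of the original example.

```matlab
% Three-student subset of the sample data, built in the workspace
% (assumed stand-in for testScores.csv).
LastName = {'Jeong';'Collins';'Torres'};
Test1 = [90;87;86];
Test2 = [87;85;85];
Test3 = [93;83;88];
small = dataset(LastName,Test1,Test2,Test3);

% Stack the test variables so that the test name becomes a grouping variable.
stacked = stack(small,{'Test1','Test2','Test3'},...
    'newDataVarName','Scores');

% Mean score per test, restricted to the numeric variable Scores.
stats = grpstats(stacked,'Scores_Indicator','mean','DataVars','Scores')
```

The result has one row per test, with the group size in GroupCount and the group mean in mean_Scores.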
Revert the dataset array to the original format.
Reformat dsNew back into its original format.
dsOrig = unstack(dsNew,'Scores','Scores_Indicator');
dsOrig(:,{'LastName','Test1','Test2','Test3'})
ans = 

    LastName        Test1    Test2    Test3
    {'Jeong'   }    90       87       93
    {'Collins' }    87       85       83
    {'Torres'  }    86       85       88
    {'Phillips'}    75       80       72
    {'Ling'    }    89       86       87
    {'Ramirez' }    96       92       98
    {'Lee'     }    78       75       77
    {'Walker'  }    91       94       92
    {'Garcia'  }    86       83       85
    {'Chang'   }    79       76       82
The dataset array is back in wide format. unstack reassigns the levels of the indicator variable, Scores_Indicator, as variable names in the unstacked dataset array.
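One way to confirm that stack followed by unstack preserves the data is to compare the restored scores with the originals. The following is a minimal sketch on a three-student subset of the data above; the names small, stacked, restored, a, b, and roundTripOK are illustrative and are not part of the original example. Both dataset arrays are sorted by LastName first, so the comparison does not rely on any particular row order after unstacking.

```matlab
% Three-student subset of the sample data, built in the workspace
% (assumed stand-in for testScores.csv).
LastName = {'Jeong';'Collins';'Torres'};
Test1 = [90;87;86];
Test2 = [87;85;85];
Test3 = [93;83;88];
small = dataset(LastName,Test1,Test2,Test3);

% Wide to tall, then tall back to wide.
stacked = stack(small,{'Test1','Test2','Test3'},...
    'newDataVarName','Scores');
restored = unstack(stacked,'Scores','Scores_Indicator');

% Sort both by student name before comparing.
a = sortrows(small,'LastName');
b = sortrows(restored,'LastName');

% The names and all three score columns should match.
roundTripOK = isequal(a.LastName,b.LastName) && ...
    isequal([a.Test1 a.Test2 a.Test3],[b.Test1 b.Test2 b.Test3])
```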
See Also
dataset | double | stack | unstack