Main Content

## Calculations on Dataset Arrays

This example shows how to perform calculations on dataset arrays.

### Load sample data.

Import the data from the comma-separated text file `testScores.csv`.

`ds = dataset('File','testScores.csv','Delimiter',',')`
```ds = LastName School Test1 Test2 Test3 {'Jeong' } {'XYZ School'} 90 87 93 {'Collins' } {'XYZ School'} 87 85 83 {'Torres' } {'XYZ School'} 86 85 88 {'Phillips'} {'ABC School'} 75 80 72 {'Ling' } {'ABC School'} 89 86 87 {'Ramirez' } {'ABC School'} 96 92 98 {'Lee' } {'XYZ School'} 78 75 77 {'Walker' } {'ABC School'} 91 94 92 {'Garcia' } {'ABC School'} 86 83 85 {'Chang' } {'XYZ School'} 79 76 82 ```

There are 3 test scores for each of 10 students, in wide format.

### Average dataset array variables.

Compute the average (mean) test score for each student in the dataset array, and store it in a new variable, `TestAvg`. Test scores are in columns 3 to 5.

Use `double` to convert the specified dataset array variables into a numeric array. Then, calculate the mean across the second dimension (across columns) to get the test average for each student.

```ds.TestAvg = mean(double(ds(:,3:5)),2); ds(:,{'LastName','TestAvg'})```
```ans = LastName TestAvg {'Jeong' } 90 {'Collins' } 85 {'Torres' } 86.333 {'Phillips'} 75.667 {'Ling' } 87.333 {'Ramirez' } 95.333 {'Lee' } 76.667 {'Walker' } 92.333 {'Garcia' } 84.667 {'Chang' } 79 ```

### Summarize the dataset array using a grouping variable.

Compute the mean and maximum average test scores for each school.

`stats = grpstats(ds,'School',{'mean','max'},'DataVars','TestAvg')`
```stats = School GroupCount mean_TestAvg max_TestAvg XYZ School {'XYZ School'} 5 83.4 90 ABC School {'ABC School'} 5 87.067 95.333 ```

This returns a new dataset array containing the specified summary statistics for each level of the grouping variable, `School`.

### Replace data values.

The denominator for each test score is 100. Convert the test score denominator to 25.

```scores = double(ds(:,3:)); newScores = scores*25/100; ds = replacedata(ds,newScores,3:5)```
```ds = LastName School Test1 Test2 Test3 TestAvg {'Jeong' } {'XYZ School'} 22.5 21.75 23.25 90 {'Collins' } {'XYZ School'} 21.75 21.25 20.75 85 {'Torres' } {'XYZ School'} 21.5 21.25 22 86.333 {'Phillips'} {'ABC School'} 18.75 20 18 75.667 {'Ling' } {'ABC School'} 22.25 21.5 21.75 87.333 {'Ramirez' } {'ABC School'} 24 23 24.5 95.333 {'Lee' } {'XYZ School'} 19.5 18.75 19.25 76.667 {'Walker' } {'ABC School'} 22.75 23.5 23 92.333 {'Garcia' } {'ABC School'} 21.5 20.75 21.25 84.667 {'Chang' } {'XYZ School'} 19.75 19 20.5 79 ```

The first two lines of code extract the test data and perform the desired calculation. Then, `replacedata` inserts the new test scores back into the dataset array.

The variable of test score averages, `TestAvg`, is now the final score for each student.

### Change variable name.

Change the variable name to `Final`.

```ds.Properties.VarNames{end} = 'Final'; ds```
```ds = LastName School Test1 Test2 Test3 Final {'Jeong' } {'XYZ School'} 22.5 21.75 23.25 90 {'Collins' } {'XYZ School'} 21.75 21.25 20.75 85 {'Torres' } {'XYZ School'} 21.5 21.25 22 86.333 {'Phillips'} {'ABC School'} 18.75 20 18 75.667 {'Ling' } {'ABC School'} 22.25 21.5 21.75 87.333 {'Ramirez' } {'ABC School'} 24 23 24.5 95.333 {'Lee' } {'XYZ School'} 19.5 18.75 19.25 76.667 {'Walker' } {'ABC School'} 22.75 23.5 23 92.333 {'Garcia' } {'ABC School'} 21.5 20.75 21.25 84.667 {'Chang' } {'XYZ School'} 19.75 19 20.5 79```