Main Content

Merge Dataset Arrays

This example shows how to merge dataset arrays using join.

Load sample data.

Import the data from the first worksheet in hospitalSmall.xlsx into a dataset array, then keep only a few of the variables.

ds1 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'));
ds1 = ds1(:,{'id','name','sex','age'})
ds1 = 

    id               name              sex        age
    'YPL-320'        'SMITH'           'm'        38 
    'GLI-532'        'JOHNSON'         'm'        43 
    'PNI-258'        'WILLIAMS'        'f'        38 
    'MIJ-579'        'JONES'           'f'        40 
    'XLK-030'        'BROWN'           'f'        49 
    'TFP-518'        'DAVIS'           'f'        46 
    'LPD-746'        'MILLER'          'f'        33 
    'ATA-945'        'WILSON'          'm'        40 
    'VNL-702'        'MOORE'           'm'        28 
    'LQW-768'        'TAYLOR'          'f'        31 
    'QFY-472'        'ANDERSON'        'f'        45 
    'UJG-627'        'THOMAS'          'f'        42 
    'XUE-826'        'JACKSON'         'm'        25 
    'TRW-072'        'WHITE'           'm'        39 

The dataset array, ds1, has 14 observations (rows) and 4 variables (columns).

Import the data from the worksheet Heights2 in hospitalSmall.xlsx.

ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights2')
ds2 = 

    id               hgt
    'LPD-746'        61 
    'PNI-258'        62 
    'XUE-826'        71 
    'ATA-945'        72 
    'XLK-030'        63 

ds2 has height measurements for a subset of five individuals from the first dataset array, ds1.

Merge only the matching subset of observations.

Use join to merge the two dataset arrays, ds1 and ds2, keeping only the subset of observations that are in ds2.

JoinSmall = join(ds2,ds1)
JoinSmall = 

    id               hgt    name              sex        age
    'LPD-746'        61     'MILLER'          'f'        33 
    'PNI-258'        62     'WILLIAMS'        'f'        38 
    'XUE-826'        71     'JACKSON'         'm'        25 
    'ATA-945'        72     'WILSON'          'm'        40 
    'XLK-030'        63     'BROWN'           'f'        49 

In JoinSmall, the variable id only appears once. This is because it is the key variable—the variable that links observations between the two dataset arrays—and has the same variable name in both ds1 and ds2.

Include incomplete observations in the merge.

Merge ds1 and ds2 keeping all observations in the larger ds1.

joinAll = join(ds2,ds1,'type','rightouter','mergekeys',true)
joinAll = 

    id               hgt    name              sex        age
    'ATA-945'         72    'WILSON'          'm'        40 
    'GLI-532'        NaN    'JOHNSON'         'm'        43 
    'LPD-746'         61    'MILLER'          'f'        33 
    'LQW-768'        NaN    'TAYLOR'          'f'        31 
    'MIJ-579'        NaN    'JONES'           'f'        40 
    'PNI-258'         62    'WILLIAMS'        'f'        38 
    'QFY-472'        NaN    'ANDERSON'        'f'        45 
    'TFP-518'        NaN    'DAVIS'           'f'        46 
    'TRW-072'        NaN    'WHITE'           'm'        39 
    'UJG-627'        NaN    'THOMAS'          'f'        42 
    'VNL-702'        NaN    'MOORE'           'm'        28 
    'XLK-030'         63    'BROWN'           'f'        49 
    'XUE-826'         71    'JACKSON'         'm'        25 
    'YPL-320'        NaN    'SMITH'           'm'        38 
Each observation in ds1 without corresponding height measurements in ds2 has height value NaN. Also, because there is no id value in ds2 for each observation in ds1, you need to merge the keys using the option 'MergeKeys',true. This merges the key variable, id.

Merge dataset arrays with different key variable names.

When using join, it is not necessary for the key variable to have the same name in the dataset arrays to be merged. Import the data from the worksheet named Heights3 in hospitalSmall.xlsx.

ds3 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet','Heights3')
ds3 = 

    identifier       hgt
    'GLI-532'        69 
    'QFY-472'        62 
    'MIJ-579'        61 
    'VNL-702'        68 
    'XLK-030'        63 
    'LPD-746'        61 
    'TFP-518'        60 
    'YPL-320'        71 
    'ATA-945'        72 
    'LQW-768'        64 
    'PNI-258'        62 
    'UJG-627'        61 
    'XUE-826'        71 
    'TRW-072'        69 

ds3 has height measurements for each observation in ds1. This dataset array has the same patient identifiers as ds1, but they are under the variable name identifier, instead of id (and in a different order).

Specify key variable.

You can easily change the variable name of the key variable in ds3 by setting d3.Properties.VarNames or using the Variables editor, but it is not required to perform a merge. Instead, you can specify the name of the key variable in each dataset array using LeftKeys and RightKeys.

joinDiff = join(ds3,ds1,'LeftKeys','identifier','RightKeys','id')
joinDiff = 

    identifier       hgt    name              sex        age
    'GLI-532'        69     'JOHNSON'         'm'        43 
    'QFY-472'        62     'ANDERSON'        'f'        45 
    'MIJ-579'        61     'JONES'           'f'        40 
    'VNL-702'        68     'MOORE'           'm'        28 
    'XLK-030'        63     'BROWN'           'f'        49 
    'LPD-746'        61     'MILLER'          'f'        33 
    'TFP-518'        60     'DAVIS'           'f'        46 
    'YPL-320'        71     'SMITH'           'm'        38 
    'ATA-945'        72     'WILSON'          'm'        40 
    'LQW-768'        64     'TAYLOR'          'f'        31 
    'PNI-258'        62     'WILLIAMS'        'f'        38 
    'UJG-627'        61     'THOMAS'          'f'        42 
    'XUE-826'        71     'JACKSON'         'm'        25 
    'TRW-072'        69     'WHITE'           'm'        39 

The merged dataset array, joinDiff, has the same key variable order and name as the first dataset array input to join, ds3.

See Also


Related Examples

More About