Categorize Numeric Data
Note
The nominal and ordinal array data types are not recommended. To represent ordered and unordered discrete, nonnumeric data, use the Categorical Arrays data type instead.
This example shows how to categorize numeric data into a categorical ordinal array using ordinal. This is useful for discretizing continuous data.
Load sample data.
The dataset array, hospital, contains variables measured on a sample of patients. Compute the minimum, median, and maximum of the variable Age.
load hospital
quantile(hospital.Age,[0,.5,1])
ans = 1×3
25 39 50
The patient ages range from 25 to 50.
Convert a numeric array to an ordinal array.
Group patients into the age categories Under 30, 30-39, and Over 40.
hospital.AgeCat = ordinal(hospital.Age,{'Under 30','30-39','Over 40'},...
    [],[25,30,40,50]);
getlevels(hospital.AgeCat)
ans = 1x3 ordinal
Under 30 30-39 Over 40
The last input argument to ordinal specifies the endpoints of the categories. The first category begins at age 25, the second at age 30, and so on. The last category begins at 40 and ends at 50 (the maximum age in the data set), so it contains ages 40 and above. To specify three categories, you must specify four endpoints; the last endpoint is the upper bound of the last category.
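The edge behavior described above can be checked on a small vector. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the vector ages and the other variable names are hypothetical and are not part of the hospital data.

```matlab
% Hypothetical ages chosen to sit on and between the bin edges.
ages = [25; 29; 30; 39; 40; 50];

% Three labels require four edges. The last bin includes its upper
% edge, so age 50 is still assigned to a category.
ageCat = ordinal(ages,{'Under 30','30-39','Over 40'},[],[25,30,40,50]);

% Convert to text to inspect the category assigned to each age.
ageLabels = cellstr(ageCat);
disp([num2cell(ages) ageLabels])
```

In this sketch, ages 30 and 40 fall into the category that begins at that edge, not the one that ends there.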
Explore categories.
Display the age and age category for the second patient.
dataset({hospital.Age(2),'Age'},...
    {hospital.AgeCat(2),'AgeCategory'})
ans = 
    Age    AgeCategory
    43     Over 40
When you discretize a numeric array into categories, the categorical array loses all information about the actual numeric values. In this example, AgeCat is not numeric, and you cannot recover the raw data values from it.
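To illustrate the loss of information, the sketch below converts a small ordinal array back to numbers. It assumes the Statistics and Machine Learning Toolbox is available and uses a hypothetical vector ages in place of the hospital data. The conversion yields level codes, not the original ages.

```matlab
% Hypothetical ages, binned with the same edges used for AgeCat.
ages = [25; 29; 30; 39; 40; 50];
ageCat = ordinal(ages,{'Under 30','30-39','Over 40'},[],[25,30,40,50]);

% double returns the integer level code of each element, not the age.
codes = double(ageCat);

% Ages 25 and 29 share a code, so they can no longer be told apart.
sameCode = codes(1) == codes(2);
disp(codes')
```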
Categorize a numeric array into quartiles.
The variable Weight has weight measurements for the sample patients. Categorize the patient weights into four categories, by quartile.
p = 0:.25:1;
breaks = quantile(hospital.Weight,p);
hospital.WeightQ = ordinal(hospital.Weight,{'Q1','Q2','Q3','Q4'},...
    [],breaks);
getlevels(hospital.WeightQ)
ans = 1x4 ordinal
Q1 Q2 Q3 Q4
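The quartile binning can be tried on a small vector to see how the break points map values to levels. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available; the vector weights is hypothetical and is not taken from the hospital data.

```matlab
% Hypothetical weights; not taken from the hospital data set.
weights = (110:10:180)';

% Five break points (minimum, three quartiles, maximum) define four bins.
p = 0:.25:1;
breaks = quantile(weights,p);

% Bin the weights by quartile; the top bin includes the maximum value.
weightQ = ordinal(weights,{'Q1','Q2','Q3','Q4'},[],breaks);
quartileLabels = cellstr(weightQ);
disp([num2cell(weights) quartileLabels])
```

Because the first and last break points are the minimum and maximum of the data, every value falls inside some bin.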
Explore categories.
Display the weight and weight quartile for the second patient.
dataset({hospital.Weight(2),'Weight'},...
    {hospital.WeightQ(2),'WeightQuartile'})
ans = 
    Weight    WeightQuartile
    163       Q3
Summary statistics grouped by category levels.
Compute the mean systolic and diastolic blood pressure for each age and weight category.
grpstats(hospital,{'AgeCat','WeightQ'},'mean','DataVars','BloodPressure')
ans = 
                   AgeCat      WeightQ    GroupCount    mean_BloodPressure
    Under 30_Q1    Under 30    Q1          6            123.17    79.667
    Under 30_Q2    Under 30    Q2          3            120.33    79.667
    Under 30_Q3    Under 30    Q3          2             127.5      86.5
    Under 30_Q4    Under 30    Q4          4               122        78
    30-39_Q1       30-39       Q1         12            121.75     81.75
    30-39_Q2       30-39       Q2          9            119.56    82.556
    30-39_Q3       30-39       Q3          9               121    83.222
    30-39_Q4       30-39       Q4         11            125.55    87.273
    Over 40_Q1     Over 40     Q1          7            122.14    84.714
    Over 40_Q2     Over 40     Q2         13            123.38    79.385
    Over 40_Q3     Over 40     Q3         14            123.07    84.643
    Over 40_Q4     Over 40     Q4         10             124.6      85.1
The variable BloodPressure is a matrix with two columns. The first column is systolic blood pressure, and the second column is diastolic blood pressure, so each group has two means. The group with the highest mean diastolic blood pressure, 87.273, is 30-39_Q4: patients aged 30–39 in the highest weight quartile.
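Because the note at the top of this page recommends categorical arrays over ordinal arrays, the same kind of binning can also be sketched with discretize, which is part of base MATLAB. The vector ages and the variable names below are hypothetical; this is one possible equivalent under that assumption, not a replacement for the example above.

```matlab
% Hypothetical ages; the hospital data set is not required here.
ages = [25; 29; 30; 39; 40; 50];

% discretize bins numeric data. With the 'categorical' option it
% returns an ordinal categorical array with the given category names.
edges = [25,30,40,50];
ageCat = discretize(ages,edges,'categorical',{'Under 30','30-39','Over 40'});

% categories lists the levels in order; isordinal confirms the ordering.
disp(categories(ageCat)')
disp(isordinal(ageCat))
```

As with ordinal, each bin includes its left edge, and the last bin also includes its right edge, so age 50 is assigned to Over 40.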
See Also
Related Examples
- Create Nominal and Ordinal Arrays
- Merge Category Levels
- Plot Data Grouped by Category
- Index and Search Using Nominal and Ordinal Arrays