Create Geographic Bubble Chart from Tabular Data

Geographic bubble charts are a way to visualize data overlaid on a map. For data with geographic characteristics, these charts can provide much-needed context. In this example, you import a file into MATLAB® as a table and create a geographic bubble chart from the table variables (columns). Then you work with the data in the table to visualize aspects of the data, such as population size.

Import File as Table

Load the sample file counties.xlsx, which contains records of population and Lyme disease occurrences by county in New England. Read the data into a table using readtable.

counties = readtable('counties.xlsx');

Create Basic Geographic Bubble Chart

Create a geographic bubble chart that shows the locations of counties in New England. Specify the table as the first argument, counties. The geographic bubble chart stores the table in its SourceTable property. The example displays the first five rows of the table. Use the 'Latitude' and 'Longitude' columns of the table to specify locations. The chart automatically sets the latitude and longitude limits of the underlying map, called the basemap, to include only those areas represented by the data. Assign the GeographicBubbleChart object to the variable gb. Use gb to modify the chart after it is created.

figure
gb = geobubble(counties,'Latitude','Longitude');

head(gb.SourceTable, 5)
ans=5×19 table
    FIPS     ANSICODE     Latitude    Longitude         CountyName          State        StateName       Population2010    HousingUnits2010     LandArea     WaterArea     Cases2010    Cases2011    Cases2012    Cases2013    Cases2014    Cases2015    Cases2014_1    Cases2015_1
    ____    __________    ________    _________    _____________________    ______    _______________    ______________    ________________    __________    __________    _________    _________    _________    _________    _________    _________    ___________    ___________

    9001    2.1279e+05     41.228      -73.367     {'Fairfield County' }    {'CT'}    {'Connecticut'}      9.1683e+05         3.6122e+05       1.6185e+09    5.4916e+08       331          305          225          443          437          427           437            427    
    9003    2.1234e+05     41.806      -72.733     {'Hartford County'  }    {'CT'}    {'Connecticut'}      8.9401e+05         3.7425e+05       1.9039e+09    4.0213e+07       187          167          143          288          291          335           291            335    
    9005     2.128e+05     41.792      -73.235     {'Litchfield County'}    {'CT'}    {'Connecticut'}      1.8993e+05              87550       2.3842e+09    6.2166e+07        88          118           67          187          168          202           168            202    
    9007     2.128e+05     41.435      -72.524     {'Middlesex County' }    {'CT'}    {'Connecticut'}      1.6568e+05              74837       9.5649e+08    1.8068e+08       125          109           93          181          155          241           155            241    
    9009     2.128e+05      41.35        -72.9     {'New Haven County' }    {'CT'}    {'Connecticut'}      8.6248e+05           3.62e+05       1.5657e+09    6.6705e+08       240          249          213          388          459          474           459            474    

You can pan and zoom in and out on the basemap displayed by the geobubble function.

Visualize County Populations on the Chart

Use bubble size (diameter) to indicate the relative populations of the different counties. Specify the Population2010 variable in the table as the value of the SizeVariable parameter. In the resultant geographic bubble chart, the bubbles have different sizes to indicate population. The chart includes a legend that describes how diameter expresses size. Adjust the limits of the chart using geolimits.

gb = geobubble(counties,'Latitude','Longitude',...
                        'SizeVariable','Population2010');
geolimits([39.50 47.17],[-74.94 -65.40])

geobubble scales the bubble diameters linearly between the values specified by the SizeLimits property.

Visualize Lyme Disease Cases by County

Use bubble color to show the number of Lyme disease cases in a county for a given year. To display this type of data, the geobubble function requires that the data be a categorical value. Initially, none of the columns in the table are categorical but you can create one. For example, you can use the discretize function to create a categorical variable from the data in the Cases2010 variable. The new variable, named Severity, groups the data into three categories: Low, Medium, and High. Use this new variable as the ColorVariable parameter. These changes modify the table stored in the SourceTable property, which is a copy of the original table in the workspace, counties. Making changes to the table stored in the GeographicBubbleChart object avoids affecting the original data.

gb.SourceTable.Severity = discretize(counties.Cases2010,[0 50 100 500],...
                                 'categorical', {'Low', 'Medium', 'High'});
gb.ColorVariable = 'Severity';

Handle Undefined Data

When you plot the severity information, a fourth category appears in the color legend: undefined. This category can appear when the data you cast to categorical contains empty values or values that are out of scope for the categories you defined. Determine the cause of the undefined Severity value by hovering your cursor over an undefined bubble. The data tip shows that the bubble represents values in the 33rd row of the Lyme disease table.

Check the value of the variable used for Severity, Cases2010, which is the 12th variable in the 33rd row of the Lyme disease table.

gb.SourceTable(33,12)
ans=1×1 table
    Cases2010
    _________

       514   

The High category is defined as values between 100 and 500. However, the value of the Cases2010 variable is 514. To eliminate this undefined value, reset the upper limit of the High category to include this value. For example, use 5000.

gb.SourceTable.Severity = discretize(counties.Cases2010,[0 50 100 5000],...
                                 'categorical', {'Low', 'Medium', 'High'});

Unlike the color variable, when geobubble encounters an undefined number (NaN) in the size, latitude, or longitude variables, it ignores the value.

Choose Bubble Colors

Use a color gradient to represent the Low-Medium-High categorization. geobubble stores the colors as an m-by-3 list of RGB values in the BubbleColorList property.

gb.BubbleColorList = autumn(3);

Reorder Bubble Colors

Change the color indicating high severity to be red rather than yellow. To change the color order, you can change the ordering of either the categories or the colors listed in the BubbleColorList property. For example, initially the categories are ordered Low-Medium-High. Use the reordercats function to change the categories to High-Medium-Low. The categories change in the color legend.

neworder = {'High','Medium','Low'};
gb.SourceTable.Severity = reordercats(gb.SourceTable.Severity,neworder);

Adding Titles

When you display a geographic bubble chart with size and color variables, the chart displays a size legend and color legend to indicate what the relative sizes and colors mean. When you specify a table as an argument, geobubble automatically uses the table variable names as legend titles, but you can specify other titles using properties.

title 'Lyme Disease in New England, 2010'
gb.SizeLegendTitle = 'County Population';
gb.ColorLegendTitle = 'Lyme Disease Severity';

Refine Chart Data

Looking at the Lyme disease data, the trend appears to be that more cases occur in more densely populated areas. Looking at locations with the most cases per capita might be more interesting. Calculate the cases per 1000 people and display it on the chart.

gb.SourceTable.CasesPer1000 = gb.SourceTable.Cases2010 ./ gb.SourceTable.Population2010 * 1000;
gb.SizeVariable = 'CasesPer1000';
gb.SizeLegendTitle = 'Cases Per 1000';

The bubble sizes now tell a different story than before. The areas with the largest populations tracked relatively well with the different severity levels. However, when looking at the number of cases normalized by population, it appears that the highest risk per capita has a different geographic distribution.

See Also

| | | | | |

Related Topics