Getting started
This live script will demonstrate the basic use of gramm. To benefit from interactive elements, you should open it in MATLAB's editor with
If you want to see more specific and advanced live scripts, use the links below:
- Grouped data – An example to further explore grouped data that includes visualizations that could accompany an ANOVA.
- XY data – An example to further explore X/Y data, with regressions/fits as well as grouped 2D data
- Time series – An example that showcases the use of gramm to explore repeated time series data
- Online Table – An example showing how to use gramm along with MATLAB's table functions with an open dataset hosted on S3
- Further examples – Many examples that demonstrate gramm's more advanced capabilities and customizations
Basics
Gramm is a MATLAB toolbox that enables the rapid creation of complex, publication-quality figures. Its design philosophy focuses on a declarative approach, where users specify the desired end result, as opposed to the traditional imperative method involving for loops, if/else statements, etc. If you haven't come across this particular set of ideas yet, it might seem a bit strange. But you will soon come to appreciate how powerful and versatile it can be for exploring data and quickly changing how things are presented.
The MATLAB implementation of gramm is inspired by the ggplot2 library for R by Hadley Wickham. A similar library called Seaborn also exists for Python. For an introduction to the general ideas, have a look at the paper "A layered grammar of graphics (pdf)". If you are really keen, you can also check out Leland Wilkinson's original book "The Grammar of Graphics" that laid the foundation for much of this work.
Getting to grips with gramm
Install
Gramm is easy to install from within MATLAB: Open the Add-ons explorer, search for "gramm" and click "Add"!
A first plot
To start out, let's just build a simple scatter plot - but the gramm way.
We start by loading the carbig sample dataset, containing data about new car models between 1970 and 1982.
websave('example_data','https://github.com/piermorel/gramm/raw/master/sample_data/example_data.mat'); %Download data from repository
load('example_data.mat'); %Load the downloaded data into the workspace
cars
cars = 406×11 table
| | Model | Origin | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model_Year | Origin_Region | Manufacturer |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 'chevroletchevellemalibu' | 'USA' | 18 | 8 | 307 | 130 | 3504 | 12 | 70 | 'USA' | 'chevrolet' |
| 2 | 'buickskylark320' | 'USA' | 15 | 8 | 350 | 165 | 3693 | 11.5000 | 70 | 'USA' | 'buick' |
| 3 | 'plymouthsatellite' | 'USA' | 18 | 8 | 318 | 150 | 3436 | 11 | 70 | 'USA' | 'plymouth' |
| 4 | 'amcrebelsst' | 'USA' | 16 | 8 | 304 | 150 | 3433 | 12 | 70 | 'USA' | 'amc' |
| 5 | 'fordtorino' | 'USA' | 17 | 8 | 302 | 140 | 3449 | 10.5000 | 70 | 'USA' | 'ford' |
| 6 | 'fordgalaxie500' | 'USA' | 15 | 8 | 429 | 198 | 4341 | 10 | 70 | 'USA' | 'ford' |
| 7 | 'chevroletimpala' | 'USA' | 14 | 8 | 454 | 220 | 4354 | 9 | 70 | 'USA' | 'chevrolet' |
| 8 | 'plymouthfuryiii' | 'USA' | 14 | 8 | 440 | 215 | 4312 | 8.5000 | 70 | 'USA' | 'plymouth' |
| 9 | 'pontiaccatalina' | 'USA' | 14 | 8 | 455 | 225 | 4425 | 10 | 70 | 'USA' | 'pontiac' |
| 10 | 'amcambassadordpl' | 'USA' | 15 | 8 | 390 | 190 | 3850 | 8.5000 | 70 | 'USA' | 'amc' |
| 11 | 'citroends-21pallas' | 'France' | NaN | 4 | 133 | 115 | 3090 | 17.5000 | 70 | 'Europe' | 'citroen' |
| 12 | 'chevroletchevelleconcours(sw)' | 'USA' | NaN | 8 | 350 | 165 | 4142 | 11.5000 | 70 | 'USA' | 'chevrolet' |
| 13 | 'fordtorino(sw)' | 'USA' | NaN | 8 | 351 | 153 | 4034 | 11 | 70 | 'USA' | 'ford' |
| 14 | 'plymouthsatellite(sw)' | 'USA' | NaN | 8 | 383 | 175 | 4166 | 10.5000 | 70 | 'USA' | 'plymouth' |
| ⋮ |
The data are organised in a common way: we have a table where columns are attributes of the cars (region of origin, cylinders, fuel economy, horsepower...) and each row is a car model.
One of the main ideas of the grammar of graphics is that every plot or visualisation is made up of many different elements that can be combined. The basic elements in this scheme are (1) data, (2) aesthetics, (3) geometry and (4) statistics. The data refer to the numbers/information you want to display. The aesthetics are the properties of the graph that you want to map these data to: this could be the x/y position of a point or line segment, the colour, line thickness, and so on. The geometry indicates whether you want to use points, lines, etc. And finally, statistics correspond to statistical visualizations/summaries such as error bars, fits, etc.
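As a rough preview, the four elements can be lined up with gramm calls as in the sketch below. The data are synthetic and purely hypothetical, chosen only so that the snippet runs without the cars table; stat_glm() additionally assumes that the Statistics and Machine Learning Toolbox is available.

```matlab
% Hypothetical synthetic data, used only to keep this sketch self-contained
rng(1);                    % fixed seed so the example is reproducible
x = linspace(0,10,50)';    % (1) data: 50 x-values
y = 2*x + randn(50,1);     %     noisy linear response

g = gramm('x',x,'y',y);    % (2) aesthetics: map the data to x/y position
g.geom_point();            % (3) geometry: show each observation as a point
g.stat_glm();              % (4) statistics: overlay a linear fit
g.draw();                  % render the figure
```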
So let's apply these ideas here:
a. We create a gramm object and decide on how we want to map data to aesthetics. gramm is designed in an object-oriented way, so the next few lines might look odd - but the pattern will become familiar very quickly:
g=gramm('x',cars.Model_Year,'y',cars.MPG);
This means: create a gramm object which will be assigned to the variable g. The cars.Model_Year column of data will be mapped to the x-axis, the fuel economy in miles per gallon (cars.MPG) to the y-axis.
When you execute this line of code, nothing appears to happen, but if you inspect the workspace, the variable `g` has been created.
b. Now we add the geometry we would like to display by calling the method geom_point() on the object g since we want to create a scatter plot, which consists of...points.
A note here: g.geom_point() is equivalent to geom_point(g), but the dot notation makes it much more explicit that geom_point() is called on the object g. It's also more readable.
c. As a final step, we also need to explicitly call for the figure to be created and drawn. For this we call draw() on the object g. This may seem like a drag, but it's actually very useful because it will allow us to stack up calls and make more complicated figures later...
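Put together, steps a to c could look like the following sketch. It assumes that the cars table has already been loaded into the workspace as described earlier.

```matlab
% Assumes the table cars is already in the workspace
g = gramm('x',cars.Model_Year,'y',cars.MPG); % a. map data to the x and y aesthetics
g.geom_point();                              % b. add a point geometry (dot notation)
% geom_point(g);                             %    equivalent function-call notation
g.draw();                                    % c. create and draw the figure
```

Nothing is displayed until draw() is called, so steps a and b can be re-run or extended freely beforehand.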
The plot we just produced shows that fuel economy increases with year of production but is not that informative since all vehicles are mixed together. Moreover we had to write three lines to get a figure that MATLAB could do in one line using plot()! Now we want to display more information in our figure, and this is where the grammar of graphics approach will shine.
Adding layers of complexity
Adding a color aesthetic
Let's say that we want to compare the progression of fuel economy depending on the number of cylinders in the cars. We just need to map the data cars.Cylinders to the color aesthetic to get points colored depending on the number of cylinders of the car.
g=gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders); %Added a color aesthetic
Notice how gramm automatically added the color legend on the right of the plot. This plot is so far equivalent to what could be generated with MATLAB's gscatter.
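A complete version of this step might look like the sketch below, assuming that the geom_point() and draw() calls from the first plot are reused unchanged and that cars is in the workspace. The gscatter call is our rough base-MATLAB counterpart for comparison; it requires the Statistics and Machine Learning Toolbox.

```matlab
% Assumes the table cars is already in the workspace
g = gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders);
g.geom_point();   % same point geometry as in the first plot
g.draw();         % the color legend is added automatically

% Rough base-MATLAB counterpart, drawn in a separate figure
figure;
gscatter(cars.Model_Year,cars.MPG,cars.Cylinders);
```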
Adding statistical visualizations
The figure seems to confirm that cars with fewer cylinders are ahead in terms of fuel economy, but the discrete year data and superimposed points make it difficult to see those tendencies. We will solve that with two changes:
- We will overlay linear regression lines in the plot with the stat_glm() method. This is a statistics layer
- We will tweak the display of points so that different cylinder counts are slightly shifted along the x-axis
g=gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders);
g.stat_glm(); %Add linear regression lines
g.geom_point("dodge",0.4); %shift groups along the x-axis with the "dodge" argument
g.draw(); %Draw the figure
With this visualization we see that cars with three and five cylinders are quite rare and clutter our display, so let's remove them.
Select a subset of data to plot
With gramm it's easy to plot only a subset of the data without having to apply a selection to each variable. This is done with the 'subset' option when creating the gramm object, which receives a logical array similar to what one would use for logical indexing.
g=gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders, ...
    'subset',cars.Cylinders~=3 & cars.Cylinders~=5); %Select cars that are not 3 or 5 cylinders
g.geom_point("dodge",0.5);
g.draw(); %Draw the figure
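For reference, the logical array passed to 'subset' is ordinary MATLAB logical indexing and can be built and inspected on its own. The sketch below uses a short hypothetical vector of cylinder counts instead of cars.Cylinders, and also shows an equivalent formulation with ismember.

```matlab
cyl = [4 8 3 6 5 4 8]';            % hypothetical cylinder counts
keep = cyl~=3 & cyl~=5;            % same condition as in the 'subset' option
keep_alt = ~ismember(cyl,[3 5]);   % equivalent formulation
isequal(keep,keep_alt)             % true: both masks select the same rows
cyl(keep)                          % remaining values: 4 8 6 4 8
```

The ismember form may be easier to extend when more than two values need to be excluded.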
Now we are getting somewhere, and the figure is already quite complex with a minimal amount of code. In terms of our exploration of this dataset however, one could argue that this representation might be distorted because different countries favor different numbers of cylinders and could have different fuel economy objectives. Let's try to take this into account.
Using a variable to create subplots
We could map the region of origin to another aesthetic like marker shape or point size, but these are not always easy to read. Here it's probably best to create subplots for each region. This is best done in gramm by calling the facet_grid() method which receives as arguments the variables to map to subplot rows and to subplot columns.
figure('Position',[100 100 800 400]); %We also make the figure a bit wider
g=gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders, ...
    'subset',cars.Cylinders~=3 & cars.Cylinders~=5);
%Here we want region of origin to determine subplot columns so the first argument (rows) is left empty
g.facet_grid([],cars.Origin_Region);
g.geom_point("dodge",0.4);
g.draw(); %Draw the figure
Notice how all subplots share the same scale which allows easy comparison between subplots.
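Since the first argument of facet_grid() maps to subplot rows, the same data could be stacked vertically by moving the region variable to the first position. The following is a minimal sketch under the assumption that cars is in the workspace; the taller figure size is our own choice.

```matlab
% Assumes the table cars is already in the workspace
figure('Position',[100 100 400 800]);   % taller figure, since subplots are stacked
g = gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders, ...
    'subset',cars.Cylinders~=3 & cars.Cylinders~=5);
g.facet_grid(cars.Origin_Region,[]);    % regions as subplot rows, columns left empty
g.geom_point("dodge",0.4);
g.draw();
```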
Final plot: adding a title, fixing legends and exporting
The only things missing from our figure are proper legends and a title (notice that for now the axes are called x and y, cylinders Color and regions Column). So as a final tweak we provide gramm with the necessary information. We will also use the export() method to create a high-resolution file.
figure('Position',[100 100 800 400]);
g=gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Cylinders,...
    'subset',cars.Cylinders~=3 & cars.Cylinders~=5);
g.facet_grid([],cars.Origin_Region);
g.geom_point("dodge",0.5);
g.set_names('column','Origin', 'x','Year of production', ... %With set_names() we simply indicate how each aesthetic should be called
    'y','Fuel economy (MPG)','color','# Cylinders');
g.set_title('Fuel economy of new cars between 1970 and 1982'); % With set_title() we provide a big title to the figure
g.draw(); %The figure must be drawn before it can be exported
g.export('file_name','gettingstarted_export','file_type','png');
To conclude
With gramm we have created a complex figure with only seven lines of code, each of those lines being quite explicit and easy to understand. We also see that changing how the figure is organized would be very easy. We could for example swap subplot columns for cylinders and colors for region of origin to better see the worse fuel economy for US cars for similar cylinder counts. Comparatively, creating such a plot with base MATLAB plotting functions would likely require two for loops, three times as many lines of code and would require much rewriting to implement changes.
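The swap mentioned above could be sketched as follows, assuming that cars is in the workspace. Only the variables given to 'color' and to facet_grid() change places, and the names passed to set_names() are adjusted to match.

```matlab
% Assumes the table cars is already in the workspace
figure('Position',[100 100 800 400]);
g = gramm('x',cars.Model_Year,'y',cars.MPG,'color',cars.Origin_Region, ... % region now sets color
    'subset',cars.Cylinders~=3 & cars.Cylinders~=5);
g.facet_grid([],cars.Cylinders);        % cylinder count now sets subplot columns
g.geom_point("dodge",0.5);
g.set_names('column','# Cylinders','x','Year of production', ...
    'y','Fuel economy (MPG)','color','Origin');
g.draw();
```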
With this example we only scratched the surface of what is possible with gramm, as there are many more geometries and statistical visualizations available, as well as other types of data that can be plotted (categorical data, time series, 3D data). Moreover gramm offers many ways to tweak its graphics with built-in options as well as through direct access to the graphical elements a posteriori. These possibilities are detailed in other live scripts and in the examples file.