Exploring grouped data with gramm

In this example file, we will go further in exploring gramm's capabilities for data where the independent variables are categorical / group data.
To benefit from interactive elements, you should open it in MATLAB's editor with
open Groups.mlx
We will load a partial dataset from a human movement science experiment:
websave('example_movement','https://github.com/piermorel/gramm/raw/master/sample_data/example_movement.mat'); %Download data from repository
load example_movement.mat
T
T = 3170×15 table
    subject  session  trial_index  reference_direction  hit  m_movement_duration  m_dist    m_reaction_time  target_pos               valid_perc  valid_perc_session  px            py            t             tperc
 1  IHTA     1        2            105                  0    3.0020e+03           201.1152  616.7897         -20.7055   37.2741    0  0.1634      0.1634              1×362 double  1×362 double  1×362 double  1×362 double
 2  IHTA     1        3            60                   1    2.6261e+03           487.0262  404.2587         40         29.2820    0  0.3268      0.3268              1×317 double  1×317 double  1×317 double  1×317 double
 3  IHTA     1        4            330                  0    3.0016e+03           483.2827  341.6924         69.2820    -80        0  0.4902      0.4902              1×362 double  1×362 double  1×362 double  1×362 double
 4  IHTA     1        7            240                  1    1.8670e+03           323.5568  303.2130         -40        -109.2820  0  0.8170      0.8170              1×226 double  1×226 double  1×226 double  1×226 double
 5  IHTA     1        9            15                   1    2.8925e+03           638.7513  283.2674         77.2741    -19.2945   0  1.1438      1.1438              1×349 double  1×349 double  1×349 double  1×349 double
 6  IHTA     1        14           150                  0    3.0024e+03           632.5109  306.7775         -69.2820   -0         0  1.7974      1.7974              1×362 double  1×362 double  1×362 double  1×362 double
 7  IHTA     1        16           60                   1    1.0567e+03           96.0046   294.8469         40         29.2820    0  1.9608      1.9608              1×129 double  1×129 double  1×129 double  1×129 double
 8  IHTA     1        18           240                  1    1.3083e+03           293.7532  320.3605         -40        -109.2820  0  2.2876      2.2876              1×159 double  1×159 double  1×159 double  1×159 double
 9  IHTA     1        19           150                  0    3.0020e+03           556.4195  367.6910         -69.2820   -0         0  2.4510      2.4510              1×362 double  1×362 double  1×362 double  1×362 double
10  IHTA     1        27           195                  1    2.3428e+03           468.3147  309.4387         -77.2741   -60.7055   0  2.9412      2.9412              1×283 double  1×283 double  1×283 double  1×283 double
11  IHTA     1        29           15                   1    1.3741e+03           236.0695  332.9378         77.2741    -19.2945   0  3.1046      3.1046              1×167 double  1×167 double  1×167 double  1×167 double
12  IHTA     1        34           330                  0    2.4416e+03           346.7274  369.4775         69.2820    -80        0  3.4314      3.4314              1×295 double  1×295 double  1×295 double  1×295 double
13  IHTA     1        37           105                  0    3.0028e+03           893.2468  337.2823         -20.7055   37.2741    0  3.7582      3.7582              1×362 double  1×362 double  1×362 double  1×362 double
14  IHTA     1        40           195                  0    3.0028e+03           782.8285  423.8782         -77.2741   -60.7055   0  3.9216      3.9216              1×362 double  1×362 double  1×362 double  1×362 double
In this dataset, we have four different subjects (subject), each coming to the lab for two sessions (session) on consecutive days. During each session they learn to control the displacement of a cursor on a screen, and their task is to reach targets with the cursor. The targets are arranged at discrete angles (reference_direction) on a circle around a starting point. The cursor is difficult to control, so as markers of progress in the task we record whether they reach the target in time (hit) and how long their reaction time is (m_reaction_time). Each row corresponds to a trial (trial_index), and we converted the index into a percentage of trials performed within a session (valid_perc, which goes from 0 to 100% in each session) or across sessions (valid_perc_session, which goes from 0 to 200% across both sessions).
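
As a quick sanity check on the structure described above, one could count the trials and inspect the range of valid_perc for each subject and session. The sketch below is only an illustration: it uses a small made-up table (demo, a hypothetical name) with the same column names as T so that it runs on its own, and it relies on groupsummary(), which requires MATLAB R2018a or later.

```matlab
% Made-up table mirroring three columns of T; all values are invented.
subject    = {'S1';'S1';'S1';'S1';'S2';'S2';'S2';'S2'};
session    = [1;1;2;2;1;1;2;2];
valid_perc = [0;100;0;100;0;100;0;100];
demo = table(subject, session, valid_perc);

% One row per subject/session pair, with the number of trials (GroupCount)
% and the smallest and largest valid_perc within that pair.
summary = groupsummary(demo, {'subject','session'}, {'min','max'}, 'valid_perc');
disp(summary)
```

With the real data, replacing demo with T should give one row for each subject and session, which makes it easy to spot a missing session or an unexpected percentage range.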

Using categorical data on the x axis

First let's examine the progress (using the reaction time m_reaction_time) between sessions for each subject. With gramm it's possible to use categorical data on the x axis and thus reproduce typical raw data plots or statistical data plots that would accompany analyses such as ANOVAs.
Interactive parameter: To prevent points from the two sessions from overlapping, we use the 'dodge' parameter in geom_point(). Its numerical value sets the spacing along the x axis used to avoid the overlap.
figure
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_point('dodge',0.3);
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();
Here we see that even with the 'dodge' argument, the basic geom_point() is limited because of the overlap between points. Two other geom_ methods can improve on this.

Improving the visualization of raw datapoints

Randomly jitter points with geom_jitter() and dodge graphical elements

A first option is to use geom_jitter() instead of geom_point() so that the datapoints are jittered along the x axis.
Interactive parameters:
figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_jitter('dodge',0.6,'width',0.5,'alpha',0.3);
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();
Here the underlying distribution is easier to see, but the mass of overlapping points in the middle of each distribution still makes its shape difficult to judge.

Create a beeswarm plot with geom_swarm()

Another option for displaying raw datapoints is to use a swarm plot, which stacks datapoints horizontally.
Interactive parameters:
figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_swarm('alpha',0.5,'point_size',1.5,'type','up','corral','none');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Adding statistics layers

Now that we have displayed our raw dataset, we can add more statistics-oriented visualizations to our graphs. Note that all stat_ layers can be combined with geom_ layers or with each other.
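
As an illustration of combining layers, the sketch below draws jittered raw points and box plots in the same figure. It assumes that gramm is on the MATLAB path and that T has been loaded as at the top of this file. The parameter values are copied from the other examples in this file, and the dodge settings of the two layers may need tuning so that points and boxes line up.

```matlab
% Assumes gramm is on the path and T is loaded from example_movement.mat.
figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_jitter('dodge',0.6,'width',0.5,'alpha',0.3); % raw datapoints
g.stat_boxplot('notch',false);                      % distribution summary in the same axes
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();
```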

Compare distributions vertically or horizontally

gramm provides two common statistical visualizations for comparing the distributions of grouped data: box and whisker plots or violin plots. You can pick one below with the first dropdown menu.
Interactive parameters:
vis = "stat_boxplot";
flip = false;
 
 
figure
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
switch vis
    case "stat_boxplot"
        g.stat_boxplot('notch',false);
    case "stat_violin"
        g.stat_violin('width',0.5,'half',false,'fill',"transparent",'normalization','area');
end
if flip
    g.coord_flip();
end
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Summarize data

The stat_summary() layer can represent different descriptive statistics with various types of graphical elements: bars, points, error bars, lines, shaded areas, etc. This layer is closest to the output of a statistical test such as an ANOVA or t-test. By default it represents the mean and the 95% confidence interval of the mean for each group. Note that the 95% confidence interval is computed independently for each group, because gramm can't know your experimental design (there is no correction for multiple comparisons).
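
To make "mean and 95% confidence interval, computed independently for each group" concrete, here is a minimal sketch of one common way to obtain such an interval for a single group. It uses made-up reaction times and a normal approximation; it is not taken from gramm's source, and gramm's internal computation may differ.

```matlab
% Made-up reaction times (ms) for a single subject/session group.
rt = [310 295 342 368 301 333 287 350];

n   = numel(rt);
m   = mean(rt);
sem = std(rt) / sqrt(n);   % standard error of the mean

% Normal approximation of the 95% interval (z = 1.96). A t-based interval,
% for example via tinv() from the Statistics and Machine Learning Toolbox,
% would be slightly wider for small n.
ci = m + [-1 1] * 1.96 * sem;

fprintf('mean = %.1f ms, 95%% CI = [%.1f, %.1f] ms\n', m, ci(1), ci(2));
```

Because each interval only uses the data of its own group, overlapping or non-overlapping error bars in the figure are a visual guide and not a substitute for a test that matches the experimental design.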
Interactive parameters:
 
figure
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.stat_summary('geom',{'bar','black_errorbar'},'setylim',true,'type','ci');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();
Overall, this figure confirms the large between-subject variability in reaction time and shows that all subjects have a lower reaction time on the second day.
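
The session-to-session change can also be checked numerically. The following is a minimal sketch, using a made-up table (demo, a hypothetical name) in place of T; it assumes that session is numeric and that every subject has trials in both sessions.

```matlab
% Made-up reaction times (ms) for two hypothetical subjects and two sessions.
subject         = {'S1';'S1';'S1';'S1';'S2';'S2';'S2';'S2'};
session         = [1;1;2;2;1;1;2;2];
m_reaction_time = [400;420;330;350;300;310;280;270];
demo = table(subject, session, m_reaction_time);

% Mean reaction time for each subject/session pair (rows come back sorted
% by subject, then by session).
means = groupsummary(demo, {'subject','session'}, 'mean', 'm_reaction_time');

% Difference between sessions, one value per subject.
rt1 = means.mean_m_reaction_time(means.session == 1);
rt2 = means.mean_m_reaction_time(means.session == 2);
improvement = rt1 - rt2;   % positive: faster reactions in session 2

disp(table(unique(demo.subject), improvement, ...
    'VariableNames', {'subject','improvement_ms'}))
```

Applied to T instead of demo, a positive value for every subject would agree with what the bar plot suggests.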

Advanced example

In this last figure, we will overlay the group medians on top of the swarm plot using stat_summary(). After the drawing is done, we will access the handles of the graphical elements through the results structure within the gramm object to make the medians more visible.
mods = true;
 
figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_swarm('alpha',0.5,'point_size',1.5);
g.stat_summary('dodge',0.7,'geom','black_point','type','quartile');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();
 
% Most of the data and graphic handles created by layers can be accessed
% through the results structure
if mods
    set([g.results.stat_summary.point_handle],'Marker','s');
    set([g.results.stat_summary.point_handle],'MarkerSize',10);
end
%Export
g.export('file_name','groups_export','file_type','png');