Exploring grouped data with gramm

Table of Contents

Using categorical data on the x axis
Improving the visualization of raw datapoints
    Randomly jitter points with geom_jitter() and dodge graphical elements
    Create a beeswarm plot with geom_swarm()
Adding statistics layers
    Compare distributions vertically or horizontally
    Summarize data
Advanced example

In this example file, we will further explore gramm's capabilities for data in which the independent variables are categorical (grouped) data.

To benefit from interactive elements, you should open it in MATLAB's editor with

open Groups.mlx

We will load a partial dataset from a human movement science experiment

websave('example_movement','https://github.com/piermorel/gramm/raw/master/sample_data/example_movement.mat'); %Download data from repository
load example_movement.mat
T
T = 3170×15 table 

	subject	session	trial_index	reference_direction	hit	m_movement_duration	m_dist	m_reaction_time	target_pos	valid_perc	valid_perc_session	px	py	t	tperc
1	IHTA	1	2	105	0	3.0020e+03	201.1152	616.7897	-20.7055	37.2741	0	0.1634	0.1634	1×362 double	1×362 double	1×362 double	1×362 double
2	IHTA	1	3	60	1	2.6261e+03	487.0262	404.2587	40	29.2820	0	0.3268	0.3268	1×317 double	1×317 double	1×317 double	1×317 double
3	IHTA	1	4	330	0	3.0016e+03	483.2827	341.6924	69.2820	-80	0	0.4902	0.4902	1×362 double	1×362 double	1×362 double	1×362 double
4	IHTA	1	7	240	1	1.8670e+03	323.5568	303.2130	-40	-109.2820	0	0.8170	0.8170	1×226 double	1×226 double	1×226 double	1×226 double
5	IHTA	1	9	15	1	2.8925e+03	638.7513	283.2674	77.2741	-19.2945	0	1.1438	1.1438	1×349 double	1×349 double	1×349 double	1×349 double
6	IHTA	1	14	150	0	3.0024e+03	632.5109	306.7775	-69.2820	-0	0	1.7974	1.7974	1×362 double	1×362 double	1×362 double	1×362 double
7	IHTA	1	16	60	1	1.0567e+03	96.0046	294.8469	40	29.2820	0	1.9608	1.9608	1×129 double	1×129 double	1×129 double	1×129 double
8	IHTA	1	18	240	1	1.3083e+03	293.7532	320.3605	-40	-109.2820	0	2.2876	2.2876	1×159 double	1×159 double	1×159 double	1×159 double
9	IHTA	1	19	150	0	3.0020e+03	556.4195	367.6910	-69.2820	-0	0	2.4510	2.4510	1×362 double	1×362 double	1×362 double	1×362 double
10	IHTA	1	27	195	1	2.3428e+03	468.3147	309.4387	-77.2741	-60.7055	0	2.9412	2.9412	1×283 double	1×283 double	1×283 double	1×283 double
11	IHTA	1	29	15	1	1.3741e+03	236.0695	332.9378	77.2741	-19.2945	0	3.1046	3.1046	1×167 double	1×167 double	1×167 double	1×167 double
12	IHTA	1	34	330	0	2.4416e+03	346.7274	369.4775	69.2820	-80	0	3.4314	3.4314	1×295 double	1×295 double	1×295 double	1×295 double
13	IHTA	1	37	105	0	3.0028e+03	893.2468	337.2823	-20.7055	37.2741	0	3.7582	3.7582	1×362 double	1×362 double	1×362 double	1×362 double
14	IHTA	1	40	195	0	3.0028e+03	782.8285	423.8782	-77.2741	-60.7055	0	3.9216	3.9216	1×362 double	1×362 double	1×362 double	1×362 double
⋮

In this dataset, we have four different subjects (subject), each coming to the lab for two sessions (session) on consecutive days. During each session they learn to control the displacement of a cursor on a screen, and their task is to reach targets with the cursor. The targets are arranged at discrete angles (reference_direction) in a circle around a starting point. The cursor is difficult to control, and as markers of progress in the task, we record whether they reach the target in time (hit) and how long their reaction time was (m_reaction_time). Each row corresponds to a trial (trial_index), and we converted the index into a percentage of trials performed within a session (valid_perc, goes from 0 to 100% in each session) or across sessions (valid_perc_session, goes from 0 to 200% across both sessions).
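The conversion from trial index to percentage can be sketched in base MATLAB as follows. This is only an illustration under stated assumptions: the trial count per session (nTrialsPerSession) and the exact formula are hypothetical, and the variable names are not taken from the dataset.

```matlab
% Hypothetical values: the trial count per session is an assumption
nTrialsPerSession = 600;
trialIndex = repmat((1:nTrialsPerSession)', 2, 1);   % trial number within each session
session = repelem([1; 2], nTrialsPerSession);        % session number of each trial

% Percentage of trials performed within the session (starts at 0, approaches 100)
validPerc = 100 * (trialIndex - 1) / nTrialsPerSession;
% Same quantity accumulated across both sessions (starts at 0, approaches 200)
validPercSession = validPerc + 100 * (session - 1);

disp([min(validPerc) max(validPerc)]);
disp([min(validPercSession) max(validPercSession)]);
```

With this convention, a trial in the second session keeps its within-session percentage and is simply shifted by 100 on the across-session scale.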

Using categorical data on the x axis

First, let's examine the progress between sessions for each subject, using the reaction time (m_reaction_time). With gramm it's possible to use categorical data on the x axis and thus reproduce typical raw data plots or statistical data plots that would accompany analyses such as ANOVAs.

Interactive parameter: To keep points from the two sessions from overlapping, we use the 'dodge' parameter in geom_point(). Its numerical value sets the spacing along the x axis that separates them.

figure
% Subjects on the x axis, reaction time on the y axis, one color per session
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_point('dodge',0.3);
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Here we see that even with the 'dodge' argument, the basic geom_point() is limited because of the overlap between points. Two other geom_ methods can improve on this.

Improving the visualization of raw datapoints

Randomly jitter points with geom_jitter() and dodge graphical elements

A first option is to use geom_jitter() instead of geom_point() so that the datapoints are jittered along the x axis.

Interactive parameters:

The 'width' parameter sets the width along the x axis used for the visualization (here the width of jittering). Setting it below the value used for 'dodge' leaves a small gap between the points for sessions 1 and 2. Setting a larger value could make the points overlap. Most geom_ and stat_ methods set these parameters to usable default values (which we rely on later), but they often require tweaking depending on the complexity of your data and the figure size.
We can also make the individual points transparent with the 'alpha' parameter.
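To make the interplay between 'dodge' and 'width' concrete, here is a minimal sketch in base MATLAB. It is not gramm's internal code: the placement rule (each group centered in its share of the dodge span, jitter drawn uniformly within width divided by the number of groups) is an assumption chosen to match the description above, and all variable names are hypothetical.

```matlab
% Illustrative only: not gramm's internal implementation
rng(0);                      % reproducible jitter
nGroups = 2;                 % e.g. the two sessions
dodge = 0.6;                 % values used in the geom_jitter() call of this section
width = 0.5;
xCategory = 1;               % position of one subject on the x axis
nPoints = 200;               % points per group

xJittered = zeros(nPoints, nGroups);
for k = 1:nGroups
    % Assumed: group k is centered in its share of the dodge span
    center = xCategory + dodge * ((k - 0.5) / nGroups - 0.5);
    % Assumed: uniform jitter within a slot of width/nGroups
    xJittered(:, k) = center + (width / nGroups) * (rand(nPoints, 1) - 0.5);
end

% Horizontal gap between the rightmost point of group 1 and the leftmost of group 2
gap = min(xJittered(:, 2)) - max(xJittered(:, 1));
fprintf('Gap between the two groups: %.3f\n', gap);
```

Under these assumptions, the gap stays positive as long as width is smaller than dodge, and the two clouds start to overlap once width exceeds it.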

figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
% Jitter width (0.5) kept below the dodge (0.6) to separate the sessions
g.geom_jitter('dodge',0.6,'width',0.5,'alpha',0.3);
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Here the underlying distribution is easier to see; however, the mass of overlapping points in the middle of the distribution still makes its shape hard to judge.

Create a beeswarm plot with geom_swarm()

Another option to display raw datapoints is to use a swarm plot, which stacks datapoints horizontally.

Interactive parameters:

Here we set the 'point_size' parameter for geom_swarm() given the large number of points in our dataset. geom_swarm() is designed so that points within a group never overlap, so big points would make each swarm wide.
The 'type' parameter configures the way the swarm is constructed.
The 'corral' parameter configures what happens to points that are placed further to the left and right than the width.

figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
% Small points keep the swarms narrow despite the large number of trials
g.geom_swarm('alpha',0.5,'point_size',1.5,'type','up','corral','none');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Adding statistics layers

Now that we have displayed our raw dataset, we can add more statistics-oriented visualizations to our graphs. Note that all stat_ layers can be combined with geom_ layers or with each other.

Compare distributions vertically or horizontally

gramm provides two common statistical visualizations for comparing the distributions of grouped data: box and whisker plots or violin plots. You can pick one below with the first dropdown menu.

Interactive parameters:

The coord_flip() button runs the corresponding method, which flips the x and y axes and thus produces horizontal visualizations.
Box and whisker plots can be drawn with or without a 'notch'.
When comparing only two groups, violin plots can be set to show only half violins with 'half'.
The 'fill' option lets you pick between different styles.
The 'normalization' option sets how the width of each violin is normalized, so that groups of different sizes can be compared.

vis = "stat_boxplot";   % or "stat_violin"
flip = false;           % set to true for horizontal plots

figure
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
switch vis
    case "stat_boxplot"
        g.stat_boxplot('notch',false);
    case "stat_violin"
        g.stat_violin('width',0.5,'half',false,'fill',"transparent",'normalization','area');
end
if flip
    g.coord_flip();
end
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Summarize data

The stat_summary() layer can represent different descriptive statistics with various types of graphical elements: bars, points, errorbars, lines, shaded areas, etc. This layer is closest to the output of a statistical test such as an ANOVA or t-test. By default it represents the mean and the 95% confidence interval of the mean for each group. Note that the 95% confidence interval is computed independently for each group, as gramm can't know your experimental design (there is no multiple comparison correction).

Interactive parameters:

The 'geom' parameter specifies how the descriptive statistics are represented. The parameter can be given as a single string or as a cell array of strings to combine several representations. Here you can try a combination of two.
The 'setylim' parameter determines whether the y scale depends on the summary only or encompasses the whole dataset.
The 'type' parameter specifies which descriptive statistics are used and how they are computed. The default 95% confidence interval assumes a normal distribution, but other distributions or a bootstrapped confidence interval can be picked instead.
stat_summary() can also be used for continuous or time series data. These uses are detailed in the corresponding live scripts.
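As a rough cross-check of what a per-group mean and 95% confidence interval look like, the computation can be sketched in base MATLAB. This is a sketch on synthetic data, not gramm's own code: it uses the large-sample normal quantile 1.96 as an approximation, gramm's computation may differ in detail (for instance in the quantile it uses), and all variable names are hypothetical.

```matlab
% Illustrative sketch with synthetic data; not gramm's internal code
rng(1);                                        % reproducible random numbers
subject = repelem([1; 2], 100);                % hypothetical subject IDs
session = repmat(repelem([1; 2], 50), 2, 1);   % two sessions per subject
rt = 350 + 40 * randn(200, 1) - 30 * (session - 1);  % synthetic reaction times (ms)

% One group per subject/session combination
[G, groupSubject, groupSession] = findgroups(subject, session);

% Mean and standard error of the mean, computed independently for each group
groupMean = splitapply(@mean, rt, G);
groupSem = splitapply(@(v) std(v) / sqrt(numel(v)), rt, G);

% Assumed: normal approximation with the 1.96 quantile
ciLow = groupMean - 1.96 * groupSem;
ciHigh = groupMean + 1.96 * groupSem;

disp(table(groupSubject, groupSession, groupMean, ciLow, ciHigh));
```

Each interval here is computed from its own group only, which mirrors the note above: no information about the experimental design and no multiple comparison correction enter the calculation.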

figure
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
% Bars for the group means, black errorbars for the confidence intervals
g.stat_summary('geom',{'bar','black_errorbar'},'setylim',true,'type','ci');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

Overall, this figure confirms the large between-subject variability in reaction time and shows that all subjects have a lower mean reaction time on the second day.

Advanced example

In this last figure, we will overlay the group median on top of the swarm plot using stat_summary(). After the drawing is done, we will access the handles of the graphical elements through the results structure within the gramm object to make the medians more visible.

mods = true;   % set to false to keep the default markers

figure('Position',[100 100 800 500])
g=gramm('x',T.subject,'y',T.m_reaction_time,'color',T.session);
g.geom_swarm('alpha',0.5,'point_size',1.5);
g.stat_summary('dodge',0.7,'geom','black_point','type','quartile');
g.set_names('x','Subject','y','Reaction time (ms)','color','Session');
g.draw();

% Most of the data and graphic handles created by layers can be accessed
% through the results structure
if mods
    set([g.results.stat_summary.point_handle],'Marker','s');
    set([g.results.stat_summary.point_handle],'MarkerSize',10);
end

%Export

g.export('file_name','groups_export','file_type','png');