Exploring x/y data with gramm

Table of Contents

Continuous data
Statistical layers for continuous data
    stat_summary()
    stat_smooth()
    stat_glm()
    stat_fit()
Clustered data

In this example file, we will go further in exploring gramm's capabilities for data in which some of the independent variables are continuous.

To benefit from interactive elements, you should open it in MATLAB's editor with

open XY.mlx

We will load a partial dataset from a human movement science experiment.

websave('example_movement','https://github.com/piermorel/gramm/raw/master/sample_data/example_movement.mat'); %Download data from repository
load example_movement.mat
T
T = 3170×13 table 

	subject	session	trial_index	reference_direction	hit	m_reaction_time	target_pos	valid_perc	valid_perc_session	px	py	t	tperc
1	IHTA	1	2	105	0	616.7897	-20.7055	37.2741	0	0.1634	0.1634	1×362 double	1×362 double	1×362 double	1×362 double
2	IHTA	1	3	60	1	404.2587	40	29.2820	0	0.3268	0.3268	1×317 double	1×317 double	1×317 double	1×317 double
3	IHTA	1	4	330	0	341.6924	69.2820	-80	0	0.4902	0.4902	1×362 double	1×362 double	1×362 double	1×362 double
4	IHTA	1	7	240	1	303.2130	-40	-109.2820	0	0.8170	0.8170	1×226 double	1×226 double	1×226 double	1×226 double
5	IHTA	1	9	15	1	283.2674	77.2741	-19.2945	0	1.1438	1.1438	1×349 double	1×349 double	1×349 double	1×349 double
6	IHTA	1	14	150	0	306.7775	-69.2820	-0	0	1.7974	1.7974	1×362 double	1×362 double	1×362 double	1×362 double
7	IHTA	1	16	60	1	294.8469	40	29.2820	0	1.9608	1.9608	1×129 double	1×129 double	1×129 double	1×129 double
8	IHTA	1	18	240	1	320.3605	-40	-109.2820	0	2.2876	2.2876	1×159 double	1×159 double	1×159 double	1×159 double
9	IHTA	1	19	150	0	367.6910	-69.2820	-0	0	2.4510	2.4510	1×362 double	1×362 double	1×362 double	1×362 double
10	IHTA	1	27	195	1	309.4387	-77.2741	-60.7055	0	2.9412	2.9412	1×283 double	1×283 double	1×283 double	1×283 double
11	IHTA	1	29	15	1	332.9378	77.2741	-19.2945	0	3.1046	3.1046	1×167 double	1×167 double	1×167 double	1×167 double
12	IHTA	1	34	330	0	369.4775	69.2820	-80	0	3.4314	3.4314	1×295 double	1×295 double	1×295 double	1×295 double
13	IHTA	1	37	105	0	337.2823	-20.7055	37.2741	0	3.7582	3.7582	1×362 double	1×362 double	1×362 double	1×362 double
14	IHTA	1	40	195	0	423.8782	-77.2741	-60.7055	0	3.9216	3.9216	1×362 double	1×362 double	1×362 double	1×362 double
⋮

In this dataset, we have four different subjects (subject), each coming to the lab for two sessions (session) on consecutive days. During each session they learn to control the displacement of a cursor on a screen, and their task is to reach targets with the cursor. The targets are arranged at discrete angles (reference_direction) in a circle around a starting point. The cursor is difficult to control, so as markers of progress in the task we record whether they reach the target in time (hit) and how long their reaction time was (m_reaction_time). Each row corresponds to a trial (trial_index), and we converted the index into a percentage of trials performed within a session (valid_perc, goes from 0 to 100% in each session) or across sessions (valid_perc_session, goes from 0 to 200% across both sessions).

Continuous data

Here we will represent the evolution of the reaction time across trials for each subject (using facet_grid()). To treat sessions independently we will map them to color.

figure('Position',[100 100 800 400])

g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);

g.facet_grid([],T.subject);

g.geom_point('alpha',0.5);

g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');

g.draw();

The points indicate a tendency for the reaction time to decrease in all subjects, a decrease that tapers off in the second session. How can we add statistical layers that will help us visualize this?

Statistical layers for continuous data

Several stat_ layers allow this; they can be selected with the first drop-down menu.

stat_summary()

The most basic approach is simply to bin data across the x axis and represent descriptive statistics, such as the mean reaction time and its confidence interval, in each bin. This can be done with stat_summary() using the 'bin_in' argument, whose value sets the total number of bins across the x axis.
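To make the binning idea concrete, here is a minimal base-MATLAB sketch of binned means with a confidence interval. It runs on synthetic stand-in data (hypothetical values, not the experiment's), and the interval is a normal approximation (mean ± 1.96 standard errors) chosen for illustration; it is not claimed to be the exact computation that stat_summary() performs.

```matlab
% Synthetic stand-in data (hypothetical, not the experiment's values)
rng(1);
x = linspace(0,100,500)';
y = 300 + 200*exp(-x/30) + 20*randn(size(x));

nbins   = 10;                                  % plays the role of 'bin_in'
edges   = linspace(min(x),max(x),nbins+1);     % equal-width bins across x
bin     = discretize(x,edges);                 % bin index (1..nbins) of each point
centers = (edges(1:end-1)+edges(2:end))/2;     % x position used to plot each bin

n   = accumarray(bin,1,[nbins 1]);             % number of points per bin
mu  = accumarray(bin,y,[nbins 1],@mean);       % mean per bin
sem = accumarray(bin,y,[nbins 1],@std)./sqrt(n);
ci  = [mu-1.96*sem, mu+1.96*sem];              % normal-approximation 95% interval
```

Plotting mu against centers with ci as error bars gives roughly the kind of summary that the binned layer draws for each color group and facet.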

stat_smooth()

This uses a smoothing approach, either with a built-in fast smoothing method or with underlying MATLAB tools from the Curve Fitting Toolbox (splines, moving averages, loess, etc.). The 'lambda' parameter sets the smoothing intensity. The confidence interval is computed by bootstrap.
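The bootstrap idea behind such a confidence band can be sketched in base MATLAB as follows. This is an illustration under stated assumptions: the data are synthetic, the smoother is a plain moving average from smoothdata() rather than gramm's built-in method, and the window size and number of resamples are arbitrary choices.

```matlab
% Synthetic stand-in data (hypothetical values, not the experiment's)
rng(2);
n = 200;
x = linspace(0,100,n)';
y = 300 + 200*exp(-x/30) + 20*randn(n,1);

win = 25;                                  % moving-average window, in samples
ysmooth = smoothdata(y,'movmean',win);     % smoothed curve on the full data

% Bootstrap: resample (x,y) pairs with replacement, smooth, and
% interpolate each resampled curve back onto the original x grid
nboot = 200;
boot = zeros(n,nboot);
for k = 1:nboot
    idx = sort(randi(n,n,1));              % sorted resampled indices keep x ordered
    yb = smoothdata(y(idx),'movmean',win);
    [xu,iu] = unique(x(idx));              % interp1 needs distinct sample points
    boot(:,k) = interp1(xu,yb(iu),x,'linear','extrap');
end

% Percentile interval taken from the sorted bootstrap curves
boot = sort(boot,2);
ci = boot(:,[round(0.025*nboot) round(0.975*nboot)]);
```

The band between the two columns of ci plays the role of the shaded area drawn around the smoothed line.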

stat_glm()

This uses a generalized linear model (fitglm() from the Statistics and Machine Learning Toolbox), which by default is configured as a classical linear regression. Here we use an inverse Gaussian distribution, which is well suited to reaction time data.

The 'distribution' parameter selects the distribution of the response variable and the corresponding link function.
When 'disp_fit' is set to true, the model equation is displayed in the figure, along with significance stars for each term.
'fullrange' determines whether the fit is displayed across the complete range of the x axis or only across the data range.
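For readers who want to see the underlying model outside gramm, the following sketch calls fitglm() directly with an inverse Gaussian response. It assumes the Statistics and Machine Learning Toolbox is installed and uses synthetic stand-in data (hypothetical values); the variable names are illustrative only.

```matlab
% Synthetic stand-in data (hypothetical); responses are strictly positive,
% as the inverse Gaussian distribution requires
rng(3);
x = linspace(0,100,300)';
y = 300 + 200*exp(-x/30) + 20*randn(size(x));

% Linear predictor in x, inverse Gaussian response, default link for that distribution
mdl = fitglm(x,y,'linear','Distribution','inverse gaussian');

% Predicted mean and its confidence bounds on a regular grid over the data range
xq = linspace(min(x),max(x),50)';
[yhat,yci] = predict(mdl,xq);

disp(mdl.Coefficients)                     % estimates, standard errors, p-values
```

Evaluating the prediction only between min(x) and max(x), as done here, corresponds to the behavior described for 'fullrange' set to false.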

stat_fit()

This fits any non-linear model supplied as an anonymous function through the 'fun' argument. Here we use an exponential function. The fit requires starting values for the parameters, which are given with 'StartPoint'.

'intopt' sets whether the shaded area represents the uncertainty of the data or of the fitted function.
When 'disp_fit' is set to true, the model equation is displayed in the figure.

By default this method uses fit() from the Curve Fitting Toolbox, but it can be configured to use fitnlm() from the Statistics and Machine Learning Toolbox.
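The same exponential model can be fitted directly with fit(), which also shows the two kinds of interval that an option like 'intopt' distinguishes. This is a sketch assuming the Curve Fitting Toolbox is available; the data are synthetic (hypothetical values), and the correspondence between gramm's 'intopt' and the predint() option of the same name is an assumption based on the matching names.

```matlab
% Synthetic stand-in data (hypothetical values, not the experiment's)
rng(4);
x = linspace(0,100,300)';
y = 400*exp(-0.03*x) + 300 + 20*randn(size(x));

% Same model as in the tutorial: a, b, c are coefficients, x is the independent variable
model = fittype(@(a,b,c,x) a.*exp(x.*b)+c);
f = fit(x,y,model,'StartPoint',[1000 -0.01 150]);

xq = linspace(0,100,50)';
ciFun = predint(f,xq,0.95,'functional');   % uncertainty of the fitted curve
ciObs = predint(f,xq,0.95,'observation');  % uncertainty of a new observation

disp(coeffvalues(f))                       % fitted [a b c]
```

The observation interval is always wider than the functional one, because it adds the scatter of individual data points to the uncertainty of the curve itself.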

Pick the stat_ layer:

vis = "stat_glm";

All these layers use a common 'geom' argument to specify how the results are displayed. Select one with this drop-down menu:

geom = 'lines';

figure('Position',[100 100 800 400])

g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);

g.facet_grid([],T.subject);

g.geom_point('alpha',0.1);

switch vis %Call the correct gramm method depending on the chosen visualization

case "stat_summary"

g.stat_summary('geom',geom,'bin_in',10);

case "stat_smooth"

g.stat_smooth('geom',geom,'lambda',1e5);

case "stat_glm"

g.stat_glm('geom',geom,'distribution','inverse gaussian','disp_fit',false,'fullrange',false);

case "stat_fit"

g.stat_fit('geom',geom,'fun',@(a,b,c,x)a.*exp(x.*b)+c,'StartPoint',[1000 -0.01 150],'intopt','functional','disp_fit',false);

end

g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');

g.draw();

%Export

g.export('file_name','xy_export','file_type','png');

Clustered data

In the previous section, we explored data where an underlying y=f(x) relationship is assumed. gramm also provides tools to explore x/y (and even z) data that correspond to clusters or groupings.

First we will compute the midpoints of all cursor trajectories (see TimeSeries.mlx).

T.xmid = cellfun(@(x)x(floor(length(x)/2)),T.px);
T.ymid = cellfun(@(x)x(floor(length(x)/2)),T.py);
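To see what the indexing expression selects, here is the same anonymous function applied to a small toy cell array (hypothetical values, not taken from the dataset).

```matlab
% Toy trajectories (hypothetical): one of even length, one of odd length
px = {[1 2 3 4], [10 20 30]};

% floor(4/2) = 2 picks the 2nd sample (2); floor(3/2) = 1 picks the 1st sample (10)
xmid = cellfun(@(x) x(floor(length(x)/2)), px);

disp(xmid)                                 % displays 2 and 10
```

For odd lengths the selected sample sits just before the exact middle. A trajectory with a single sample would give index 0 and raise an indexing error, so the expression relies on every trajectory holding at least two samples.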

We will examine the evolution of those midpoints between the first and second experimental sessions for all subjects. Use the picker below to choose a visualization:

geom_point() to view raw datapoints
stat_ellipse() to plot the center of each group and uncertainty ellipse (computed here as a 95% confidence interval on the center, with the hypothesis of a bivariate normal distribution)
stat_bin2d() to plot a 2D histogram
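As an illustration of what a 95% confidence ellipse on a group center means under a bivariate normal hypothesis, the sketch below builds one in base MATLAB. It uses synthetic points (hypothetical values) and the large-sample chi-square approximation; whether stat_ellipse() uses exactly this computation is not claimed here.

```matlab
% Synthetic cluster of midpoints (hypothetical values)
rng(5);
n = 150;
X = [10 -20] + randn(n,2)*[6 2; 0 3];

c = mean(X,1);                 % center of the group
S = cov(X)/n;                  % covariance of the estimated center
q = -2*log(1-0.95);            % 95% quantile of a chi-square with 2 degrees of freedom

% Map the unit circle through the Cholesky factor of S and scale it
t = linspace(0,2*pi,100)';
E = c + sqrt(q)*[cos(t) sin(t)]*chol(S);

plot(X(:,1),X(:,2),'.',E(:,1),E(:,2),'-',c(1),c(2),'+');
axis equal
```

Because S is divided by n, the ellipse describes the uncertainty of the center and shrinks as more points are added; dropping the division would instead outline the spread of the data itself.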

To note:

With geom_point(), we call set_point_options() to draw smaller points because there are so many of them.
Here we use the 'lightness' aesthetic to display the session and pick a different default colormap with set_color_options(). Not all colormaps support lightness, so only two can be selected.

vis = "stat_ellipse";

figure

g=gramm('x',T.xmid,'y',T.ymid,'group',T.reference_direction,'lightness',T.session,'color',T.reference_direction);

switch vis

case "geom_point"

g.geom_point();

g.set_point_options('base_size',3);

case "stat_ellipse"

g.stat_ellipse('type','ci');

case "stat_bin2d"

g=gramm('x',T.xmid,'y',T.ymid);

g.facet_grid([],T.session);

g.stat_bin2d('geom','image','edges',{[-100:10:100],[-140:10:60]});

end

g.axe_property('DataAspectRatio',[1 1 1],'XLim',[-100 100],'YLim',[-140 60]);

g.set_color_options('map','d3_20');

g.set_names('x','X traj. midpoint (mm)','y','Y traj. midpoint (mm)','color','Direction (°)','column','Session','lightness','Session');

g.draw();

This figure demonstrates that on the second day, the subjects are less variable in their movements and go further along the target direction at the midpoint of the movement.