Exploring x/y data with gramm

In this example file, we will go further in exploring gramm's capabilities for data where some independent variables are continuous.
To benefit from interactive elements, you should open it in MATLAB's editor with
open XY.mlx
We will load a partial dataset from a human movement science experiment.
websave('example_movement','https://github.com/piermorel/gramm/raw/master/sample_data/example_movement.mat'); %Download data from repository
load example_movement.mat
T
T = 3170×13 table
Table variables: subject, session, trial_index, reference_direction, hit, m_reaction_time, target_pos, valid_perc, valid_perc_session, px, py, t, tperc. The px, py, t and tperc variables each contain one 1×N double time series per trial.
In this dataset, we have four different subjects (subject), each coming to the lab for two sessions (session) on consecutive days. During each of these sessions they learn to control the displacement of a cursor on a screen, and their task is to reach targets with the cursor. The targets are arranged at discrete angles (reference_direction) in a circle around a starting point. The cursor is difficult to control, and as markers of progress in the task we record whether they reach the target in time (hit) and how long their reaction time was (m_reaction_time). Each line corresponds to a trial (trial_index), and we converted the trial index into a percentage of trials performed within the session (valid_perc, which goes from 0 to 100% in each session) or across sessions (valid_perc_session, which goes from 0 to 200% across both sessions).
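As a quick check of this design, we can count the trials per subject and session (a minimal sketch, assuming groupsummary() is available, i.e. MATLAB R2018a or newer):
groupsummary(T,{'subject','session'}) %trial counts appear in the GroupCount column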

Continuous data

Here we will represent the evolution of the reaction time across trials for each subject (using facet_grid()). To treat sessions independently we will map them to color.
figure('Position',[100 100 800 400])
 
g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g.facet_grid([],T.subject);
g.geom_point('alpha',0.5);
g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');
g.draw();
The points suggest that reaction time decreases in all subjects, a trend that tapers off in the second session. How can we add statistical layers that will help us visualize this?

Statistical layers for continuous data

There are several stat_ layers that allow this; they can be selected with the first drop-down menu.

stat_summary()

The most basic approach is simply to bin the data along the x axis and represent descriptive statistics, like the mean reaction time and its confidence interval, in each bin. This can be done with stat_summary() using the 'bin_in' argument, whose value sets the total number of bins along the x axis.
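For reference, a minimal standalone sketch of this layer, using the same bin count as in the combined example further below (g_demo is just an illustrative variable name, and 'area' is one of several possible geoms):
g_demo = gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g_demo.stat_summary('geom','area','bin_in',10); %mean and 95% confidence interval of y in 10 bins along x
figure
g_demo.draw();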

stat_smooth()

This uses a smoothing approach, either with a built-in fast smoother or with underlying MATLAB tools from the Curve Fitting Toolbox (splines, moving averages, loess, etc.). The 'lambda' parameter sets the smoothing intensity. The confidence interval is computed by bootstrapping.
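A minimal standalone sketch, using the same 'lambda' value as in the combined example further below (g_demo is just an illustrative variable name):
g_demo = gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g_demo.stat_smooth('geom','area','lambda',1e5); %larger lambda values give smoother fits
figure
g_demo.draw();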

stat_glm()

This uses a generalized linear model (fitglm() from the Statistics Toolbox), which by default is configured as a classical linear regression. Here we use an inverse Gaussian distribution, which is well suited to reaction time data.
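A minimal standalone sketch with the same distribution as in the combined example further below (g_demo is just an illustrative variable name):
g_demo = gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g_demo.stat_glm('geom','area','distribution','inverse gaussian'); %classical linear regression if 'distribution' is omitted
figure
g_demo.draw();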

stat_fit()

This uses an anonymous function to fit an arbitrary non-linear model. Here we use an exponential function provided with the 'fun' argument. Starting values for the fit parameters must be provided with 'StartPoint'.
By default this method uses fit() from the Curve Fitting Toolbox but can be configured to use fitnlm() from the Statistics Toolbox.
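A minimal standalone sketch with the same exponential model and starting values as in the combined example further below (g_demo is just an illustrative variable name):
g_demo = gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g_demo.stat_fit('geom','area','fun',@(a,b,c,x)a.*exp(x.*b)+c,'StartPoint',[1000 -0.01 150]); %a: amplitude, b: decay rate, c: asymptote
figure
g_demo.draw();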
Pick the stat_ layer:
vis = "stat_glm";
All these layers use a common 'geom' argument to specify how the results are displayed. Select one with this drop-down menu:
geom = 'lines';
 
figure('Position',[100 100 800 400])
g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g.facet_grid([],T.subject);
g.geom_point('alpha',0.1);
 
switch vis %Call the correct gramm method depending on the chosen visualization
    case "stat_summary"
        g.stat_summary('geom',geom,'bin_in',10);
    case "stat_smooth"
        g.stat_smooth('geom',geom,'lambda',1e5);
    case "stat_glm"
        g.stat_glm('geom',geom,'distribution','inverse gaussian','disp_fit',false,'fullrange',false);
    case "stat_fit"
        g.stat_fit('geom',geom,'fun',@(a,b,c,x)a.*exp(x.*b)+c,'StartPoint',[1000 -0.01 150],'intopt','functional','disp_fit',false);
end
 
g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');
g.draw();
%Export
g.export('file_name','xy_export','file_type','png');

Clustered data

In the previous section, we explored data where an underlying relationship y = f(x) is assumed. gramm also offers tools to explore x/y (and even z) data that correspond to clusters or groupings.
First we will compute the midpoints of all cursor trajectories (see TimeSeries.mlx).
T.xmid = cellfun(@(x)x(floor(length(x)/2)),T.px); %x coordinate at the middle sample of each trajectory
T.ymid = cellfun(@(x)x(floor(length(x)/2)),T.py); %y coordinate at the middle sample of each trajectory
We will examine how those midpoints evolve between the first and second experimental session for all subjects. Use the picker below to choose a visualization. Note that for stat_bin2d() the gramm object is recreated without the color and lightness mappings, and sessions are instead separated into columns with facet_grid().
vis = "stat_ellipse";
 
figure
g=gramm('x',T.xmid,'y',T.ymid,'group',T.reference_direction,'lightness',T.session,'color',T.reference_direction);
switch vis
    case "geom_point"
        g.geom_point();
        g.set_point_options('base_size',3);
    case "stat_ellipse"
        g.stat_ellipse('type','ci');
    case "stat_bin2d"
        g=gramm('x',T.xmid,'y',T.ymid);
        g.facet_grid([],T.session);
        g.stat_bin2d('geom','image','edges',{[-100:10:100],[-140:10:60]});
end
g.axe_property('DataAspectRatio',[1 1 1],'XLim',[-100 100],'YLim',[-140 60]);
g.set_color_options('map','d3_20');
g.set_names('x','X traj. midpoint (mm)','y','Y traj. midpoint (mm)','color','Direction (°)','column','Session','lightness','Session');
g.draw();
This figure shows that on the second day the subjects' movements are less variable and, at the midpoint of the movement, progress further along the target direction.