Exploring x/y data with gramm
In this example file, we go further in exploring gramm's capabilities for data in which some of the independent variables are continuous.
To benefit from interactive elements, you should open it in MATLAB's editor with
We will load a partial dataset from a human movement science experiment
websave('example_movement','https://github.com/piermorel/gramm/raw/master/sample_data/example_movement.mat'); %Download data from repository
load example_movement.mat
T
T = 3170×13 table

| | subject | session | trial_index | reference_direction | hit | m_reaction_time | (unlabeled) 1 | (unlabeled) 2 | (unlabeled) 3 | valid_perc | valid_perc_session | px | py | t | tperc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | IHTA | 1 | 2 | 105 | 0 | 616.7897 | -20.7055 | 37.2741 | 0 | 0.1634 | 0.1634 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| 2 | IHTA | 1 | 3 | 60 | 1 | 404.2587 | 40 | 29.2820 | 0 | 0.3268 | 0.3268 | 1×317 double | 1×317 double | 1×317 double | 1×317 double |
| 3 | IHTA | 1 | 4 | 330 | 0 | 341.6924 | 69.2820 | -80 | 0 | 0.4902 | 0.4902 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| 4 | IHTA | 1 | 7 | 240 | 1 | 303.2130 | -40 | -109.2820 | 0 | 0.8170 | 0.8170 | 1×226 double | 1×226 double | 1×226 double | 1×226 double |
| 5 | IHTA | 1 | 9 | 15 | 1 | 283.2674 | 77.2741 | -19.2945 | 0 | 1.1438 | 1.1438 | 1×349 double | 1×349 double | 1×349 double | 1×349 double |
| 6 | IHTA | 1 | 14 | 150 | 0 | 306.7775 | -69.2820 | -0 | 0 | 1.7974 | 1.7974 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| 7 | IHTA | 1 | 16 | 60 | 1 | 294.8469 | 40 | 29.2820 | 0 | 1.9608 | 1.9608 | 1×129 double | 1×129 double | 1×129 double | 1×129 double |
| 8 | IHTA | 1 | 18 | 240 | 1 | 320.3605 | -40 | -109.2820 | 0 | 2.2876 | 2.2876 | 1×159 double | 1×159 double | 1×159 double | 1×159 double |
| 9 | IHTA | 1 | 19 | 150 | 0 | 367.6910 | -69.2820 | -0 | 0 | 2.4510 | 2.4510 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| 10 | IHTA | 1 | 27 | 195 | 1 | 309.4387 | -77.2741 | -60.7055 | 0 | 2.9412 | 2.9412 | 1×283 double | 1×283 double | 1×283 double | 1×283 double |
| 11 | IHTA | 1 | 29 | 15 | 1 | 332.9378 | 77.2741 | -19.2945 | 0 | 3.1046 | 3.1046 | 1×167 double | 1×167 double | 1×167 double | 1×167 double |
| 12 | IHTA | 1 | 34 | 330 | 0 | 369.4775 | 69.2820 | -80 | 0 | 3.4314 | 3.4314 | 1×295 double | 1×295 double | 1×295 double | 1×295 double |
| 13 | IHTA | 1 | 37 | 105 | 0 | 337.2823 | -20.7055 | 37.2741 | 0 | 3.7582 | 3.7582 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| 14 | IHTA | 1 | 40 | 195 | 0 | 423.8782 | -77.2741 | -60.7055 | 0 | 3.9216 | 3.9216 | 1×362 double | 1×362 double | 1×362 double | 1×362 double |
| ⋮ | | | | | | | | | | | | | | | |
In this dataset, we have four different subjects (subject), each coming to the lab for two sessions (session) on consecutive days. During each session they learn to control the displacement of a cursor on a screen, and their task is to reach targets with the cursor. The targets are arranged at discrete angles (reference_direction) in a circle around a starting point. The cursor is difficult to control, so as markers of progress in the task we record whether they reach the target in time (hit) and how long their reaction time was (m_reaction_time). Each line corresponds to a trial (trial_index), and we converted the index into a percentage of trials performed within a session (valid_perc, from 0 to 100% in each session) or across sessions (valid_perc_session, from 0 to 200% across both sessions).
Continuous data
Here we will represent the evolution of the reaction time across trials for each subject (using facet_grid()). To treat sessions independently we will map them to color.
figure('Position',[100 100 800 400])
g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g.facet_grid([],T.subject);
g.geom_point('alpha',0.5);
g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');
g.draw(); %Render the figure
The points suggest that reaction time decreases in all subjects, and that this decrease tapers off in the second session. How can we add statistical layers that help us visualize this?
Statistical layers for continuous data
Several stat_ layers allow this. They can be selected with the first drop-down menu.
stat_summary()
The most basic approach is simply to bin the data along the x axis and represent descriptive statistics, such as the mean reaction time and its confidence interval, in each bin. This can be done with stat_summary() using the 'bin_in' argument, whose value sets the total number of bins across the x axis.
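To make the binning idea concrete, here is a minimal base-MATLAB sketch of "mean per bin along x". It runs on made-up toy data rather than on T, all variable names are illustrative, and gramm's own bin edges and confidence-interval computation may differ from this.

```matlab
% Toy data standing in for T.valid_perc_session and T.m_reaction_time (assumption)
x = linspace(0, 200, 400)';          % task progression (%)
y = 300 + 200*exp(-x/50);            % noiseless, decreasing "reaction time"

nbins = 10;                                   % plays the role of 'bin_in'
edges = linspace(min(x), max(x), nbins+1);    % equal-width bins across the x range
bin = discretize(x, edges);                   % bin index of each observation
binMean = accumarray(bin, y, [nbins 1], @mean);      % mean of y in each bin
binCenter = (edges(1:end-1) + edges(2:end))'/2;      % x position of each bin

disp([binCenter binMean])
```

Plotting binMean against binCenter gives the kind of curve that stat_summary() draws, without the confidence interval.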
stat_smooth()
This uses a smoothing approach, either with a built-in fast smoother or with MATLAB tools from the Curve Fitting Toolbox (splines, moving averages, loess, etc.). The 'lambda' parameter sets the smoothing intensity. The confidence interval is computed by bootstrap.
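To illustrate what a smoothing-intensity parameter such as 'lambda' does, here is a minimal base-MATLAB sketch of a penalized least-squares (Whittaker-type) smoother on evenly spaced toy data. This is not gramm's code, and whether gramm's built-in smoother works exactly this way is an assumption. The data and variable names are made up for the example.

```matlab
rng(0);                                   % reproducible toy noise
n = 200;
x = linspace(0, 200, n)';
y = 300 + 200*exp(-x/50) + 20*randn(n,1); % noisy decreasing curve

lambda = 1e3;                             % larger values give a smoother curve
D = diff(speye(n), 2);                    % second-difference operator, (n-2)-by-n
% Minimize sum((y-z).^2) + lambda*sum((D*z).^2), which has a closed-form solution
z = (speye(n) + lambda*(D'*D)) \ y;

plot(x, y, '.', x, z, '-');
```

Increasing lambda penalizes curvature more strongly, so the curve becomes smoother and eventually approaches a straight line.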
stat_glm()
This uses a generalized linear model (fitglm() from the Statistics Toolbox), which by default is configured as a classical linear regression. Here we use an inverse Gaussian distribution, which is well suited to reaction time data.
- The 'distribution' parameter lets you pick the distribution of the response variable and the corresponding link function.
- When 'disp_fit' is set to true, the model equation is displayed in the figure, along with significance stars for each term.
- 'fullrange' determines whether the fit is displayed across the complete range of the x axis or only across the data range.
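For readers who want to see the underlying call, the following sketch calls fitglm() directly with an inverse Gaussian response on made-up data. It requires the Statistics and Machine Learning Toolbox. How gramm forwards its options to fitglm() is not shown here, and the toy data and variable names are assumptions for illustration only.

```matlab
rng(0);                                      % reproducible toy noise
x = linspace(0, 100, 200)';                  % task progression (%)
mu = 1./sqrt(1e-5 + 1e-7*x);                 % mean decreasing from about 316 to 224
y = mu.*(1 + 0.05*randn(size(x)));           % positive, noisy "reaction times"

% Default link for 'inverse gaussian' is the canonical one (mu^-2)
mdl = fitglm(x, y, 'linear', 'Distribution', 'inverse gaussian');
disp(mdl.Coefficients)                       % estimates, standard errors, p-values

yhat = predict(mdl, x);                      % fitted mean on the response scale
plot(x, y, '.', x, yhat, '-');
```

With the canonical link, a positive slope on the linear predictor corresponds to a decreasing fitted reaction time.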
stat_fit()
This uses an anonymous function to fit any non-linear model. Here we use an exponential function provided with the 'fun' argument. This function requires starting values for the fit parameters with 'StartPoint'.
- 'intopt' sets whether the shaded area represents the uncertainty of data or of the fitted function.
- When 'disp_fit' is set to true, the model equation is displayed in the figure
By default this method uses fit() from the curve fitting toolbox but can be configured to use fitnlm() from the statistics toolbox.
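Before fitting, it can help to check what the starting values imply. The short sketch below evaluates the same anonymous function at the 'StartPoint' values used in this example. The evaluation grid x and the names f, p0 and y0 are illustrative.

```matlab
f = @(a,b,c,x) a.*exp(x.*b) + c;     % exponential model: amplitude a, rate b, asymptote c
p0 = [1000 -0.01 150];               % starting values [a b c]

x = [0 100 200];                     % task progression (%), illustrative grid
y0 = f(p0(1), p0(2), p0(3), x);      % model prediction at the starting values
disp(y0)
% At x = 0 the curve starts at a + c = 1150 ms, and it decays towards c = 150 ms as x grows.
```

Starting values that produce a curve in the same range as the data generally make the non-linear fit more likely to converge.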
Pick the stat_ layer:
All these layers use a common 'geom' argument to specify how the results are displayed. Select one with this drop-down menu:
figure('Position',[100 100 800 400])
g=gramm('x',T.valid_perc_session,'y',T.m_reaction_time,'color',T.session);
g.facet_grid([],T.subject);
g.geom_point('alpha',0.1);
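vis = 'stat_summary'; %Stand-in for the first drop-down: 'stat_summary', 'stat_smooth', 'stat_glm' or 'stat_fit'
geom = 'area'; %Stand-in for the second drop-down (value assumed)
switch vis %Call the correct gramm method depending on the chosen visualization
    case 'stat_summary'
        g.stat_summary('geom',geom,'bin_in',10);
    case 'stat_smooth'
        g.stat_smooth('geom',geom,'lambda',1e5);
    case 'stat_glm'
        g.stat_glm('geom',geom,'distribution','inverse gaussian','disp_fit',false,'fullrange',false);
    case 'stat_fit'
        g.stat_fit('geom',geom,'fun',@(a,b,c,x)a.*exp(x.*b)+c,'StartPoint',[1000 -0.01 150],'intopt','functional','disp_fit',false);
end
g.set_names('x','Task progression (%)','y','Reaction time (ms)','color','Session','column','Subject');
g.draw(); %Render the figure before exporting it
g.export('file_name','xy_export','file_type','png');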
Clustered data
In the previous section, we explored data where an underlying y=f(x) is assumed. gramm also provides tools to explore x/y (and even z) data that form clusters or groupings.
First we will compute the midpoints of all cursor trajectories (see TimeSeries.mlx)
T.xmid = cellfun(@(x)x(floor(length(x)/2)),T.px);
T.ymid = cellfun(@(x)x(floor(length(x)/2)),T.py);
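The midpoint extraction can be checked on a small cell array. The toy trajectories below are made up, but the anonymous function is the same as the one applied to T.px and T.py.

```matlab
px = {[0 1 2 3], [10 20 30 40 50]};              % two toy trajectories of length 4 and 5
mid = cellfun(@(x) x(floor(length(x)/2)), px);   % sample at index floor(n/2)
disp(mid)
% Length 4 gives index 2, and length 5 also gives index 2 (floor(2.5)),
% so for odd lengths the sample just before the exact middle is picked.
```

For trajectories with a few hundred samples, as in this dataset, the one-sample offset is negligible.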
We will examine how those midpoints change between the first and second experimental sessions for all subjects. Use the picker below to choose a visualization:
- geom_point() to view raw datapoints
- stat_ellipse() to plot the center of each group and uncertainty ellipse (computed here as a 95% confidence interval on the center, with the hypothesis of a bivariate normal distribution)
- stat_bin2d() to plot a 2D histogram
To note:
- With geom_point() we used set_point_options() to use smaller points due to the large number of points
- Here we use the 'lightness' aesthetic to display session and pick a different default colormap with set_color_options(). Not all colormaps support lightness so only two are pickable.
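As an illustration of what a 95% confidence ellipse on a group center means under a bivariate normal hypothesis, here is a base-MATLAB sketch on made-up points. It uses a large-sample chi-square approximation, which is an assumption, so gramm's exact computation may differ. All names are illustrative.

```matlab
rng(1);                                        % reproducible toy data
xy = randn(500,2)*[2 0; 0.5 1] + [10 -5];      % correlated 2D cloud (implicit expansion, R2016b+)

n = size(xy,1);
mu = mean(xy);                                 % center of the group
S = cov(xy)/n;                                 % covariance of the estimated center
k = -2*log(1 - 0.95);                          % 95% quantile of chi-square with 2 dof (closed form)

[V, Dg] = eig(S);                              % principal axes of the ellipse
theta = linspace(0, 2*pi, 100);
ell = mu' + V*sqrt(k*Dg)*[cos(theta); sin(theta)];   % 2-by-100 ellipse outline

plot(xy(:,1), xy(:,2), '.', ell(1,:), ell(2,:), '-');
axis equal
```

Because the covariance is divided by n, the ellipse describes the uncertainty on the center and is much smaller than the spread of the points themselves.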
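vis = 'stat_ellipse'; %Stand-in for the picker: 'geom_point', 'stat_ellipse' or 'stat_bin2d' (variable name assumed)
figure('Position',[100 100 800 400])
g=gramm('x',T.xmid,'y',T.ymid,'group',T.reference_direction,'lightness',T.session,'color',T.reference_direction);
switch vis %Call the correct gramm method depending on the chosen visualization
    case 'geom_point'
        g.geom_point();
        g.set_point_options('base_size',3);
    case 'stat_ellipse'
        g.stat_ellipse('type','ci');
    case 'stat_bin2d'
        g=gramm('x',T.xmid,'y',T.ymid); %No color or lightness mapping for the 2D histogram
        g.facet_grid([],T.session);
        g.stat_bin2d('geom','image','edges',{[-100:10:100],[-140:10:60]});
end
g.axe_property('DataAspectRatio',[1 1 1],'XLim',[-100 100],'YLim',[-140 60]);
g.set_color_options('map','d3_20');
g.set_names('x','X traj. midpoint (mm)','y','Y traj. midpoint (mm)','color','Direction (°)','column','Session','lightness','Session');
g.draw(); %Render the figure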
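To see what the 'edges' passed to stat_bin2d() describe, the sketch below counts made-up points on the same grid with histcounts2() from base MATLAB. The toy coordinates and variable names are assumptions, and gramm may orient or normalize its image differently.

```matlab
rng(2);                                   % reproducible toy data
x = 40*randn(1000,1);                     % toy X midpoints (mm)
y = -40 + 40*randn(1000,1);               % toy Y midpoints (mm)

xedges = -100:10:100;                     % same edges as in the stat_bin2d() call
yedges = -140:10:60;
N = histcounts2(x, y, xedges, yedges);    % 20-by-20 matrix of counts, rows follow x

inRange = x >= -100 & x <= 100 & y >= -140 & y <= 60;
fprintf('%d of %d points fall inside the grid\n', sum(N(:)), numel(x));
```

Points outside the outermost edges are not counted, which is why the total can be lower than the number of observations.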
This figure demonstrates that on the second day, the subjects are less variable in their movements and go further along the target direction at the midpoint of the movement.