Visibility graphs for fMRI data: Multiplex temporal graphs and their modulations across resting-state networks

Visibility algorithms are a family of methods that map time series into graphs, such that the tools of graph theory and network science can be used for the characterization of time series. This approach has proved a convenient tool, and visibility graphs have found applications across several disciplines. Recently, an approach has been proposed to extend this framework to multivariate time series, allowing a novel way to describe collective dynamics. Here we test their application to fMRI time series, following two main motivations, namely that (a) this approach allows vs to simultaneously capture and process relevant aspects of both local and global dynamics in an easy and intuitive way, and (b) this provides a suggestive bridge between time series and network theory that nicely fits the consolidating field of network neuroscience. Our application to a large open dataset reveals differences in the similarities of temporal networks (and thus in correlated dynamics) across resting-state networks, and gives indications that some differences in brain activity connected to psychiatric disorders could be picked up by this approach.

. The second avenue deals with applications of this machinery, primarily by using this method as a feature extraction procedure with which to build feature vectors that can properly characterize time series with the purpose of making statistical learning (see Bhaduri & Ghosh, 2016;Hou, Li, Wang, & Yan, 2016;Long, Fonseca, Aarts, Haakma, & Foussier, 2014;Shao, 2010, for a few examples in the life sciences). A visibility graph is a network in which the nodes are time points, and the links are defined according to the visibility criteria described in the text. In this latter context, the application to neuroscience is in its infancy and has been essentially limited so far to the analysis of electroencephalogram (EEG) data (see Ahmadlou, Adeli, & Adeli, 2010Ahmadlou, Ahmadi, Rezazade, & Azad-Marzabadi, 2013;Bhaduri & Ghosh, 2016;Mira-Iglesias, Conejero, & Navarro-Pardo, 2016 for a few examples). The study of fMRI recordings under this lens has been scarce, and in this work we would like to motivate and justify why we think this is a promising enterprise, both from a univariate and-perhaps more interestinglyfrom a multivariate time series perspective (Lacasa, Nicosia, & Latora, 2015). Multilayer net-Multilayer networks: Multilayer networks incorporate multiple types of interactions between the same nodes. This means that multivariate time series can be represented in a multilayer visibility graph.
works incorporate multiple types of interactions between the same nodes. This means that multivariate time series can be represented in a multilayer visibility graph. Among other strategies to map time series intro graphs, using the repertoire of visibility graphs is particularly interesting, not just because its current application is scarce, but also because these methods are well suited to handle the specificities of fMRI data. More concretely, these methods have been shown to be efficient in extracting information and dealing with (a) data polluted with noise ), (b) multivariate , and (c) non stationary time series . In order to showcase the usefulness of visibility graphs in neuroscience we will choose a biggish, high-quality public dataset of resting-state fMRI data (Poldrack et al., 2016), and will make use of the family of visibility algorithms to build a multilevel graph of temporal networks, where each node represents a time point, and two nodes are connected if they are visible to each other, according to the algorithm explained below. In the case of multivariate time series-as the ones acquired in neuroimaging-each of these networks is actually the layer of a multiplex network (usually associated with a recording in a different region of interest (ROI). Being able to integrate all the data in a single structure enables both the intralayer (univariate) and the interlayer (multivariate) analysis simultaneously. We will show that a direct analysis of this network provides genuine and nontrivial information on fMRI data, potentially including the description and possible noninvasive classification of some brain diseases.

fMRI data
We used the public dataset described in Poldrack et al. (2016). These data were obtained from the OpenfMRI database, with accession number ds000030. We use resting-state fMRI data from 121 healthy controls, 50 individuals diagnosed with schizophrenia, 49 individuals diagnosed with bipolar disorder, and 40 individuals diagnosed with ADHD (attention-deficit/ hyperactivity disorder). The demographics are reported in the original paper, and they can additionally be found in the GitHub page containing the results of this study (Marinazzo, 2017a).
The fMRI data were preprocessed with FSL (FMRIB Software Library v5.0). The volumes were corrected for motion, after which slice timing correction was applied to correct for temporal alignment. All voxels were spatially smoothed with a 6 mm FWHM (full width at half maximum) isotropic Gaussian kernel and after intensity normalization, a band pass filter was applied between 0.01 and 0.08 Hz. In addition, linear and quadratic trends were removed. We next regressed out the motion time courses, the average cerebrospinal fluid signal, and the average white matter signal. Global signal regression was not performed. Data were transformed to the MNI152 template, such that a given voxel had a volume of 3 mm × 3 mm × 3mm. Finally, we averaged the signal in 278 ROIs using the template described in Shen, Tokoglu, Papademetris, & Constable (2013). In order to localize the results within the intrinsic connectivity network of the resting brain, we assigned each of these ROIs to one of the nine resting-state networks (seven cortical networks, plus subcortical regions and cerebellum) as described in Yeo et al. (2011).

Construction of the visibility graphs
The procedure to build up a visibility graph is extensively and clearly described in Lacasa et al. (2008Lacasa et al. ( , 2009Lacasa et al. ( , 2012 for univariate and  for multivariate time series. Here we will recall the basic steps and provide a visualization of the application of the methodology to BOLD data.
Given a time series of N data, any two time points i and j in which the measured quantity takes the values y i and y j , respectively, will have visibility and consequently will become two connected nodes in the associated natural visibility graph if any other data point y k placed between them fulfills the following condition: Together with this convexity criterion, named natural visibility, an ordering criterion, named horizontal visibility, has also been defined ). According to the latter, two time points i and j, in which the measured quantity takes the values y i and y j , respectively, will now have horizontal visibility if any other data point y k placed between them is smaller; that is, In either case, the resulting graphs have N nodes, are connected by a trivial Hamiltonian path that induces a natural ordering in the degree sequence, and are undirected (see Figure 1 for an illustration). In the event that the time arrow turns out to be a relevant aspect, directed graphs can be easily constructed, as detailed in Lacasa et al. (2012). Note that the resulting horizontal visibility graph (HVG) is simply a core subgraph of the natural visibility graphs (NVG), the former being analytically tractable (Lacasa, 2014). As a matter of fact, HVG can be understood as an order statistic (Lacasa & Flanagan, 2015) and therefore filters out any dependency on the series marginal distributions (this is not true for NVG, so in applications where marginal distributions are relevant, one should use NVG rather than HVG).
Both algorithms are fast: naive implementations of NVGs have a runtime complexity O(N 2 ); however, a divide-and-conquer strategy already reduces it to O(N log N) (Lan, Mo, Chen, Liu, & Deng, 2015). Naive implementation of HVG is already O(N log N) in most of the cases of practical interest. Finally, these methods are well suited to handle several degrees of nonstationarity in the associated time series (Lacasa & Flanagan, 2015).
In this work we will be analyzing BOLD data, and for that task we decided to choose NVG over HVG. This is because NVGs are in principle better suited to handle and extract long-range correlations than HVGs, as the former naturally allow for the development of hubs, which will be typically associated with extreme events in the data and can correlate with data at all scales. Correlations in time series are actually inherited in graph space in the degree distribution. It is somewhat easier to find fat-tailed degree distributions in NVGs (which account for hubs with extremely large degrees). On the other hand, HVGs (which have shown to work fine with processes evidencing short-range correlations) typically display exponentially decaying degree distributions; this feature is linked to short-scale visibility, making this method more local.
For illustration, Figure 1 depicts how the links are established in the visibility graph according to both visibility criteria. The code used to compute the visibility graphs is available on GitHub (Marinazzo, 2017b) and it is basically a translation to Matlab of the original visibility scripts in Fortran 90 (see http://www.maths.qmul.ac.uk/∼lacasa/VG.f90).
When it comes to the application to multivariate time series formed by M series, note that each of the M time series yields a different visibility graph to begin with, so in principle the multivariate series can always be mapped into a multilayer graph with M layers . Moreover, since for every node i there is a natural correspondence across layers (node i corresponds to time stamp i, and this is the same time stamp for all components), there exist a natural alignment between every node of each layer, so the multilayer graph is effectively a so-called multiplex network (Boccaletti et al., 2014;Lacasa et al., 2015, see Figure 2 for an illustration). Of course, other smarter alignments between graphs could be investigated (for instance, one could try to find the alignment that minimizes some sort of Hamming distance between ordered node sets), but in this work we keep it simple and consider the natural alignment induced by the time arrow.
Interestingly, this multiplex visibility graph encodes the complex structure of each time series in the topology of each layer. One can therefore extract in each layer any desired topological feature (say for instance, the entropy over the degree distribution, which would provide a different number for each layer), with which one could build a feature vector that provides a compact representation of the multivariate time series complexity. A similar procedure was followed, for instance, in Ahmadlou et al. (2010) to extract markers of Alzheimer's disease from a graph theoretical characterization of the Hurst index of EEG data.
Second, the complex interdependencies and correlations that might emerge in a multivariate series across variables could in turn be extracted using similarity measures across layers. There exist a large variety of network measures that one can use for this task . A simple example of such a measure is the so-called interlayer mutual information, recently explored in the context of multiplex visibility graphs of coupled chaotic maps . This quantity measures the information shared by every two layers based on the similarity of the degree distributions. Given the degree distributions P(k α ) and P(k β ) of two arbitrary layers α and β, it is defined as .
As the degree distribution captures the structure of each layer, this measure is in turn capturing the information shared between the two layers, that is, the information shared across each time series component of the multivariate time series. Now, since this is an M × M matrix whose ij entry provides the mutual information between layers (ROIs) i and j, one can then, Scheme of the procedure: Within a given region that aggregates a certain number of ROIs, one constructs a visibility graph per ROI and builds accordingly a multiplex visibility graph. We then compute the pairwise mutual information between degree distributions across the multiplex layers (ROIs) and finally average to obtain a value for each RSN (resting-state network). The multilayer network is visualized with MuxViz (De Domenico, Porter, & Arenas, 2015).
for instance, average across pairs (that is, across ROIs) to find a scalar quantity MI : the mean value of the mutual information for each intrinsic connectivity network. This methodology is depicted in Figure 3. Note that other informational or similarity measures between layers could be used instead (e.g., edge overlap, conditional or partial mutual information, transfer entropy). Here for the sake of exposition, we consider only mutual information.
The visibility algorithms produce networks whose nodes are time points. As one can observe in Figure 3, these networks have a modular structure, in which subnetworks are constituted by time points that are mainly adjacent. A network has a modular structure if it can be divided into subnetworks (modules) characterized by a higher probability of connections within each model than across models. A modular structure in a temporal network is thus an indication of different temporal regimes. The existence of these temporal regimes is what motivated the study of dynamical functional connectivity (see, for example, Hansen, Battaglia, Speigler, Deco, & Jirsa, 2013;Hutchison et al., 2013). Dynamic functional connectivity can be seen in the visibility framework as the comparison of the temporal networks, taking their modular structure into account. This comparison can be done in the first place considering the modular network as a whole. In our case we partitioned the visibility graphs for each ROI using 100 runs of the Louvain algorithm. We then quantified the distance between the two partitions by means of the mutual information, using the function in the Brain Connectivity Toolbox (Rubinov & Sporns, 2010). The results of the partition of two ROIs, one in the anterior cingulate cortex (ACC) and one in the precuneus (PCC), are shown in Figure 4. The modules of the graphs correspond to consecutive time points (left panels); that is partitioning the visibility graph provides a natural decomposition of the time series in time intervals. Turning to the interdependency between the two time series, the right panel of Figure 4 represents the Sorensen similarity between each pair of modules in the two time series. It shows that there are segments with high Sorensen indexes, and it is likely that during these segments the two ROIs reflect similar neural events.

RESULTS
We start by reporting in Figure 5 the results of MI within each of the intrinsic connectivity networks, for the four groups of subjects considered. For each group of subjects, each circle corresponds to MI of a given subject, and random average shifted histograms are also provided. This representation is not parametric, and it is bounded. The plots report the median of the Harrell-Davis estimator, and the 95% high density intervals using a Bayesian bootstrap.
Harrell-Davis estimator: The Harrell-Davis estimator does is independent of the distribution (nonparametric) and is a weighted linear combination of order statistics.
The Harrell-Davis estimator doe is independent of the distribution (nonparametric) and is a weighted linear combination of order statistics.
The outliers are detected based on the distance between each pair of data points without assuming symmetry of distributions. In order to account for departure from normality of these distributions, we used a graphical approach and computed the Kolmogorov-Smirnov distance (Rousselet, Pernet, & Wilcox, Kolmogorov-Smirnov statistic: The Kolmogorov-Smirnov statistic is a nonparametric test of the equality of continuous, one-dimensional probability distributions. 2017), obtaining values up to 0.7 (a value of 0.39 would correspond to rejecting the null hypothesis at a level α < 0.001 for the smallest population). The Kolmogorov-Smirnov statistic is a nonparametric test of the equality of continuous, one-dimensional probability distributions.
The number of ROIs constituting each intrinsic state network (thus a proxy for the network size, given that Shen's parcellation has ROIs of similar size) is not correlated with the average value of the mutual information. In particular, it is interesting to observe that the intrinsic connectivity network called limbic in Yeo's parcellation is the smallest one, but nonetheless it has a low interlayer mutual information compared with the other networks for all the clinical groups.
The network that showed the clearest differentiation in terms of the average interlayer mutual information among the four clinical groups is indeed the Limbic one ( Figure 6). This evidence was assessed by means of a multivariate response test with age of the subjects and framewise displacement as covariates. The p value of 0.005 was corrected for multiple comparisons using the Bonferroni-Holm criterion with α = 0.05. The Kolmogorov-Smirnov statistics of the pairwise comparison between the distributions of average interlayer mutual information values for these particular networks ranged from 0.15 to 0.3. The null hypothesis of values for controls and schizophrenics drawn from the same distribution would be rejected with an α < 0.005. Figure 6 also reports the shift functions to visualize the difference between two distributions, in this case controls and schizophrenics. The shift function can help us understand and quantify how the two distributions differ. The shift function describes how one distribution should be rearranged to match the other one: it estimates how and by how much one distribution must be shifted.

Shift function:
The shift function can help us understand and quantify how the two distributions differ. The shift function describes how one distribution should be rearranged to match the other one: it estimates how and by how much one distribution must be shifted.
This function (Wilcox, 1995) does not assume (as t tests do) that two distributions differ only in the location of the bulk of the observations, and it enables determination of how, and by how much, two distributions differ. Here the Harrell-Davis quantile estimator is used. Confidence intervals of the decile differences with a bootstrap estimation of the standard error of the deciles are computed, and one controls for multiple comparisons so that the type I error rate remains around 0.05 across the nine confidence intervals (Rousselet et al., 2017). In this specific case we can observe a clear separation for all the quantiles but the ninth one.
To complement this analysis, in Figure 1 we further report two additional ways in which results of this kind are often represented (mean and standard errors). According to this plot, it is already evident to the naked eye that the method easily distinguishes controls from patients with any mental disorder, suggesting that visibility graphs do indeed extract informative features that can be used for noninvasive diagnosis. Visualizing results in such a way is indeed suboptimal and sometimes problematic (nicely explained in Rousselet, Foxe, & Bolam, 2016); for this reason we initially chose the visualizations provided in Figures 5 and 6 (Rousselet et al., 2017).

Why the (Multivariate) Visibility Graph?
All in all, there are several reasons why we think that visibility graphs are a convenient tool. We discuss some of these reasons below.

Usefulness:
Visibility graphs have been shown to inherit in their topology the essence of the associated dynamics, including nontrivial fingerprints that are both descriptive and informative for statistical learning purposes.

Fit for purpose:
These methods can be used directly in both stationary and nonstationary signals (i.e., nonstationarity is not required to be removed). Also, series do not require ad hoc phase partitioning or symbolization. Also, visibility graphs naturally filter out linear trends, so they do not require such detrending (Lacasa et al., 2008). Furthermore, since HVG is an order statistic, it is also invariant under monotonic (order-preserving) rescaling on the data (Lacasa & Flanagan, 2015). The NVG is not invariant under this latter transformation however, so nonlinear rescaling to make data more "peaky" will necessarily modify the associated NVG in a nontrivial way.

Computationally easy and efficient:
The method is numerically straightforward to implement and the runtime algorithms are quite decent, varying from O(N) for so-called visibility sequential motifs (Iacovacci & Lacasa, 2016) to O(N log N) for the full adjacency matrices using a divide-and-conquer strategy.

Amenable to analytical insight:
Unlike other strategies for graph theoretical time series analysis, visibility graphs are not computational black boxes. More particularly for HVG ( but not only Iacovacci & Lacasa, 2016;Luque et al., 2009), there exist several theorems available and methods to build rigorous results of HVG properties (Lacasa, 2014(Lacasa, , 2016Lacasa et al., 2009Lacasa et al., , 2012Luque et al., 2013). The latter is an area of intense research activity at the interface between combinatorics and dynamical systems.

Versatile:
The methods are not context dependent but are generally applicable to both univariate and multivariate time series across the disciplines. A drawback of this property is that the topological features one can extract from these graphs are themselves not context dependent.

Novel:
It builds a bridge between time series and networks and thus opens the exciting possibility of exploring the usefulness of a large set of new tools in the endeavor to describe and classify complex signals.
Coming back to the specific reason why we think that natural visibility graphs are particularly suited for BOLD data, it has been shown that relevant information on the time course of the BOLD signal and on correlated activity can be extracted by looking at single frames, corresponding to peaks in the signal (Liu & Duyn, 2013;Tagliazucchi, Balenzuela, Fraiman, & Chialvo, 2012), and that these events could be the proxy for an innovation signal at the neural level (Karahanoglu & Van De Ville, 2015;Wu et al., 2013). In this framework, the degree of the nodes corresponding to the BOLD peaks in the adjacency matrix constructed according to natural visibility emphasizes the functional relevance of the neural events and of the corresponding patterns of coactivation across the brain. However, both NVG and HVG have been shown to be useful in different contexts, so there is no general rule of thumb on what method should we use: this choice shall be addressed on a case-by-case basis.
Finally, what is important and informative when describing the properties of a certain cognitive state? Is it the complex pattern underlying the structure of individual time series (that is, local activity of ROIs) of different regions? Or are the correlations and interdependencies (understood in a broad sense) between these regions the key aspect to look at? When the latter is the case, a functional network analysis approach (Bullmore & Sporns, 2009) seems appropriate. In the former case where the nature of local activity across regions already captures information (He, 2011;Zang, Jiang, Lu, He, & Tian, 2004), one does not need to resort to functional dependencies and local analysis is the correct thing to do. This is obviously an open question that should be addressed, from a biological point of view, on a case-by-case basis. A recent study suggests that both conceptual frameworks can indeed be connected (Sethi, Zerbi, Wenderoth, Fornito, & Fulcher, 2017). In general, both aspects likely play a relevant role, and some studies have already successfully merged the two (Ciuciu, Abry, & He, 2014;Tagliazucchi et al., 2016). Nevertheless, the multiplex visibility framework offers a compact way of extracting at once both the local temporal structure (via the network intralayer properties) and the global interconnection pattern (via multiplex interlayer similarities).

Similarity with other measures
We discussed at the end of the Methods section that the modular temporal graphs resulting from the visibility algorithm are a natural way to describe different dynamical regimes of individual time series, and their interdependence, without arbitrary and possibly problematic choices such as a sliding window and its length (Hindriks et al., 2016;Kudela, Harezlak, & Lindquist, 2017).
Features of the visibility graph, such as the modularity, the clustering coefficient, or the node degree, could be used as features in classification algorithms aimed to detect modulations of the local and correlated dynamical regime of BOLD signals.

Modularity in networks:
A network has a modular structure if it can be divided into subnetworks (modules) characterized by a higher density of connections within each module than across modules. Furthermore, using the excellent resource NeuroVault (http://neurovault.org/), we also looked at the maps depicting the results of other measures and noticed that the areas belonging to the limbic Yeo network are associated with lower levels of regional homogeneity (Zang et al., 2004), higher coefficient of variation of the BOLD signal (Wu & Marinazzo, 2016), and lower value of the fractional amplitude of low-frequency fluctuations (fALFF) (Zou et al., 2008). This evidence speaks to the fact that interlayer mutual information in multiplex visibility networks is associated with decreased predictability and increased independence between the degrees of freedom of the measured time series.

Classification of neural disorders
The main focus of this paper is methodological, and a thorough discussion of the implications of our results on neuroimaging studies of psychiatric disorders is beyond its scope; moreover, we would not want to hypothesize after the results are known (HARKing) (Poldrack et al., 2017). However, it is interesting to highlight that the limbic network has been previously associated with mental disorders (Kiehl, 2006;Liston, Cohen, Teslovich, Levenson, & Casey, 2011;Potvin, Lungu, Tikàsz, & Mendrek, 2017;Rdulescu & Mujica-Parodi, 2009;Roberts et al., 2016;Whalley et al., 2012). In the same way, we refer the reader to recent studies specifically aimed at using advanced neuroimaging data analysis tools to map and classify neural disorders (Cetin et al., 2016;Demirci et al., 2008;Miller, Vergara, Keator, & Calhoun, 2016), and Fornito, Zalesky, & Breakspear (2015) for a review. Our results shown here using visibility graphs confirm some of this previous work and further showcase that visibility graphs extract informative features with which we can find statistically significant signatures of different neural disorders.
To conclude, given the exposition and results reported in this study, we hope to have motivated our colleagues to consider visibility graphs as a valuable tool for both exploratory and focused studies.