Low-dimensional morphospace of topological motifs in human fMRI brain networks

We present a low-dimensional morphospace of fMRI brain networks, where axes are defined in a data-driven manner based on the network motifs. The morphospace allows us to identify the key variations in healthy fMRI networks in terms of their underlying motifs, and we observe that two principal components (PCs) can account for 97% of the motif variability. The first PC of the motif distribution is correlated with efficiency and inversely correlated with transitivity. Hence this axis approximately conforms to the well-known economical small-world trade-off between integration and segregation in brain networks. Finally, we show that the economical clustering generative model proposed by Vértes et al. (2012) can approximately reproduce the motif morphospace of the real fMRI brain networks, in contrast to other generative models. Overall, the motif morphospace provides a powerful way to visualize the relationships between network properties and to investigate generative or constraining factors in the formation of complex human brain functional networks.


Cumulative variability plot
Supplementary Figure 1 plots the amount of variability in the motif scores explained by each of the six principal components. As stated in the main text, 86% of the motif variability is explained by the first principal component and 97% is explained by the first two principal components. The correlations between PCs 1 and 2 and the global metrics and the average Euclidean distance of the longest 5% of edges are also similar in networks with 400 or 1200 edges, as shown in Supplementary Table 1.
Here both the Pearson correlation coefficients (r) and the p-values for Pearson's correlation using a Student's t distribution are shown. Note that with 1200 edges, the direction of PC2 is reversed with respect to the underlying motifs, hence the sign of the correlations of global metrics with PC2 are also reversed. However, since this direction is arbitrary, the change in sign is not important.

Changing the parcellation
Throughout the main text, we use the AAL parcellation with 90 regions. Here we show that our results are robust to changing the parcellation. In particular, we consider the AICHA parcellation [1] with 384 regions. This parcellation aims to preserve homotopy. In our case, we were interested in testing our results using a parcellation with a larger number of regions to test the robustness of the motifs extraction.
As before, the networks are thresholded at approximately 20%, with 14689 edges per network in this case. The motif fingerprint and biplot we obtain are shown in Supplementary Figure 4 and are remarkably similar to those from our data and parcellation shown in Figure 1 of the main text. We note that an even higher percentage of variability is explained by the first PC (over 92% of the total variability). As shown in Supplementary

Re-test dataset
Using a re-test HCP dataset, in which all 100 subjects were scanned a second time with identical acquisition parameters, we obtain a similar motif fingerprint, cumulative variability plot and motif biplot as before. Results are shown in Supplementary Figure 5. The correlations with global metrics and with the average Euclidean distance of the longest 5% of edges are also similar in the re-test HCP dataset, see Supplementary

Changing the dataset
Above we showed that our results are robust in the HCP re-test dataset. We now test them in an independent dataset, which we refer to as the 'Cambridge' dataset. This dataset has been reported previously in [2] and includes 26 subjects of varying ages. Resting-state fMRI data was acquired using classical Gradient-echo planar imaging (EPI) in the Wolfson Brain Imaging Center (Cambridge, UK) (see [1] for the details of the parameter acquisition). The length of the acquisition was 9.6 minutes with a repetition time equal to 1.1 s. The same pre-processing procedure was applied as before to obtain time series with 90 anatomical regions and 512 samples in time. Again as before, the AAL parcellation was used and the networks are thresholded at 800% edges (approximately 20% cost).
The motif fingerprint and biplot we obtain are shown in Supplementary Figure 6 and are again similar to those from the HCP dataset shown in Figure 1 of the main text. The correlations with global metrics are very similar to our previous results, as shown in Supplementary Table 4, PC1 correlates positively with global efficiency and negatively with transitivity, whilst PC2 correlates negatively with assortativity.
The only result which was significant in the HCP dataset and is no longer significant in this dataset is the correlation between the average Euclidean distance of the longest 5% of edges. Whilst we still find a positive correlation in the Cambridge dataset (r=0.30), the p-value is > 0.05 (p=0.14). The reason for this difference is unclear, but could be due to the smaller number of subjects available in the Cambridge dataset. Further work is needed to clarify the reason for this difference.

5 node motifs
In Figure 1 of the main text, we present the cumulative variability and motif biplot for 4 node motifs. In Supplementary Figure 7, we show the same plots for 5 node motifs. Again, the first two principal components explain over 90% of the motif variability, even though there are 21 possible 5 node motifs. Hence there is significant overlap between the information contained within the motif counts. We note that PC1 explains slightly less of the variability than it did in the 4 node motif case and PC2 explains slightly more (74% instead of 86% and 18% instead of 11%). Figure 7 b plots the motif biplot for 5 node motifs, which is similar to the 4 node biplot; highly clustered motifs have low PC1, whilst 'chain-like' motifs have high PC1. PC2 reflects the assortativity of the motifs. The correlations with global metrics are shown in Supplementary    Figure 7: a) Cumulative variability of the 5 node motif fingerprints. b) 5 node motif biplot for 100 HCP fMRI brain networks, with key motifs labelled. In both parts, the networks were thresholded with 800 edges, as in Figure 1 of the main text.
As can be seen from the biplots, the motif directions are similar in both cases (the sign of PC1 is flipped, but this difference is unimportant since the sign is arbitrary). According to the sign convention in the original

Exploring the location of the long-distance edges in the brain
In the main text we compare the distributions of the Euclidean distances of the edges for two example subjects (Figure 1, parts c and d). In Supplementary Figures 10 and 11 we show the longest 5% of edges from these two subjects plotted on the brain. We observe that the long distance edges in the subject represented by a circle (whose longest edges are shorter on average) are more likely to be interhemispheric, whilst the subject represented by a square (whose longest edges are longer on average) are more likely to be intrahemispheric, often connecting the front and the back of the brain. Note that this does not mean that the subject denoted by the circle has more interhemispheric edges in total than the subject denoted by the square, rather that more of their longest edges are interhemispheric.
Supplementary Figure 10: Longest 5% of edges for the subject represented by the circle in Figure 2 of the main text, visualised with the BrainNet Viewer [4].
Supplementary Figure 11: Longest 5% of edges for the subject represented by the square in Figure 2 of the main text, visualised with the BrainNet Viewer [4].
To explore this finding further in the whole dataset, we counted the number of the longest 5% of edges in each subject which were interhemispheric. Note that 5% of the edges in the network corresponds to 40 edges in our networks (since they have 800 edges in total), hence the largest number of these edges which can be interhemispheric is 40. In other words, for each network we listed its 800 edges by the Euclidean distance between the two regions they connect, selected the 40 longest edges and then counted how many of these edges were interhemispheric. Supplementary Figure 12 plots the motif morphospace coloured by the number of the longest 5% of edges in each subject which were interhemispheric. As expected from Supplementary  Figures 10 and 11, we observe that the results correlate with PC1 (r=-0.44, p< 0.001), in other words the longest edges in networks with high PC1 scores (which tend to have high global efficiency) are more likely to be long distance edges within a single hemisphere than the longest edges in networks with low PC1 scores, which are more likely to be interhemispheric. Importantly, there is no correlation between PC1 or PC2 and the total number of interhemispheric edges in the network, hence on average networks with low or high PC1 scores still have the same number of interhemispheric edges in total.
Supplementary Figure 12: Motif morphospace coloured according to the number of the longest 5% of edges in each subject which are interhemispheric.

Economical clustering model
In order to determine η and γ for the generative model, we generated networks with a wide range of η and γ values and calculated the goodness of fit of the generated networks to the HCP brain networks. The goodness of fit was determined using the same energy function used by Vertes et al. [5], which optimises the networks for modularity, mean local clustering efficient, global efficiency and degree distribution. Note that the networks were not optimised for the motif fingerprint, or assortativity. Supplementary Figure 13 plots the energy function as a function of η and γ. The lower the energy, the better the fit to the networks. The minimum is found at γ = 3 and η = −2.5, therefore in the main text we set these values for η and γ. Note that as η and γ increase, the networks become more clustered, as can be seen from Figure 5 in the main text and the motif distribution shown in Supplementary Figure 14. Therefore increasing η and γ further is not expected to give a better approximation to the HCP brain networks.
The correlations between the PCs and the global efficiency, assortativity, transitivity and the average length of the longest 5% of edges are shown in Table 9. The results for global efficiency, transitivity and assortativity are similar to those observed in the real data with a single hemisphere only and a 10% threshold. The correlation between the Euclidean distance of the longest 5% of edges and PC1 is not statistically significant in the generated networks (p=0.07). This is partly, although not solely, due to the reduced correlation between the Euclidean distance of the longest 5% of edges and PC1 when only a single hemisphere is considered, which is unsurprising because there are fewer long distance connections. We also note that with a single hemisphere only, the global efficiency of both the real networks and the generated networks shows strong correlation with PC2 as well as with PC1. Interhemispheric connections are known to play an important role in the brain and further work is needed to fully understand their influence on the motif morphospace.
Supplementary Figure 13: The average energy values of networks generated using the economical clustering model as η and γ are varied (mean of 100 generated networks). The energy values are shown as log 10 (E), where E is as defined in [5].
Global efficiency Transitivity Assortativity Eucl. distance of longest 5% of edges Generated  The correlations between the PCs and the global metrics and the average Euclidean distance of the longest 5% of edges in generated and real fMRI brain networks with a 10% threshold and a single hemisphere. The Pearson correlation coefficients are shown. The p-values for Pearson's correlation using a Student's t distribution are also shown in brackets.

Economical preferential attachment model
Whilst the economical clustering model is able to reproduce the motif fingerprints from the real data, other models are not. For example, here we show results from the economical preferential attachment model, also described in [5]. As above, in order to determine η and γ we explore the goodness of fit of networks across the parameter space manually. The energy at different η and γ is shown in Supplementary Figure 15. Note that the energy is always higher than the optimal energy values obtained from the economical clustering model, suggesting that networks generated by the economical preferential attachment model are not as close a fit to the real fMRI brain networks. The solutions with the lowest energy values form a band across the parameter space. Since it is difficult to ascertain whereabouts in the band the optimal solution is, in Supplementary  Figure 16  the economical clustering model. This illustrates that it is not trivial to find a model which can reproduce the motif fingerprint.

Network similarity measures
An alternative way to compare the results from the generative models to the real fMRI networks is to use network similarity measures. In Supplementary Table 10 we use three network similarity measures to measure the differences between 1) the economical clustering model (with η = −3 and γ = 2.5) and the real fMRI networks, 2) the preferential attachment model (with η = −20 and γ = 6) and the real fMRI networks and 3) random networks with the same number of nodes and edges as the real fMRI networks (generated using the Brain Connectivity Toolbox function makerandCIJ und.m [6]) and the real fMRI networks. In each case we average the measure over all possible pairwise comparisons, using the 100 real networks and 100 generated/random networks. We report the mean average and the standard deviation of the results. The three measures we use are: 1) the number of edges the networks have in common (out of a possible total of 99 edges), 2) the Hamming distance and 3) a recently reported network dissimilarity measure, using the default parameters described in the paper (w 1 =w 2 =0.45, w 3 = 0.1) [7]. Note that this measure has a range from zero to one and is equal to zero in the case of identical networks, hence lower numbers mean a higher similarity.
We find that the differences between the real fMRI networks and the economical clustering model or the preferential attachment model are within one standard deviation from each other with all three methods, unlike the differences between the real fMRI networks and the random networks, which are much more substantial. For example, the number of edges in common with the real fMRI networks is 24.7 ± 4.1 and 26.6 ± 3.8 for the economical clustering model and the preferential attachment model respectively, compared to only 9.9 ± 2.9 edges in common between the real fMRI networks and the random networks. Nonetheless, a two-sided Wilcoxon signed rank test shows that there are statistically significant differences between the similarity measures comparing the 100 real fMRI networks and the two generative models. In particular, Supplementary Figure 15: The average energy values of networks generated using the economical preferential attachment model as η and γ are varied (mean of 100 generated networks). The energy values are shown as log 10 (E), where E is as defined in [5]. the preferential attachment model appears to be closer to the real networks in terms of the number of edges in common and the Hamming distance, whilst the economical clustering model is closer using the method proposed by Schieber et al (which was designed to account for topological properties of the networks).