Exact topological inference of the resting-state brain networks in twins

A cycle in a brain network is a subset of a connected component with redundant additional connections. If there are many cycles in a connected component, the connected component is more densely connected. Whereas the number of connected components represents the integration of the brain network, the number of cycles represents how strong the integration is. However, it is unclear how to perform statistical inference on the number of cycles in the brain network. In this study, we present a new statistical inference framework for determining the significance of the number of cycles through the Kolmogorov-Smirnov (KS) distance, which was recently introduced to measure the similarity between networks across different filtration values by using the zeroth Betti number. In this paper, we show how to extend the method to the first Betti number, which measures the number of cycles. The performance analysis was conducted using the random network simulations with ground truths. By using a twin imaging study, which provides biological ground truth, the methods are applied in determining if the number of cycles is a statistically significant heritable network feature in the resting-state functional connectivity in 217 twins obtained from the Human Connectome Project. The MATLAB codes as well as the connectivity matrices used in generating results are provided at http://www.stat.wisc.edu/~mchung/TDA.


INTRODUCTION
The modular structure and connected components are the fundamental topological features of a brain network. Brain networks with a higher number of connected components have [...] A hierarchical network modeling framework based on persistent homology has been proposed (Cassidy, Rae, & Solo, 2015; Chung, Hanson, Lee, Adluru, Alexander, Davidson, & Pollak, 2013; Giusti, Pastalkova, Curto, & Itskov, 2015; Lee, Chung, Kang, Kim, & Lee, 2011a, 2011b; Lee et al., 2012; Petri, Scolamiero, Donato, & Vaccarino, 2013; Petri et al., 2014; Sizemore, Giusti, & Bassett, 2016; Sizemore et al., 2018; Stolz, Harrington, & Porter, 2017). Persistent homology, a branch of computational topology (Carlsson & Memoli, 2008; Edelsbrunner & Harer, 2008; Edelsbrunner, Letscher, & Zomorodian, 2000), provides a more coherent mathematical framework for measuring network distance than the conventional method of simply taking the difference between graph theoretic features or the norm of the connectivity matrices. Instead of looking at networks at a fixed scale, as is usually done in standard brain network analysis, persistent homology observes the changes of topological features of the network over multiple resolutions and scales (Edelsbrunner & Harer, 2008; Horak, Maletić, & Rajković, 2009; Zomorodian & Carlsson, 2005). In doing so, it reveals the most persistent topological features that are robust under noise perturbations. This robustness in performance under different scales is needed for most network distances that are parameter and scale dependent.
In persistent homology-based brain network analysis, instead of analyzing networks at one fixed threshold that may not be optimal, we build the collection of nested networks over every possible threshold by using the graph filtration, a persistent homological construct consisting of a collection of nested graphs (Chung et al., 2013; Lee et al., 2011a, 2012). The graph filtration is a threshold-free framework for analyzing a family of graphs but requires hierarchically building specific nested subgraph structures. The graph filtration shares similarities with the existing multithresholding or multiresolution network models that use many different arbitrary thresholds or scales (Achard, Salvador, Whitcher, Suckling, & Bullmore, 2006; He et al., 2008; Kim, Adluru, Chung, Okonkwo, Johnson, Bendlin, & Singh, 2015; Lee et al., 2012; Supekar, Menon, Rubin, Musen, & Greicius, 2008). Such approaches are mainly used to visually display the dynamic pattern of how graph theoretic features change over different thresholds, and the pattern of change is rarely quantified. Persistent homology can be used to quantify such dynamic patterns in a more coherent mathematical framework. Recently, various persistent homological network approaches have been proposed. In Giusti et al. (2015) and Sizemore et al. (2016, 2018), graph filtration was developed on cliques. In Petri et al. (2013), weighted clique rank homology was developed. In Petri et al. (2014), the concept of homological scaffolds was developed and applied to the resting-state fMRI.
The bottleneck and Gromov-Hausdorff (GH) distances were later adapted to measure distances in persistent homology, dendrograms (Carlsson & Memoli, 2008; Carlsson & Mémoli, 2010; Chazal et al., 2009), and brain networks (Lee et al., 2011b, 2012). (A metric space is a set with a metric defined on the set.) The probability distributions of bottleneck and GH-distances are unknown. Thus, the statistical inference on them can only be done through resampling techniques such as permutations (Lee et al., 2012; Lee, Kang, Chung, Lim, Kim, & Lee, 2017), which often cause serious computational bottlenecks for large-scale networks.
To bypass the computational bottleneck associated with resampling large-scale networks, the Kolmogorov-Smirnov (KS) distance was introduced (Chung et al., 2013; Lee et al., 2017). The advantage of using KS-distance is that it gives results that are easier to interpret than those obtained from less intuitive distances from persistent homology. Furthermore, because of its simplicity in construction, it is possible to determine its probability distribution exactly without resampling (Chung et al., 2017b). However, the KS-distance has only been applied to the number of connected components $\beta_0$, and it is unclear how to apply it to the number of cycles $\beta_1$ in graphs and networks. In this paper, for the first time, we show how to extend the KS-distance by performing statistical inference on $\beta_1$. This is achieved by establishing the monotonic property of the number of cycles over graph filtration. The monotonicity is then used in constructing the KS-distance for topologically differentiating two networks. Subsequently, the method is applied to the large-scale resting-state twin fMRI study in determining the heritability of the number of cycles.

CORRELATION BRAIN NETWORK
The edge weight, which measures the strength of a connection, is usually given by a similarity measure between the observed data on the nodes in brain networks. Various similarity measures have been proposed. The correlation or mutual information between measurements for the biological or metabolic network and the frequency of contact between actors for the social network have been used as edge weights (Bassett, Meyer-Lindenberg, Achard, Duke, & Bullmore, 2006; Bien & Tibshirani, 2011; Li, Liu, Li, Qin, Li, Yu, & Jiang, 2009; McIntosh & Gonzalez-Lima, 1994; Newman & Watts, 1999; Song, Havlin, & Makse, 2005). In particular, the Pearson correlation has been most widely used as edge weights in functional brain network modeling.
Consider a weighted graph with node set $V = \{1, \ldots, p\}$ and edge weights $w = (w_{ij})$ between nodes $i$ and $j$. Let $x_j = (x_{1j}, \cdots, x_{nj})^\top \in \mathbb{R}^n$ be the $n \times 1$ measurement vector on node $j$. Let us center and normalize data $x_j$ such that
$$\sum_{i=1}^{n} x_{ij} = 0, \quad \|x_j\|^2 = x_j^\top x_j = 1.$$
Then we can show that $\rho_{ij} = x_i^\top x_j$ is the Pearson correlation between $x_i$ and $x_j$ (Chung, Hanson, Ye, Davidson, & Pollak, 2015). Note that correlations are invariant under scale and translations. Naturally, we are interested in using correlations or their simple functions as edge weights. Among possible functions of correlations,
$$w_{ij} = \sqrt{1 - \rho_{ij}} \quad (1)$$
satisfies the triangle inequality $w_{ij} \le w_{ik} + w_{kj}$ and other metric properties (Chung, Lee, Solo, Davidson, & Pollak, 2017a). Having metric distances facilitates more mathematically coherent interpretation of brain networks and offers many nice mathematical properties. With such edge weight $w$, $X = (V, w)$ forms a metric space. In the simulation studies in this paper, Equation 1 is used as the edge weights.
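As a minimal sketch of the construction above, the following Python snippet centers and normalizes each data vector, computes the Pearson correlation as an inner product, and forms the edge weight as the square root of one minus the correlation. The function names and the random toy data are illustrative assumptions and are not taken from the MATLAB codes accompanying the paper.

```python
import math
import random

def center_and_normalize(x):
    """Subtract the mean and rescale to unit Euclidean norm."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]

def correlation(xi, xj):
    """Pearson correlation as the inner product of normalized vectors."""
    zi, zj = center_and_normalize(xi), center_and_normalize(xj)
    return sum(a * b for a, b in zip(zi, zj))

def edge_weight(xi, xj):
    """sqrt(1 - correlation); clamped at 0 to absorb rounding error."""
    return math.sqrt(max(0.0, 1.0 - correlation(xi, xj)))

# Toy data (assumption): p = 4 nodes, n = 10 measurements per node.
random.seed(0)
data = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(4)]
w = [[edge_weight(a, b) for b in data] for a in data]
```

For normalized vectors, this weight is proportional to the Euclidean distance between the vectors, which is one way to see why the triangle inequality holds.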

GRAPH FILTRATION
All topological network distances that will be introduced in later sections are based on filtrations on graphs by thresholding edge weights.
Definition 1 Given weighted network $X = (V, w)$ with positive edge weight $w = (w_{ij})$, the binary network $X_\epsilon = (V, w_\epsilon)$ is a graph consisting of the node set $V$ and the binary edge weights $w_\epsilon = (w_{ij,\epsilon})$ given by
$$w_{ij,\epsilon} = \begin{cases} 1 & \text{if } w_{ij} > \epsilon, \\ 0 & \text{otherwise.} \end{cases}$$
Any edge weight less than or equal to $\epsilon$ is made into zero while edge weights larger than $\epsilon$ are made into one. Lee et al. (2011b) defines the binary graphs by thresholding above, that is, $w_{ij,\epsilon} = 1$ if $w_{ij} \le \epsilon$, which is consistent with the definition of the Rips filtration. However, in brain imaging, the higher value of $w_{ij}$ indicates stronger connectivity. Thus, we threshold below and retain the stronger connections (Chung et al., 2013).
Note $w_\epsilon$ is the adjacency matrix of $X_\epsilon$, which is a simplicial complex consisting of 0-simplices (nodes) and 1-simplices (edges) (Ghrist, 2008). By increasing the filtration value $\epsilon$, we are deleting more edges, so the size of the edge set decreases. Thus, the binary network satisfies the monotonic subset property
$$X_{\epsilon_0} \supset X_{\epsilon_1} \supset X_{\epsilon_2} \supset \cdots$$
for any $\epsilon_0 \le \epsilon_1 \le \epsilon_2 \le \cdots$. Equivalently, we also have
$$X_{\epsilon_0} \subset X_{\epsilon_1} \subset X_{\epsilon_2} \subset \cdots$$
for any $\epsilon_0 \ge \epsilon_1 \ge \epsilon_2 \ge \cdots$. The sequence of such nested multiscale graphs is defined as the graph filtration (Lee et al., 2011b). Note that $X_0$ is the complete graph and $X_\infty$ is the node set $V$. For a graph with $p$ nodes, the maximum number of edges is $(p^2 - p)/2$, which is obtained in a complete graph. If we order the edge weights in increasing order, we have the sorted edge weights
$$w_{(1)} < w_{(2)} < \cdots < w_{(q)},$$
where $q \le (p^2 - p)/2$. The subscript $(\cdot)$ denotes the order statistic. Hence, we simply construct the graph filtration at the edge weights:
$$X_0 \supset X_{w_{(1)}} \supset X_{w_{(2)}} \supset \cdots \supset X_{w_{(q)}}. \quad (2)$$
The condition of having unique edge weights is not restrictive in practice. Assuming edge weights follow some continuous distribution, the probability of any two edge weights being equal is zero. The finiteness and uniqueness of the filtration levels over finite graphs are intuitively clear by themselves and are implicitly assumed in software packages such as javaPlex (Adams, Tausz, & Vejdemo-Johansson, 2014).
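The nesting can be checked on a small example. The sketch below, which assumes a hypothetical 4-node weight matrix with unique positive weights, builds the binary networks at filtration value 0 and at every sorted edge weight, keeping an edge only when its weight exceeds the filtration value. The helper names `binary_edges` and `graph_filtration` are illustrative.

```python
from itertools import combinations

def binary_edges(w, eps):
    """Edges of the binary network at eps: keep edge (i, j) when w[i][j] > eps."""
    p = len(w)
    return {(i, j) for i, j in combinations(range(p), 2) if w[i][j] > eps}

def graph_filtration(w):
    """Binary networks at eps = 0 and at every sorted edge weight."""
    p = len(w)
    levels = sorted({w[i][j] for i, j in combinations(range(p), 2)})
    return [(eps, binary_edges(w, eps)) for eps in [0.0] + levels]

# Toy weighted network (assumption): 4 nodes with unique positive weights.
w = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.7, 0.3, 0.0],
]
filtration = graph_filtration(w)
edge_counts = [len(edges) for _, edges in filtration]
```

With unique weights, each step of the filtration removes exactly one edge, so the sequence starts at the complete graph and ends at the bare node set.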

BETTI NUMBERS
In persistent homology, the $k$-th Betti number is often referred to as the number of $k$-dimensional holes (Lee et al., 2014; Petri et al., 2014; Sizemore et al., 2018). In the network setting, the 0-th Betti number is the number of connected components and the 1st Betti number is the number of cycles. During graph filtration, we can show that $\beta_0$ and $\beta_1$ change monotonically. Although it is not true in general (Bobrowski & Kahle, 2014), on the graph filtration (2), the $\beta_0$ and $\beta_1$ numbers have very stable monotonic increases and decreases, respectively.
Theorem 1 In a graph, Betti numbers $\beta_0$ and $\beta_1$ are monotone over graph filtration on edge weights.
Proof. Under graph filtration (2), the edges are deleted one at a time. Since an edge has only two end points, the deletion of an edge splits a connected component into at most two. Thus, the number of connected components ($\beta_0$) never decreases, and it increases by at most one. The Euler characteristic $\chi$ of the graph is given by (Adler, Bobrowski, Borman, Subag, & Weinberger, 2010)
$$\chi = \beta_0 - \beta_1 = p - q,$$
where $p$ and $q$ are the number of nodes and edges, respectively. Thus,
$$\beta_1 = \beta_0 - p + q.$$
Note $p$ is fixed over the filtration but $q$ decreases by one while $\beta_0$ increases by at most one. Hence, $\beta_1$ never increases, and it decreases by at most one.
Theorem 1 is related to the incremental Betti number computation over a simplicial complex (Boissonnat & Teillaud, 2006). Once we compute the $\beta_0$ number, the $\beta_1$ number is simply given by $\beta_0 - p + q$ without additional computation. For the computation of $\beta_0$, it is not necessary to perform graph filtration for infinitely many possible filtration values. The maximum number of filtration levels needed for computing $\beta_0$ is one plus the number of unique edge weights. In the case of trees, $\beta_0$ can be given exactly.
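The relation between the two Betti numbers can be illustrated with a short sketch. The code below counts connected components with a union-find structure (a swapped-in technique chosen for brevity, not the Dulmage-Mendelsohn decomposition used in this study) and then obtains the number of cycles from the Euler characteristic. The toy weight matrix and the function names are assumptions.

```python
def betti_numbers(p, edges):
    """Return (beta0, beta1) of a graph with p nodes and the given edge list."""
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    beta0 = p
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the edge joins two components
            parent[ri] = rj
            beta0 -= 1
    beta1 = beta0 - p + len(edges)  # Euler characteristic relation
    return beta0, beta1

def betti_curves(w):
    """(eps, beta0, beta1) at eps = 0 and at every sorted edge weight."""
    p = len(w)
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    levels = [0.0] + sorted({w[i][j] for i, j in pairs})
    curves = []
    for eps in levels:
        edges = [(i, j) for i, j in pairs if w[i][j] > eps]
        curves.append((eps,) + betti_numbers(p, edges))
    return curves

# Toy weighted network (assumption): 4 nodes with unique positive weights.
w = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.7, 0.3, 0.0],
]
curves = betti_curves(w)
beta0_curve = [c[1] for c in curves]
beta1_curve = [c[2] for c in curves]
```

On this example the component count is nondecreasing and the cycle count is nonincreasing along the filtration, in line with Theorem 1.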
Theorem 2 For tree $T = (V, w)$ with $p \ge 2$ nodes and unique positive edge weights $w_{(1)} < w_{(2)} < \cdots < w_{(p-1)}$, the zeroth Betti number $\beta_0$ over graph filtration (2) is given by
$$\beta_0(T_{w_{(i)}}) = i + 1.$$
The proof is given in Chung et al. (2015). Note a tree with $p$ nodes has $p - 1$ edges. For a graph that is not a tree, it may not be possible to analytically represent $\beta_0$ over a filtration as in Theorem 2. In general, $\beta_0$ can be numerically computed using the single linkage dendrogram (SLD) (Lee et al., 2012), the Dulmage-Mendelsohn decomposition (Chung, Adluru, Dalton, Alexander, & Davidson, 2011; Pothen & Fan, 1990), or the simplicial complex method (Carlsson & Memoli, 2008; de Silva & Ghrist, 2007; Edelsbrunner, Letscher, & Zomorodian, 2002). In this study, we computed $\beta_0$ over filtration by using the Dulmage-Mendelsohn decomposition.

SINGLE LINKAGE CLUSTERING
The $\beta_0$ computation is related to single linkage clustering and dendrogram construction (Carlsson, 2009; Carlsson, De Silva, & Morozov, 2009; Carlsson, Singh, & Zomorodian, 2009b; Chowdhury & Mémoli, 2016; Khalid, Kim, Chung, Ye, & Jeon, 2014). In single linkage clustering, the single linkage distance (SLD) $s_{ij}$ between the closest nodes in the two disjoint connected components $R_1$ and $R_2$ is given by
$$s_{ij} = \min_{k \in R_1,\, l \in R_2} w_{kl}.$$
In this study, the square root of 1 − correlation is used as edge weight $w_{kl}$. Every edge connecting a node in $R_1$ to a node in $R_2$ has the same SLD. The SLD is then used to construct the single linkage matrix (SLM) $S = (s_{ij})$ (Figure 1). SLM shows how connected components are merged locally and can be used in constructing a dendrogram over filtration. If the single linkage distance $s_{ij}$ is at least the current filtration value $\epsilon_k$ but smaller than the next filtration value $\epsilon_{k+1}$, that is, $\epsilon_k \le s_{ij} < \epsilon_{k+1}$, then components $R_1$ and $R_2$ will be connected at the next filtration value $\epsilon_{k+1}$. The sequence of how components are merged during the graph filtration is identical to the sequence of the merging in the dendrogram construction (Lee et al., 2012). By tracing how each of the connected components is merged, we can compute $\beta_0$. In the single linkage clustering, instead of deleting edges, we are connecting nodes over increasing edge weights.
SLM is an ultrametric, that is, a metric satisfying the stronger triangle inequality $s_{ij} \le \max(s_{ik}, s_{kj})$ (Carlsson & Mémoli, 2010). Thus the dendrogram can be represented as an ultrametric space $D = (V, S)$, which is again a metric space. In persistent homology, the Gromov-Hausdorff (GH) distance has been mainly used in quantifying the dendrogram shape differences (Carlsson & Mémoli, 2010; Chung et al., 2017a; Lee et al., 2011b). The GH-distance between dendrograms $D_1 = (V, S_1)$ and $D_2 = (V, S_2)$ with SLM $S_1 = (s^1_{ij})$ and $S_2 = (s^2_{ij})$ is given, up to a constant scaling factor, by the largest entrywise difference $\max_{i,j} |s^1_{ij} - s^2_{ij}|$ between the two SLMs. For the statistical inference on GH-distance, resampling techniques such as jackknife or permutation tests are often used (Lee et al., 2012). In this study, we will use the permutation test.
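A minimal sketch of the SLM computation follows. It assumes that the single linkage distance between two nodes equals the minimax path distance (the smallest possible value of the largest edge weight along a connecting path), which is the standard characterization of single linkage, and it reports the largest entrywise gap between two SLMs without any scaling constant. The toy matrices and function names are illustrative.

```python
def single_linkage_matrix(w):
    """Minimax path distances via a Floyd-Warshall style update."""
    p = len(w)
    s = [row[:] for row in w]
    for i in range(p):
        s[i][i] = 0.0
    for k in range(p):
        for i in range(p):
            for j in range(p):
                # best path through k costs the larger of its two legs
                s[i][j] = min(s[i][j], max(s[i][k], s[k][j]))
    return s

def max_slm_gap(s1, s2):
    """Largest entrywise absolute difference between two SLMs."""
    p = len(s1)
    return max(abs(s1[i][j] - s2[i][j]) for i in range(p) for j in range(p))

# Toy distance-type edge weights on 3 nodes (assumption).
w1 = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.5], [0.9, 0.5, 0.0]]
w2 = [[0.0, 0.4, 0.6], [0.4, 0.0, 0.7], [0.6, 0.7, 0.0]]
s1 = single_linkage_matrix(w1)
s2 = single_linkage_matrix(w2)
gap = max_slm_gap(s1, s2)
```

The triple loop costs on the order of the cube of the number of nodes, which is adequate for parcellation-sized networks; a minimum spanning tree based construction would be a faster alternative.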

BOTTLENECK DISTANCE
The bottleneck distance is perhaps the most often used distance in persistent homology, although it is rarely used for brain networks. In persistent homology, the topology of underlying data can be represented by the birth and death of topological features, such as the number of connected components or cycles (Carlsson, Ishkhanov, De Silva, & Zomorodian, 2008). During the filtration, these topological features appear and disappear. If a topological feature appears at the threshold $\xi$ and disappears at $\tau$, it can be encoded into a point $(\xi, \tau)$. If $m$ connected components or cycles appear during the filtration of a network $X = (V, w)$, the homology group can be represented by a point set
$$P(X) = \{(\xi_1, \tau_1), \ldots, (\xi_m, \tau_m)\}.$$
This scatter plot is called a persistence diagram (PD) (Cohen-Steiner, Edelsbrunner, & Harer, 2007).
Given two networks $X^1 = (V^1, w^1)$ with $m$ features and $X^2 = (V^2, w^2)$ with $n$ features, PDs
$$P(X^1) = \{(\xi^1_1, \tau^1_1), \ldots, (\xi^1_m, \tau^1_m)\}$$
and
$$P(X^2) = \{(\xi^2_1, \tau^2_1), \ldots, (\xi^2_n, \tau^2_n)\}$$
are obtained through the filtration (Lee et al., 2012). The bottleneck distance between the networks is defined as the bottleneck distance of the corresponding PDs (Cohen-Steiner et al., 2007):
$$D_B(P(X^1), P(X^2)) = \inf_{\gamma} \sup_{1 \le i \le m} \| t^1_i - \gamma(t^1_i) \|_\infty, \quad (3)$$
where $t^1_i = (\xi^1_i, \tau^1_i) \in P(X^1)$, $\gamma$ is a bijection from $P(X^1)$ to $P(X^2)$, and $\| \cdot \|_\infty$ is the $L_\infty$-norm. Note Equation 3 assumes $m = n$ such that the bijection $\gamma$ exists. Suppose two networks share the same node set, that is, $V^1 = V^2$, with $p$ nodes and the same number $q$ of unique edge weights. If the graph filtration is performed on two networks, the number of connected components and cycles that appear and disappear during the filtration is $p$ and $1 - p + q$, respectively. Thus, their persistence diagrams always have the same number of points. The bijection $\gamma$ is determined by the bipartite graph matching algorithm (Cohen-Steiner et al., 2007; Edelsbrunner & Harer, 2008).
If $m \neq n$, there is no one-to-one correspondence between two PDs. Then, auxiliary points that are orthogonal projections to the diagonal line $\xi = \tau$ in $P(X^1)$ and $P(X^2)$ are added to $P(X^2)$ and $P(X^1)$, respectively, to make the number of points in the PDs identical.
The bottleneck distance does not directly measure the distance between two metric spaces $X^1 = (V^1, w^1)$ and $X^2 = (V^2, w^2)$, but measures the distance between their corresponding persistence diagrams $P(X^1)$ and $P(X^2)$. In practice, the bottleneck distance has often been used since it is a lower bound on the GH-distance and it is easier to compute (Chazal et al., 2009). Since the brain regions that form the network nodes are matched across the networks through predefined parcellations in brain network studies, the GH-distance can be computed easily. Thus, in this study, we will only use the GH-distance and not show the result of the bottleneck distance in the simulation study.

PERMUTATION TEST ON NETWORK DISTANCES
Statistical inference on network distances can be done using resampling techniques such as the permutation test (Chung et al., 2013;Efron, 1982;Lee et al., 2012). The permutation test is perhaps the most widely used nonparametric test procedure in the sciences (Chung et al., 2017b;Nichols & Holmes, 2002;Thompson, Cannon, Narr, van Erp, Poutanen, Huttunen, Lonnqvist, Standertskjold-Nordenstam, Kaprio, & Khaledy, 2001;Zalesky, Fornito, Harding, Cocchi, Yücel, Pantelis, & Bullmore, 2010). It is known as the exact test in brain imaging since the distribution of the test statistic under the null hypothesis can be exactly computed if we can calculate all possible values of the test statistic under every possible permutation.
Here we explain the permutation test procedure that was used for network distances. The usual setting in brain imaging applications is a two-sample comparison. Suppose there are $m$ measurements in Group 1 on node set $V$ of size $p$. Denote the data matrix as $\mathbf{X}^1_{m \times p}$. The edge weights of Group 1 are given by $f(\mathbf{X}^1)$ for some function $f$, and the metric space is given by $X^1 = (V, f(\mathbf{X}^1))$. Suppose there are $n$ measurements in Group 2 on the identical node set $V$. Denote the data matrix as $\mathbf{X}^2_{n \times p}$ and the corresponding metric space as $X^2 = (V, f(\mathbf{X}^2))$. We test the statistical significance of network distance $D(X^1, X^2)$ under the null hypothesis
$$H_0: D(X^1, X^2) = 0.$$
The permutation test is done as follows. Under $H_0$, one can concatenate the data matrices as
$$\mathbf{X} = \begin{pmatrix} \mathbf{X}^1 \\ \mathbf{X}^2 \end{pmatrix}$$
and then permute the indices of the row vectors of $\mathbf{X}$ in the symmetric group of degree $m + n$, that is, $S_{m+n}$ (Kondor, Howard, & Jebara, 2007). Denote the $i$-th permuted data matrix as $\mathbf{X}_{\sigma(i)} = (x_{\sigma(i),j})$, where $\sigma \in S_{m+n}$. Then we split $\mathbf{X}_{\sigma(i)}$ into submatrices such that
$$\mathbf{X}_{\sigma(i)} = \begin{pmatrix} \mathbf{X}^1_{\sigma(i)} \\ \mathbf{X}^2_{\sigma(i)} \end{pmatrix},$$
where $\mathbf{X}^1_{\sigma(i)}$ and $\mathbf{X}^2_{\sigma(i)}$ are of sizes $m \times p$ and $n \times p$, respectively. Let $X^1_{\sigma(i)} = (V, f(\mathbf{X}^1_{\sigma(i)}))$ and $X^2_{\sigma(i)} = (V, f(\mathbf{X}^2_{\sigma(i)}))$ be weighted networks where the rows of the data matrices are permuted across the groups. Then we have distance $D(X^1_{\sigma(i)}, X^2_{\sigma(i)})$ for each permutation. The fraction of permutations for which $D(X^1_{\sigma(i)}, X^2_{\sigma(i)})$ is larger than the observed $D(X^1, X^2)$ gives the estimate for the p value.
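The procedure can be sketched for a small two-sample problem in which every split of the pooled data is enumerated. This is only an illustration under stated assumptions: the measurements are scalars rather than data matrices, the distance is a hypothetical absolute difference of group means standing in for a network distance, and ties with the observed value are counted (a common convention that keeps the p value positive).

```python
from itertools import combinations

def exact_permutation_pvalue(group1, group2, distance):
    """Enumerate every split of the pooled data into groups of the original sizes."""
    pooled = list(group1) + list(group2)
    m = len(group1)
    observed = distance(group1, group2)
    count = total = 0
    for idx in combinations(range(len(pooled)), m):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if distance(g1, g2) >= observed:  # ties counted with the observed value
            count += 1
    return count / total, total

def mean_gap(g1, g2):
    """Stand-in distance (assumption): absolute difference of group means."""
    return abs(sum(g1) / len(g1) - sum(g2) / len(g2))

group1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group2 = [11.0, 12.0, 13.0, 14.0, 15.0]
p_value, n_splits = exact_permutation_pvalue(group1, group2, mean_gap)
```

Enumerating splits rather than all orderings of the rows is enough here because the distance depends only on which rows fall in each group.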
Unfortunately, generating every possible permutation for whole images is still extremely time consuming even for a modest sample size. The number of permutations increases exponentially with the sample size, and it is impractical to generate every possible permutation. In the permutation test, only a small fraction of possible permutations are generated, and the statistical significance is computed approximately. In most studies, well under 1% of the total permutations are used, mainly due to the computational bottleneck of generating permutations (Thompson et al., 2001; Zalesky et al., 2010). In Zalesky et al. (2010), 5,000 permutations out of $\binom{27}{12} = 17{,}383{,}860$ possible permutations (0.029%) were used. In Thompson et al. (2001), 1 million permutations out of $\binom{40}{20}$ possible permutations (0.0007%) were generated using a supercomputer. In our study, we have 131 MZ and 77 DZ twin pairs. The possible number of permutations is $\binom{208}{77}$. This number is so large that it cannot be represented exactly in the standard floating-point arithmetic of computing systems such as MATLAB and R. Even 1% of $\binom{208}{77}$ is about $1.96 \times 10^{56}$, which is still astronomically large and beyond the computing capability of most computers. On the other hand, the proposed KS-distance method accounts for all possible permutations combinatorially and completely bypasses the computational bottleneck. The computational cost of the KS-distance is negligible, and the computation is done in a few seconds. Furthermore, the method computes p values exactly rather than approximately.
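The permutation counts quoted above can be checked with exact integer arithmetic. The sketch below uses Python's arbitrary-precision integers through `math.comb`; the helper name `sampled_percentage` is an illustrative assumption.

```python
from math import comb

def sampled_percentage(n_generated, n_total, n_group):
    """Percentage of all possible group splits covered by n_generated permutations."""
    return 100.0 * n_generated / comb(n_total, n_group)

splits_zalesky = comb(27, 12)    # group sizes quoted for Zalesky et al. (2010)
splits_thompson = comb(40, 20)   # group sizes quoted for Thompson et al. (2001)
splits_twins = comb(208, 77)     # 131 MZ and 77 DZ pairs; exact Python integer

pct_zalesky = sampled_percentage(5000, 27, 12)
pct_thompson = sampled_percentage(10**6, 40, 20)
digits_twins = len(str(splits_twins))  # number of decimal digits of the count
```

Exact integers avoid the overflow and rounding issues of floating-point binomial coefficients, although enumerating that many permutations remains infeasible.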

KOLMOGOROV-SMIRNOV DISTANCE
Recently, the Kolmogorov-Smirnov (KS) distance, a distance between the empirical distributions of two samples, has been successfully applied in quantifying the change of the $\beta_0$ number over graph filtration as a way to quantify brain networks without thresholding (Chung et al., 2017a, 2017b). The main advantage of the method is that it avoids using the computationally costly and time-consuming permutation test for large-scale networks. In this paper, we show how to apply KS-distance in quantifying the change of the $\beta_1$ number over graph filtration as well. In this study, the square root of 1 − correlation is used as edge weights. Given two networks $X^1 = (V, w^1)$ and $X^2 = (V, w^2)$, KS-distances between $X^1$ and $X^2$ for Betti numbers $\beta_0$ and $\beta_1$ are defined as (Chung et al., 2013; Lee et al., 2017)
$$D_{KS}(X^1, X^2) = \sup_{\epsilon \ge 0} \left| \beta_j(X^1_\epsilon) - \beta_j(X^2_\epsilon) \right|, \quad j = 0, 1,$$
where $\beta_j(X^i_\epsilon)$ is the $j$-th Betti number for binary network $X^i_\epsilon$. The distance $D_{KS}$ can be discretely approximated using a finite number of filtration values $\epsilon_1, \ldots, \epsilon_q$:
$$D_q = \sup_{1 \le l \le q} \left| \beta_j(X^1_{\epsilon_l}) - \beta_j(X^2_{\epsilon_l}) \right|.$$
If we choose $q$ large enough that the $\epsilon_l$ include all the sorted edge weights, then $D_{KS}(X^1, X^2) = D_q$ (Chung et al., 2017b). This is possible since there are only up to $p(p - 1)/2$ unique edge weights in a graph with $p$ nodes, and the monotone Betti number functions change in discrete jumps rather than continuously. In practice, $\epsilon_l$ may be chosen uniformly, or a divide-and-conquer strategy can be used to adaptively grid the filtration values. Then the probability distribution of $D_q$ can be computed exactly by combinatorial means.
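A minimal sketch of the discrete KS-distance is given below. It evaluates the Betti numbers of both networks on a common grid of filtration values and returns the largest gap. The union-find component count, the toy 3-node networks, and the function names are assumptions made for illustration.

```python
def betti(w, eps):
    """(beta0, beta1) of the binary network that keeps edges with weight > eps."""
    p = len(w)
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    beta0, q = p, 0
    for i in range(p):
        for j in range(i + 1, p):
            if w[i][j] > eps:
                q += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    beta0 -= 1
    return beta0, beta0 - p + q

def ks_distance(w1, w2, levels, k):
    """Largest gap in the k-th Betti number (k = 0 or 1) over the given levels."""
    return max(abs(betti(w1, e)[k] - betti(w2, e)[k]) for e in levels)

# Toy networks on 3 nodes (assumption).
w1 = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.3], [0.2, 0.3, 0.0]]
w2 = [[0.0, 0.6, 0.7], [0.6, 0.0, 0.8], [0.7, 0.8, 0.0]]
levels = [0.0, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8]  # 0 plus all edge weights of both networks
d_beta0 = ks_distance(w1, w2, levels, 0)
d_beta1 = ks_distance(w1, w2, levels, 1)
```

Because the Betti curves are step functions that only jump at edge weights, evaluating them at the pooled edge weights of both networks captures the supremum.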

Theorem 3
$$P(D_q \ge d) = 1 - \frac{A_{q,q}}{\binom{2q}{q}},$$
where $A_{u,v}$ satisfies $A_{u,v} = A_{u-1,v} + A_{u,v-1}$ with the boundary condition $A_{0,v} = A_{u,0} = 1$ within band $|u - v| < d$ and initial condition $A_{0,0} = 0$ for $u, v \ge 1$.
The proof is given in Chung et al. (2017b).

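Example 1 $P(D_3 \ge 2)$ is computed sequentially as follows (Figure 2). We start with the bottom left corner $A_{0,0} = 0$ and move right or up toward the upper corner $A_{3,3}$, keeping only the entries inside the band $|u - v| < 2$:
$$A_{1,0} = A_{0,1} = 1 \to A_{1,1} = A_{1,0} + A_{0,1} = 2 \to A_{2,1} = A_{1,2} = 2 \to A_{2,2} = 4 \to A_{3,2} = A_{2,3} = 4 \to A_{3,3} = 8.$$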
The probability is then $P(D_3 \ge 2) = 1 - 8/\binom{6}{3} = 0.6$. The computational complexity of the combinatorial inference is $O(q \log q)$ for sorting and $O(q^2)$ for computing $A_{q,q}$ in the grid, while the permutation test requires exponential run time.
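The recursion can be sketched in a few lines. The function below fills the grid of path counts inside the band and returns the tail probability, assuming the probability takes the form of one minus the in-band path count divided by the total number of monotone lattice paths, as in the worked example. The function name `exact_ks_tail` is hypothetical.

```python
from math import comb

def exact_ks_tail(q, d):
    """P(D_q >= d) by counting monotone lattice paths that stay in |u - v| < d."""
    A = [[0] * (q + 1) for _ in range(q + 1)]
    for u in range(q + 1):
        for v in range(q + 1):
            if u == 0 and v == 0:
                A[u][v] = 0           # initial condition; never used when q >= 1
            elif abs(u - v) >= d:
                A[u][v] = 0           # outside the band
            elif u == 0 or v == 0:
                A[u][v] = 1           # boundary condition inside the band
            else:
                A[u][v] = A[u - 1][v] + A[u][v - 1]
    return 1.0 - A[q][q] / comb(2 * q, q)

p_example = exact_ks_tail(3, 2)
```

The grid holds exact Python integers, so the count does not overflow for large grids; only the final division is carried out in floating point.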
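When $q$ is too large, it may not be possible to represent and compute $\binom{2q}{q}$ in all the digits. For large $q$, use the asymptotic probability distribution of $D_q$ given by Chung et al. (2017b):
$$\lim_{q \to \infty} P\left( D_q / \sqrt{2q} \ge d \right) = 2 \sum_{i=1}^{\infty} (-1)^{i-1} e^{-2 i^2 d^2}.$$
The p value of the test statistic under the null is then computed as
$$p\text{ value} = 2e^{-2 d_o^2} - 2e^{-8 d_o^2} + 2e^{-18 d_o^2} - \cdots,$$
where the observed value $d_o$ is the least integer greater than $D_q / \sqrt{2q}$ in the data.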
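A hedged sketch of the large-sample computation follows. It assumes the classical Kolmogorov limiting series, twice the alternating sum of exp(-2 i^2 d^2), evaluated at d equal to D_q divided by the square root of 2q. It does not round d up to an integer, so it may differ slightly from the p values reported in the paper. The function name is hypothetical.

```python
import math

def asymptotic_ks_pvalue(d_q, q, tol=1e-16):
    """Tail probability from the alternating Kolmogorov series at d = d_q / sqrt(2q)."""
    d = d_q / math.sqrt(2.0 * q)
    if d == 0.0:
        return 1.0
    total, i = 0.0, 1
    while True:
        term = math.exp(-2.0 * i * i * d * d)
        total += term if i % 2 == 1 else -term
        # stop once the next correction is negligible relative to the sum
        if term == 0.0 or term <= tol * abs(total):
            break
        i += 1
    return min(1.0, max(0.0, 2.0 * total))

# Maximum gap of 82 over q = 101 filtration values, as in the beta_0 result.
p_beta0 = asymptotic_ks_pvalue(82, 101)
```

The relative stopping rule matters here: with an absolute tolerance the very small leading term would be discarded and the function would return zero instead of a tiny positive p value.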

COMPARISONS
Six network distances ($L_1$, $L_2$, $L_\infty$, GH, and KS on $\beta_0$ and $\beta_1$) were compared in simulation studies. For the review of various brain network distances, refer to Chung et al. (2017a). We also used the popular Q-modularity function for community detection in graph theory (Girvan & Newman, 2002; Meunier et al., 2009; Newman et al., 2006). The difference in Q-modularity functions was used as the distance measure. The simulations below were independently performed 100 times. We used $p = 20, 100, 500$ nodes and $n = 5$ images in each group, so that the number of possible permutations is exactly $\binom{5+5}{5} = 252$ (Figure 3). The small number of permutations enables us to compare the performance of distances exactly. Throughout the simulations, $\sigma = 0.1$ was universally used as network variability.
The data vector $x_i$ at node $i$ was simulated as independent and identically distributed multivariate normal across $i$, that is, $x_i \sim N(0, I_n)$ with the $n \times n$ identity matrix $I_n$ as the covariance matrix. This gives the correlation matrix $C^1 = (c^1_{ij}) = (\mathrm{corr}(x_i, x_j))$. The edge weights were given by $1 - c^1_{ij}$. The data vector $y_i$ at node $i$ that produced node dependency was simulated by adding additional dependency to $x_i$ through a hierarchical linear model or mixed-effect model, that is, a model with both fixed and random effect terms (Pinheiro & Bates, 2002; Snijders, Spreen, & Zwaagstra, 1995). This is a standard simulation technique for introducing dependency structures in random simulations. The hierarchical linear model enables us to explicitly model the data vector at each node and simulate the amount of dependency between nodes, providing detailed control over the topological structures in the correlation matrices. Data vector $y_i$ at node $i$ will be simulated using $x_i$ as follows.
This introduces a topological structure of connectedness through statistical dependency. Although we did not try it here, a far more complex dependency structure is also possible. In our simulation, $c = p/k = 10, 5, 4, 2$ and $k = p/c = 2, 4, 5, 10$ are used (Figure 3). Subsequently, we have the correlation matrix $C^2 = (c^2_{ij}) = (\mathrm{corr}(y_i, y_j))$ and the subsequent edge weights $1 - c^2_{ij}$.

No Network Difference
No network difference was expected between networks generated using the same parameters and initial data vectors $x_i$ in the above model. For example, Figure 3 shows two simulated networks generated with the same parameters $k = 4, 10$. We compared networks with the same parameter $k$: 4 vs. 4, 5 vs. 5, and 10 vs. 10. We should not be able to detect network differences in these cases. The performance results were given in terms of the false positive error rate, computed as the fraction of simulations that gave a p value below 0.05 (Table 1). For all the distances except KS-distance, the permutation test was used. Since there were five samples in each group, the total number of permutations was $\binom{10}{5} = 252$, making the permutation test exact and the comparisons accurate. All the distances performed very well, including Q-modularity. KS-distance was overly sensitive and produced up to 7% false positives. However, for a 0.05 level test, a 5% chance of producing false positives is expected. Thus, KS-distance is producing only 2 percentage points above the expected error rate.
The $p = 20$ simulation might involve too small a network to extract the topologically distinct features that are used in topological distances. Thus, we increased the number of nodes to $p = 100$ (Table 2). All the network distances except KS-distances performed reasonably well. KS-distances seem to be overly sensitive to slight topological changes in the large topological structures that were present in the $k = 2, 4, 5$ cases. As $k$ increases, KS-distances seem to perform reasonably well.

Network Differences
We generated networks with parameter $k = 2, 4, 5, 10$ in the $p = 20$ node simulation (Figure 3). Since the topological structures were different, the distances are expected to differentiate the networks. The performance results were given in terms of the false negative error rate, computed as the fraction of simulations that gave a p value above 0.05 (Table 1). All the distances including Q-modularity performed poorly, although KS-distance performed the best. Since graph theory features are not explicitly designed to measure network distances, they do not usually perform well when there are large topological differences.
We increased the number of nodes to $p = 100$. All the network distances including Q-modularity still performed poorly, except for the KS-distances (Table 2). KS-distance on the number of cycles seems to be the best network distance to use when there are network topology differences, although it has a tendency to produce false positives when there is no difference.
In terms of computation, distance methods based on the permutation test took about 950 seconds (16 minutes) for 100 nodes, while the KS-like test procedure only took about 20 seconds on the same computer. The results given in Tables 1-3 may slightly change if different random networks are generated. We also performed the simulation study on 500 nodes to see the effect of increased network sizes (Table 3). The proposed KS-distances on both $\beta_0$ and $\beta_1$ do not necessarily perform well in the case of no network differences. Again, the KS-distance is too sensitive and detects minute network differences. On the other hand, in the case of actual network differences, the KS-distances perform exceptionally well compared with the other network distances.

Dataset and Image Preprocessing
We used the resting-state fMRI of 271 twin pairs from the Human Connectome Project (Van Essen, Ugurbil, Auerbach, Barch, Behrens, Bucholz, Chang, Chen, Corbetta, & Curtiss, 2012). Out of a total of 271 twin pairs, we only used 131 genetically confirmed MZ twin pairs (age 29.3 ± 3.3 years, 56M/75F) and 77 same-sex DZ twin pairs (age 29.1 ± 3.5 years, 30M/47F) in this study. Since the discrepancy between self-reported and genotype-verified zygosity was fairly high at 13% of all the available data, 19 MZ and 19 DZ twin pairs that do not have genotyping were excluded. We additionally excluded 35 twin pairs with missing fMRI data.
Given the fMRI time series $\zeta_i(t)$ at the $i$-th parcellation at time $t$, we scaled it to fit to the unit interval $[0, 1]$. We then subtracted its mean over time and represented it by the cosine series expansion
$$\zeta_i(t) = \sum_{l=0}^{k} c_{li} \psi_l(t),$$
where $\psi_0(t) = 1$, $\psi_l(t) = \sqrt{2} \cos(l \pi t)$ were cosine basis functions and $c_{li}$ were coefficients estimated in the least squares fashion. For our study, $k = 119$ was used such that the fMRI were compressed into 10% of the original data size; the $k = 119$ expansion increased the signal-to-noise ratio (SNR), as measured by the ratio of variabilities, by 81% on average over all 116 brain regions and 416 subjects, that is, SNR = 1.81. The resulting real-valued Fourier coefficient vector $c_i = (c_{0i}, c_{1i}, \cdots, c_{ki})$ was then used to represent the fMRI in each parcellation as 120 features in the spectral domain.

Twin Correlations
The subject level connectivity matrix $C = (c_{ij})$ was computed by correlating 120 features in the spectral domain. Between the $i$-th and $j$-th parcellations, the connectivity was measured by correlating $c_i$ and $c_j$ over 120 features, that is, $c_{ij} = \mathrm{corr}(c_i, c_j)$. From the individual correlation matrices $C$, we computed pairwise twin correlations in each group at the edge level. The resulting group level twin correlation matrices $C^{MZ} = (c^{MZ}_{ij})$ and $C^{DZ} = (c^{DZ}_{ij})$ are nonsymmetric cross-correlation matrices. Since there is no preference in the order of twins, we symmetrize them by $C^{MZ} \leftarrow (C^{MZ} + (C^{MZ})^\top)/2$ and $C^{DZ} \leftarrow (C^{DZ} + (C^{DZ})^\top)/2$.
Then we are interested in knowing the extent of the genetic influence on the resting-state functional brain network and its statistical significance. For this, we use the widely used ACE genetic model (Falconer & Mackay, 1995), which mainly uses the heritability index (HI), a number between 0 and 1 that determines the amount of variation (in terms of percentage) due to genetic influence in a population. HI is often estimated using Falconer's formula (Falconer & Mackay, 1995) as a baseline. MZ twins share 100% of genes, whereas DZ twins share 50% of genes. Thus, the additive genetic factor $A$ and the common environmental factor $C$ for each twin type are related as
$$\mathrm{corr}(c^{MZ}_{ij}) = A + C, \quad (5)$$
$$\mathrm{corr}(c^{DZ}_{ij}) = A/2 + C, \quad (6)$$
where $\mathrm{corr}(c^{MZ}_{ij})$ and $\mathrm{corr}(c^{DZ}_{ij})$ are the pairwise correlations within MZ and same-sex DZ twins at the edge between $i$ and $j$. Solving Equation 5 and Equation 6, we obtain the additive genetic factor, that is, HI given by $HI = 2(C^{MZ} - C^{DZ})$.
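Falconer's formula can be sketched directly. The snippet below assumes the standard relations in which the MZ twin correlation equals A + C and the DZ twin correlation equals A/2 + C, solves them for both factors, and applies the resulting HI formula edgewise. The toy twin correlation matrices and function names are illustrative assumptions.

```python
def ace_factors(r_mz, r_dz):
    """Solve r_mz = A + C and r_dz = A / 2 + C for (A, C)."""
    a = 2.0 * (r_mz - r_dz)
    c = r_mz - a
    return a, c

def heritability_index(c_mz, c_dz):
    """Edgewise HI = 2 (C_MZ - C_DZ) for two twin correlation matrices."""
    return [[2.0 * (m - d) for m, d in zip(row_mz, row_dz)]
            for row_mz, row_dz in zip(c_mz, c_dz)]

# Toy symmetrized twin correlation matrices on 2 parcellations (assumption).
c_mz = [[0.9, 0.8], [0.8, 0.7]]
c_dz = [[0.6, 0.5], [0.5, 0.5]]
hi = heritability_index(c_mz, c_dz)
a, c = ace_factors(0.8, 0.5)
```

Nothing in the formula constrains the estimate to the unit interval, so sample estimates can fall below 0 or above 1 and are often thresholded in practice.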
The network differences between MZ and DZ twins are considered to be mainly attributable to heritability and can be used to determine the statistical significance of HI (Chung et al., 2017). The KS-distance was computed by taking $1 - C^{MZ}$ and $1 - C^{DZ}$ as edge weights.
In most brain imaging studies, 5,000-1,000,000 permutations are often used, which usually puts the total number of generated permutations at less than 0.01 to 1% of all possible permutations. In Zalesky et al. (2010), 5,000 permutations out of a possible $\binom{27}{12}$ = 17,383,860 permutations (0.029%) were used. In Thompson et al. (2001), for instance, 1 million permutations out of $\binom{40}{20}$ possible permutations (0.0007%) were generated using a supercomputer. In Lee et al. (2017), 5,000 permutations out of a possible $\binom{33}{10}$ = 92,561,040 permutations (0.005%) were used. Since we have 131 MZ and 77 DZ pairs, the total number of possible permutations is $\binom{271}{131}$, which is larger than $10^{80}$. Even if we generate only 0.01% of the $10^{80}$ possible permutations, $10^{76}$ permutations are still too many for most desktop computers. Thus, we choose the KS-distance for measuring the network distance. Although the probability distribution of the KS-distance is actually based on the permutation test, the probability is computed combinatorially, bypassing the need for resampling. Computing the p value with the KS-distance took only a few seconds in our study.
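The permutation counts quoted above can be checked with exact integer arithmetic. The following sketch uses Python's standard `math.comb`; the helper name `sampled_percentage` is illustrative.

```python
import math


def sampled_percentage(n_generated, n_total, n_group):
    """Return (number of possible permutations, percentage actually generated)."""
    total = math.comb(n_total, n_group)
    return total, 100.0 * n_generated / total


# (study, permutations generated, total sample size, group size) as quoted in the text
examples = [
    ("Zalesky et al. (2010)", 5_000, 27, 12),
    ("Thompson et al. (2001)", 1_000_000, 40, 20),
    ("Lee et al. (2017)", 5_000, 33, 10),
]

for label, n_generated, n_total, n_group in examples:
    total, pct = sampled_percentage(n_generated, n_total, n_group)
    print(f"{label}: {n_generated:,} of {total:,} possible ({pct:.2g}%)")

# The count quoted for the present study exceeds 10^80.
print(math.comb(271, 131) > 10**80)
```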

Results
We used $\beta_0$ and $\beta_1$ in computing KS-distances. Let $\phi \circ C^{MZ} = (\phi(c^{MZ}_{ij}))$ and $\phi \circ C^{DZ} = (\phi(c^{DZ}_{ij}))$ for some monotone function $\phi$. Then the KS-distance between $C^{MZ}$ and $C^{DZ}$ is equivalent to the KS-distance between $1 - C^{MZ}$ and $1 - C^{DZ}$ as well as between $\phi \circ (1 - C^{MZ})$ and $\phi \circ (1 - C^{DZ})$. Thus, we simply built filtrations over $C^{MZ}$ and $C^{DZ}$ and computed the KS-distance without using the square root of 1 − correlation. We used 101 filtration values between 0 and 1 at 0.01 increments (Figure 4). This gives a reasonably accurate estimate of the maximum gap in the $\beta_i$-plots between the twins (Figure 5). For $\beta_0$-plots, the maximum gap is 82, which gives a p value smaller than $10^{-24}$. For $\beta_1$-plots, the maximum gap is 3,647, which gives a p value smaller than $10^{-32}$. At the same correlation value, MZ twins are more connected than DZ twins. MZ twins also have more cycles than DZ twins. Such huge topological differences are attributed to heritability.
Figure 6, which displays the HI thresholded at 100% heritability, shows that MZ twins are far more similar than DZ twins in many connections, suggesting that genes influence the development of these connections. The most heritable connections include the left frontal gyrus, left and right middle frontal gyri, left superior frontal gyrus, left parahippocampal gyrus, left and right thalami, and left and right caudate nuclei, among many other regions. Most regions overlap with highly heritable regions observed in other twin brain-imaging studies (Fan, Fossella, Sommer, Wu, & Posner, 2003; Glahn et al., 2010; Gritsenko et al., 2018). Moreover, the findings here are somewhat consistent with a previous study on diffusion tensor imaging on twins from our group (Chung, Luo, Adluru, Alexander, Richard, & Goldsmith, 2018a; Chung et al., 2018b), showing that many regions of both resting-state functional and structural connections are heritable at the same time.
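The maximum gap between Betti plots can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: an edge is kept at filtration value $\epsilon$ when its weight exceeds $\epsilon$, the network is treated as a graph so that $\beta_1$ equals the number of edges minus the number of nodes plus $\beta_0$, and the conversion of the maximum gap into a p value is omitted. The function names are hypothetical.

```python
def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def betti_curves(weights, thresholds):
    """Return (beta0, beta1) lists over a graph filtration.

    Assumption: at threshold eps, edge (i, j) is kept when weights[i][j] > eps.
    For a graph, beta1 = #edges - #nodes + beta0.
    """
    p = len(weights)
    beta0, beta1 = [], []
    for eps in thresholds:
        parent = list(range(p))
        components, n_edges = p, 0
        for i in range(p):
            for j in range(i + 1, p):
                if weights[i][j] > eps:
                    n_edges += 1
                    ri, rj = _find(parent, i), _find(parent, j)
                    if ri != rj:  # the edge merges two components
                        parent[ri] = rj
                        components -= 1
        beta0.append(components)
        beta1.append(n_edges - p + components)
    return beta0, beta1


def max_gap(curve_a, curve_b):
    """Maximum absolute difference between two Betti curves."""
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


# 101 filtration values between 0 and 1 at 0.01 increments
thresholds = [k / 100 for k in range(101)]
```

Given two symmetric weight matrices, `max_gap(betti_curves(w1, thresholds)[0], betti_curves(w2, thresholds)[0])` gives the maximum gap for $\beta_0$, and index `[1]` gives it for $\beta_1$.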
The left and right caudate nuclei are identified as the most heritable hub nodes in our study.
The MATLAB codes for the simulation study as well as the connectivity matrices C MZ and C DZ used in generating results are given at http://www.stat.wisc.edu/~mchung/TDA.

The Limitation of KS-distances
Currently KS-distance is applied to Betti numbers $\beta_0$ and $\beta_1$ separately. It may be possible to construct a new topological distance that uses the combination of both $\beta_0$ and $\beta_1$ and come up with topologically more sensitive distances. One possible approach is to use the convex combination $\alpha D^0_{KS} + (1 - \alpha) D^1_{KS}$, where $D^i_{KS}$ is KS-distance for $\beta_i$ and $0 \le \alpha \le 1$. This is beyond the scope of this paper and left as a future study.

Other Network Distances
The network distances used in this study are not arbitrary distances but metrics. Since there are almost infinitely many possible similarity measures and distances we can use in networks, the performance of the chosen distance is important in discrimination tasks, which we have shown in simulation studies. The determination of the optimal distance is related to metric learning, an area of supervised machine learning in which the goal is to learn from data an optimal similarity function that measures how similar two objects are (Ktena, Parisot, Ferrante, Rajchl, Lee, Glocker, & Rueckert, 2018; Lowe, 1995). This is left as a future study.

Computational Issues
The total number of permutations in permuting two groups of size $q$ each is $\binom{2q}{q}$. Even for small $q = 10$, more than tens of thousands of permutations are needed for the accurate approximation of the p value. The main advantage of KS-distance over all other distance measures is that it avoids numerically performing the permutation test and avoids generating tens of thousands of permutations. Although the probability distribution of the KS-distance is actually based on the permutation test, the probability is computed combinatorially. We believe that it is possible to develop similar theoretical results for other distance measures and come up with a method for avoiding a resampling-based method for statistical inference.