Bladder Cancer Specific Pathway Interaction Networks

With the development of high-throughput technologies, monitoring biological systems comprehensively has became feasible and affordable. However, the transition from high-throughput data to the underlying biology of various phenotypes remains challenging. Pathway analysis identifies biological processes that are associated with a particular phenotype , which provides insights into the underlying biological mechanisms. Therefore, pathway analysis has became a popular tool for analyzing high-throughput data. Most existing pathway analysis methods are based on a simple assumption that pathways act in isolation whereas they cooperate with each other in a complex manner. In this study, we focus on pathway interactions that are associated with bladder cancer risk. We identify disease-specific pathway-pathway interactions based on SNP-SNP interactions and gene-gene co-expression relationships. By analyzing the structure of pathway interaction networks, we highlight the " central " pathways that should be further studied.


Introduction
Techniques such as high-throughput sequencing and gene/protein profiling have transformed biological research by enabling comprehensive monitoring of a biological system (Khatri et al., 2012).Conventional analyses of highthroughput data usually test the association of individual genes/proteins with a given phenotype.Although successfully applied in many studies, this approach fails to provide insights into the underlying biological mechanisms of the phenotype being studied.As an alternative approach, pathway analysis highlights the risk associated biological processes.The mechanisms of pathway associations can be used for developing strategies to diagnose, treat and prevent complex diseases (Ramanan et al., 2012), which makes high-throughput datasets more often viewed as a foundation to discover associated pathways (Hirschhorn, 2009).
In pathway analyses, gene sets corresponding to biological pathways are tested for significant associations with a phenotype.Multiple methods have been developed and among them pathway enrichment approach is the most popular one.Most pathway-enrichment-approach studies follow two categories: the threshold-based framework and the rank-based framework.Threshold-based approaches usually statistically evaluate the fraction of genes in a particular pathway among all the significant markers (Boyle et al., 2004).Rank-based approaches rank all markers based on their significances and then look for pathways that have better rankings than the overall distribution (Subramanian et al., 2005).Although successfully applied in many studies, enrichment approaches depend on single-marker statistics and treat each gene independently.In reality, biological systems are driven by complex biomolecular interactions instead of individual genes (Schadt et al., 2009).Thus, methods that take biomolecular interactions into account remain needed.
Most pathway analysis approaches assume that each pathway is independent of the others, which could be problematic (Khatri et al., 2012).Pathways do not operate in isolation.Instead, they cooperate and work together as a unit.A recent study of yeast showed that rewiring of genetic interactions in response to DNA damage are more likely to occur among pairs of genes that belong to two different biological processes (Bandyopadhyay et al., 2010).Although whether interactions are more likely to occur within the same pathway or across different pathways in human is unknown, the significance of pathway interactions is in-negligible.There are several possible ways two pathways can interact: sharing components (Lu et al., 2007), components physically interacting with counterparts from the other pathway (Lu et al., 2007;Guo and Wang, 2009), components relating with components of the other pathway via transcription (Guo and Wang, 2009), etc.The different interacting ways can be reflected in different manners including pathway overlapping, direct or indirect protein protein interactions (PPIs), co-regulation of genes etc.So to fully understand pathway interactions, integration of knowledges from different levels is required.teins in the protein-protein-interaction (PPI) network (Kelley and Ideker, 2005).Meanwhile, overlapping in genes, proteins or metabolite contents between pathways has also been used to gain insights in possible relationships between pathways (Isserlin et al., 2010).Lastly, a function based approach is to find possible pathway crosstalks by looking at protein interactions between pathways (Li et al., 2008).Most previous methods highlight pathway crosstalks under general conditions.The crosstalks identified are not associated with any particular phenotype.In this study, we make use of the disease case-control data and search for the disease specific pathway interactions only, which distinguish our work from previous ones the most.
Meanwhile, network science has emerged as a useful tool for modeling biological interactions and dependencies (Ideker and Sharan, 2008).Compared to analysis of groups of distinct objects, networks provide more information on the relatedness and interconnectivity of them, which makes network a suitable framework for investigating pathway interactions.
In this study, we focus on pathway interactions that are associated with bladder cancer risk.We infer pathway interaction networks from two different levels: SNP-SNP interactions and gene-gene co-expression relationships.By analyzing the structure of pathway interaction networks, we identify essential pathways that hold great potentials for future studies.

SNP interaction network
A recent study characterized the space of pairwise interactions in a population-based bladder cancer association study (Hu et al., 2011).In their SNP interaction network, each vertex corresponds to a single nucleotide polymorphism (SNP).An edge linking a pair of vertices corresponds to an interaction between two SNPs.Weights assigned to each SNP and each pair of SNPs quantify how much of the disease status the corresponding SNP and SNP pair can explain.The significance of the SNP interaction network is not limited to single main or interaction effects.Instead, it describes the overall significance of the global interaction structure, which makes this approach systematic.To identify bladder cancer specific pathway interactions, we adopt the SNP interaction network from this previous study and the network is shown in Figure 1.

Dataset
The microarray dataset used in this study is publicly available from Gene Expression Omnibus (GEO) website (dataset ID: GDS1479) and more details about this dataset can be obtained at Dyrskjot et al. (2003).Briefly, the original study profile about 22,000 genes to analyze bladder biopsies of superficial transitional cell carcinomas with or without surrounding carcinoma in situ (CIS) lesions and muscle invasive carcinomas (mTCC).To match disease stages of individuals in the bladder cancer SNP dataset, we only use two out of the five groups in the original study.One group contains 15 tumor biopsies from superficial transitional cell carcinoma (sTCC) without surrounding CIS, which serves as cases.The other group contains 9 biopsies of normal bladder mucosa from patients without a bladder cancer history, which serves as controls.The cRNA from different samples are hybridized to Affymetrix U133A GeneChips.After data processing as described by Dyrskjot et al. (2003), there are about 22,000 genes in the dataset.

Differential co-expression network
Genes with similar expression patterns may form complexes, pathways, or participate in regulatory and signaling circuits (Eisen et al., 1998;Ideker et al., 2002;Huang et al., 2007).Therefore, gene co-expression networks, which describe the pairwise relations among gene expression profiles, have become a popular tool for microarray analysis.A gene co-expression network is an undirected graph, where the vertices correspond to genes, and edges between genes represent significant co-expression relationships (Stuart et al., 2003).The weight of an edge can be computed using different correlation measures and an edge is included in the network if its weight pass a certain pre-specified threshold.
Gene co-expression networks have been successfully applied in many studies mostly to identify functional gene modules (Stuart et al., 2003;Presson et al., 2008;Weston et al., 2008).However, to study a particular disease, it is the difference between cases and controls that provides the most information about the underlying mechanism.In other words, rather than asking 'what parts of the system are the most abundant or dominant', we should ask 'What parts of the system are most distinctive between different conditions' (Ideker and Krogan, 2012).Therefore, to gain insights into the transition from healthy individuals to bladder cancer patients, we adapt the framework of the differential network previously proposed (Bandyopadhyay et al., 2010).Instead of constructing two static co-expression networks, one for cases and one for controls, we build differential co-expression networks to describe the changes of pairwise relations among genes from controls to cases.In this way, co-expressions present in both conditions are downplayed or removed from the differential network.The coexpressions that reflect the changes from controls to cases are distinguished from those that support the housekeeping functions.
Specifically, we filter the probes in the microarray dataset and only 308 of them, which also exist in the SNP dataset, are considered in this section.We compute Spearman's correlation for all 308 differential co-expression network is constructed to include only edges with significant negative weights and the positive differential co-expression network only include edges with significant positive weights.In other words, the negative differential co-expression network describes the gene pairs which lose correlations from controls to cases, while the positive differential co-expression network describes those which gain correlations.

Pathway interaction network
We construct pathway interaction network G 1 from SNP interaction network, G − 2 from negative differential coexpression network and G + 2 from positive differential coexpression network.In all pathway interaction networks, a vertex represents a pathway.Two pathways are connected if there are more edges than by chance between the two corresponding pathways in the underlying biomarker network.
For example, to construct pathway interaction network G 1 , we use permutation test to determine whether edges are more likely to occur between two pathways than by chance in the SNP interaction network.First, we annotate all the vertices in the SNP interaction network with their functions using canonical pathway annotations from MsigDB collection 2. All SNPs in this dataset are within coding region, so a SNP-to-gene-to-pathway mapping is straight forward.To investigate the enrichment of pathway-pathway interactions, we count the frequency of edges in the SNP interaction network between two particular pathways.Note that pathways with more SNPs might display more edges in the SNP interaction network.To control the bias introduced by pathway sizes, we use permutation test to generate a null distribution.Specifically, we randomly shuffle the SNP-pathway(s) annotations for 1,000 times and assess the frequency of edges between two particular pathways in the original SNP interaction network at each time.The P -value of a pathwaypathway interaction enrichment is computed as the fraction of corresponding edge frequencies on permuted data which are no smaller than that of the real data.Similarly, pathway interaction networks G − 2 and G + 2 are generated based on the negative and positive differential co-expression networks accordingly.In all three cases, a pathway pair is included in the pathway interaction network if the P -value of the pathwaypathway interaction enrichment is smaller than 0.001.

Essential vertices identification
To identify key vertices in the networks, we use bottleneck vertices to survey independent regulation as it has been shown that bottleneck vertices are regulated in a conditiondependent manner in biological networks (Yu et al., 2007).We define bottleneck vertices as vertices with the 10 highest betweenness centrality scores.The betweenness centrality score is based on the number of shortest paths that cross a given vertex, and thus reflects how embedded (i.e.central) a vertex is in the network (Freeman, 1977).
Hubs, highly connected vertices in other words, play an integral role in maintaining network integrity and in mass information transfer.Therefore, we also identify hubs of the networks.In this study, we define hub vertices as vertices with the 10 highest degree scores.
Both hub and bottleneck vertices have been associated with functional essentiality (Yu et al., 2007;Zotenko et al., 2008).We are interested in their ability to influence the propagation of signals across the network and their importance to maintain the integrity of the network.Particularly, we define hub-bottleneck vertices as the overlapping vertices between hub vertices and bottleneck vertices.

Pathway interaction network derived from SNP interaction network
We study whether SNP-SNP interactions are more likely to occur between certain pathway pairs or not.As described previously, we construct pathway interaction network G 1 by only including pathway pairs which have more edges between them in the SNP interaction network than by chance (P < 0.001).Figure 2 shows the structure of pathway interaction network G 1 .G 1 has 386 vertices and 635 edges.There are two connected components in the network and the vertices cluster into several communities.Moreover, there is no self-loop in G 1 , which indicates that SNP-SNP interactions mostly happen between different pathways instead of within the same pathway .
A heavy-tail degree distribution of G 1 indicates the existence of hub vertices.The hub-bottleneck vertices (pathways) are reported in Table 1.Among all the hub vertices, KEGG FOCAL ADHESION is connected to 269 neighbors and displays the highest betweenness centrality.

Differential co-expression network
To study the bladder cancer specific co-expression patterns, we construct differential co-expression networks from the microarray dataset.To match the SNP dataset, only 308 transcripts that also exist in the SNP dataset are used in this section.The changes of Spearman correlation between all 308 2 = 47, 278 pairs of transcripts from controls to cases are shown in Figure 3.The differential co-expression correlation, (C case -C control ), follows a normal distribution with mean ± stdv = 0.0006 ± 0.51214.The 95% confidence interval is [-1.003, 1.004].Transcript pairs with C case -C control < -1.003 are included in the negative differential co-expression network, which means that there are coexpressions in controls but not in cases.Similarly, transcript pairs with C case -C control > 1.004 are included in the positive differential co-expression network, which indicates that there are co-expressions in cases but not in controls.
The negative differential co-expression network has 297 vertices and 1,114 edges.Among two connected components, the largest one has 293 vertices.Similarly, the positive differential co-expression network possesses 287 vertices and 1,114 edges.There are no isolated islands in the network.Both differential co-expression networks display a heavy-tail degree distribution, which indicates the existence of hub genes.
To identify the key players for maintaining network integrity and signal propagation across the network, we identify the hub-bottleneck vertices in both negative and positive differential co-expression networks (Table 2).Interestingly, gene GATA3, ERBB2 and HFE are identified in both differential co-expression networks.

Pathway interaction network derived from differential co-expression network
We investigate whether edges in the differential coexpression networks are more likely to occur between particular pathway pairs than by chance.As described previously, we construct pathway interaction network G − 2 by only including pathway pairs which have more edges between them in the negative differential network than by chance (P < 0.001).Similarly, pathway interaction network G + 2 is obtained from the positive differential co-expression network.
As shown in Figure 4, pathway interaction network G − 2 has 78 vertices and 100 edges.There are three connected components and the entire network divide into several communities.The hub-bottleneck vertices (pathways) are reported in Table 3.Meanwhile, pathway interaction network G + 2 has 70 vertices and 83 edges.The network has six connected components (Figure 5).Surprisingly, no common edge is shared by G − 2 and G + 2 .This means that when comparing cases with controls, the gaining of co-expression and the loss of co-expression happen between different pathway pairs.
Although G − 2 and G + 2 do not share any common edge, the essential vertices in the two networks overlap with differences (Table 3).Two pathways, BIOCARTA GATA3 PATHWAY and KEGG MELANOGENESIS, are identified as hubbottleneck vertices in both networks.
It is also an interesting thing that there is no selfloop in G − 2 (Figure 4) and only two self-loops are observed in G + 2 (Figure 5).In other words, no pathway possesses more edges within itself than by chance in the negative differential co-expression network.Only two pathways, REACTOME GPCR LIGAND BINDING and SA CASPASE CASCADE, have more edges within themselves than by chance in the positive differential coexpression network.This indicates that differential coexpressions seem to happen mostly between different pathways rather than within the same pathway.

Discussion
We have identified pathway-pathway interactions that are associated with bladder cancer risk.Specifically, we investigate pathway interaction at two levels: genetic inter-actions and co-expression relations.We construct one pathway interaction network from the SNP interaction network and two from the differential co-expression networks.There are some limitations to our approach that are worth highlighting.First, there are multiple other ways pathways can interact with each other, i.e. through sharing common components.Examining pathway relationships more throughly at different levels would be helpful.Second, co-expression relationships are transferable.In other words, if two genes are both correlated with a third gene, a correlation will be observed between those two genes.Using partial correlation instead of correlation itself might be helpful to identify direct gene-gene co-expression relationships (de la Fuente et al., 2004).
Despite these limitations, our analyses have helped to identify the pathway interactions that are associated with bladder cancer risk and highlight the importance of pathway interactions.In our study, we observe several interesting aspects and discuss them as follows.
First, we characterize the differential co-expression relationships between cases and controls.The hub-bottleneck genes are identified in the differential co-expression networks.Three genes, GATA3, ERBB2 and HFE, are recognized in both positive and negative differential co-expression networks.Gene GATA3 encodes trans-acting T-cell-specific transcription factor GATA-3, which is an important regulator of T-cell development and plays an important role in endothelial dysplasia.Previously, GATA3 has been suggested as a marker of urothelial differentiation (Higgins et al., 2007;Miyamoto et al., 2012).The fact that GATA3 appears as a hub-bottleneck gene in both positive and negative differential co-expression networks (Table 2) suggests that GATA3 could be actively turning on or off its neighbor genes' expression, which might be highly associated with bladder cancer risk.Whether GATA3 is causing the expression level changes of its neighbor genes and how its neighbor genes contribute to the disease risk should be further studied.Gene ERBB2 encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases.Tumorspecific overexpression of ERBB receptors or their isoforms has been reported in a previous study (Junttila et al., 2003).Whether and how ERBB2 influences its neighbor genes' expressions in the context of bladder cancer risk should be explored.Gene HFE encodes a membrane protein HLAh, which binds to transferrin receptor (TFR) and reduces its affinity for iron-loaded transferrin (Feder et al., 1998).Whether and how HFE is associated with bladder cancer risk remains unknown.The central role of HFE in the differential co-expression networks might reflect its association with bladder cancer risk.
Second, traditional pathway analyses focus on identifying pathways that are enriched for significant biomarkers.We took a different route and look for pathway pairs that are enriched for between-pathway SNP-SNP interac-tions or between-pathway differential co-expressions.Consequently, we identify pathways that are "central" in the whole system instead of pathways that are individually associated with the disease risk.We run Gene Set Enrichment Analysis (Subramanian et al., 2005) on both the SNP dataset and the microarray dataset.Most of the pathways reported in Table 1 and 3 are not enriched in the significant biomarkers.However, existing knowledge indicates that some of them could be highly involved in the underlying mechanisms of bladder cancer.For instance, two telomere related pathways are identified as hub-bottleneck vertices in G 1 (Table 1).It is well known that telomere dysfunction or loss can cause sister-chromatid fusions that is associated with oncogene amplification (Campbell et al., 2010;Murnane, 2012).
Third, it is interesting that all networks constructed in this study possess a heavy-tail degree distribution.In other words, most vertices in the networks have only few neighbors whereas a few vertices have many neighbors.This structure makes the network robust to random removal of vertices and vigorous to external perturbations (Barabasi and Bonabeau, 2003;Wang and Chen, 2002).Scale-free structures have been observed in various biological networks (Jeong et al., 2001;Li et al., 2004).It is interesting that we observe a similar structure at the pathway level.
Fourth, we find that the pathway interaction network G 1 does not share any common edge with G + 2 or G + 2 .The hub-bottleneck vertices in G 1 are not as "central" in G − 2 or G + 2 .Although genetic interactions are very different with co-expression patterns, this result is still surprising.This means that the interactions across pathways are of distinguishing patterns at different levels.
Last, we find that both SNP-SNP interactions and differential co-expressions mostly happen between different pathways rather than within the same pathway.This result suggests that SNP interactions and co-expression patterns in known pathways stay stable across cases and controls.It is the SNP interactions and co-expression patterns between these pathways that are reprogrammed across different disease conditions.Previous studies in yeast have shown that static genetic interactions are enriched within known pathways (Kelley and Ideker, 2005), whereas differential genetic interactions are much more likely to occur among pairs of genes connecting two different pathways than among pairs of genes within the same pathway (Bandyopadhyay et al., 2010).Although co-expressions do not necessarily indicate genetic interactions, it is still encouraging to observe similar results in human SNP and microarray data in our study.Also, this observation further emphasizes the important role of pathway interactions in disease association studies, which highlights the significance of our work.
In summary, we construct bladder cancer specific pathway interaction networks from both SNP-SNP interactions and gene-gene co-expression patterns.Our study highlights key pathway interactions that should be further investigated and emphasizes the importance of disease-specific pathway interactions.
Figure 1: SNP interaction network.A vertex represents a SNP and an edge represents the interaction between the two vertices it connects.The weight of an edge reflects the pairwise interaction strength.There are 319 vertices and 255 edges in the SNP interaction network.The 319 SNPs cover 185 genes.The width of an edge and the size of a vertex are proportional to their weights.More details about this network are available at Hu et al. (2011) ECAL -General Track ECAL 2013

Figure 2 :
Figure2: Pathway interaction network G 1 revealed by SNP interaction network.A vertex represents a pathway and an edge exists if the number of SNP interactions between the two corresponding pathways is significantly larger than by chance ( P < 0.001).There are 386 vertices and 635 edges in the network.The largest connected components has 384 vertices.The network shows a heavy tail degree distribution.

Figure 3 :
Figure3: Frequency distribution of pairwise differential coexpression, C case -C control .Spearman correlation for all 47,278 pairs of genes are calculated for cases (C case ) and controls (C control ) separately and the difference (C case -C control ) are presented.C case -C control ranges from -1.652 to 1.711 and followes a normal distribution.Red lines represent the 95% confidence interval (C case -C control = -1.003and 1.004 respectively).There are 1,011 pairs of transcrpits on the left 2.5% tail and 1,114 on the right 2.5% tail.

Figure 4 :
Figure 4: Pathway interaction network G − 2 revealed by negative differential co-expression network.A vertex represents a pathway and an edge exists if differential co-expression are more likely to occur between the two pathways it connects than by chance in the negative differential co-expression network.The network G − 2 has 78 vertices and 100 edges.

Figure 5 :
Figure 5: Pathway interaction network G + 2 revealed by positive differential co-expression network.A vertex represents a pathway.An edge indicates that there are more edges between the two pathways it connects than by chance in the positive differential co-expression network.There are 70 vertices and 83 edges in the network.The vertices fall into six connected components.

Table 1 :
Hub-bottleneck vertices (pathways)in pathway interaction network G 1 derived from SNP interaction network.Pathways are ranked in descending order of betweenness centrality.