DINA Network Evaluations
Network Distribution and Modularity Analyses

Biological networks are expected to show modularity, where genes are clustered into communities of closely related functions. To evaluate the modularity of networks from Mavatar Discovery, the genes were first clustered based on the top 100 interactions, with the inverted t-values as weights, using the igraph hclust approach and cutreeHybrid with a minimum cluster size of 100 and a deepSplit of 3. For the Lung – General network, this resulted in 58 modules with a mean (range) of 456.9 (38, 2,414) genes per module (Table 1, Figure 1). Module construction for three different networks, namely Lung – General, Prostate – General, and Brain – General, generated between 48 and 58 modules per network, with a mean (range) of 461.5 (1, 3,821) genes per module.

Table 1. Distribution of the number of genes per module in the Lung, Prostate, and Brain General networks.

Module size     Lung General    Prostate General    Brain General
Min             38              1                   2
1st quartile    184.0           148.2               180.0
Median          322.0           238.5               285.0
Mean            456.9           428.3               499.2
3rd quartile    594.5           414.2               505.0
Max             2,414           3,508               3,821

Figure 1. Module dendrograms for the a) Lung, b) Prostate, and c) Brain General networks.

For each network analyzed, the degree distribution followed a heavy-tailed pattern with a power-law-like trend in the mid-range (degree 5–50). At higher degrees, however, the curve flattens, which could be a result of data sparsity (Figure 2).

Figure 2. Degree distribution plots for the a) Lung, b) Prostate, and c) Brain General networks.

The modularity score for each network was calculated and compared with the modularity scores of 1,000 permutations of randomly connected networks in which the gene-module associations had been shuffled. For each of the networks, the modularity score was significantly higher (p < 0.001) than would have been expected by chance (Table 2).

Table 2. For each network: the real network modularity score, the mean random modularity score (based on 1,000 permutations of networks in which the gene-module associations had been randomly shuffled), and the empirical p-value testing whether random modularity >= real modularity.

Network             Modularity score    Mean random modularity score    Empirical p-value
Lung General        0.497               5.42e-05                        <0.001
Prostate General    0.386               5.22e-05                        <0.001
Brain General       0.409               6.32e-05                        <0.001

To ensure that these findings are not specific to the chosen clustering approach, modularity was further evaluated using Louvain community detection. This showed consistently stronger modularity for each real network than for 100 permutations of random networks of equal size (number of nodes and edges). The mean (range) of the modularity Q scores over the networks was 0.714 (0.678, 0.773), and the mean (range) of the z-scores from the comparison with the random networks was 415.1 (355.2, 508.0). The random networks in this case were produced by randomly connecting the same number of nodes and edges as in each respective real network. When the random networks were instead produced by randomly rewiring the edges while preserving the original graph's degree distribution, the mean (range) of the z-scores was 254.1 (143.6, 436.8). These results support an overall high modularity of the networks in Mavatar Discovery.

Furthermore, comparisons of the clustering coefficient and the weighted average path length of each network with those of an Erdős–Rényi model with the same average degree supported small-world properties of the networks. The small-world sigma values were 2,339.3, 814.7, and 655.0 for the Lung, Prostate, and Brain General networks, respectively.

Network Comparability to Known Interactions

To ensure that the networks represent known functions and interactions, we compared the DINA-identified edges with known interactions in the STRING database (version 11.5) (Szklarczyk et al., 2023). Known gene interactions in Homo sapiens with a minimum interaction score of 400 were identified, using the gene background of the DINA network being analyzed. Fisher's exact test was then performed to test whether the gene interactions in STRING were significantly enriched within the Mavatar Discovery DINA network. A significant enrichment of STRING interactions was found in each of the analyzed networks (p < 0.001).

References

D. Szklarczyk, R. Kirsch, M. Koutrouli, K. Nastou, F. Mehryary, R. Hachilif, A. L. Gable, T. Fang, N. T. Doncheva, S. Pyysalo, P. Bork, L. J. Jensen, C. von Mering, The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 51, D638–D646 (2023). (https://doi.org/doi:10.18129/B9.bioc.STRINGdb)
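
Illustrative sketch: permutation test on the modularity score

The comparison of a real modularity score against label-shuffled networks, described in the modularity analysis above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used for Mavatar Discovery: the toy graph, the function names `modularity` and `permutation_p_value`, and the use of an unweighted graph are all hypothetical choices made for brevity (the networks described above use weighted edges). The empirical p-value here uses the common (count + 1) / (permutations + 1) convention, which is also an assumption; the text only states that it tests whether random modularity >= real modularity.

```python
import random
from collections import defaultdict


def modularity(edges, membership):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> module label.
    """
    m = len(edges)
    within = defaultdict(int)  # edges with both endpoints in the same module
    degree = defaultdict(int)  # summed node degree per module
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            within[cu] += 1
    return sum(within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def permutation_p_value(edges, membership, n_perm=1000, seed=0):
    """Compare real modularity with modularity after shuffling module labels."""
    rng = random.Random(seed)
    q_real = modularity(edges, membership)
    nodes = list(membership)
    labels = [membership[n] for n in nodes]
    q_random = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the gene-module association
        q_random.append(modularity(edges, dict(zip(nodes, labels))))
    exceed = sum(q >= q_real for q in q_random)
    p_value = (exceed + 1) / (n_perm + 1)  # assumed +1 correction
    return q_real, sum(q_random) / n_perm, p_value


# Toy example (hypothetical data): two 4-node cliques joined by a single edge.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]
membership = {n: ("A" if n < 4 else "B") for n in range(8)}

q_real, q_random_mean, p_value = permutation_p_value(edges, membership)
print(f"real Q = {q_real:.3f}, mean random Q = {q_random_mean:.3f}, p = {p_value:.4f}")
```

With only eight nodes, a fair share of shuffles reproduce the original partition by chance, so the p-value on this toy graph is much larger than the values reported for the real networks; the point is only to show the mechanics of the test.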