About
Mavatar Discovery Features
Expression heatmap

The expression heatmaps are based on the normalized expression matrix of one dataset representative of the user-selected tissue and condition. Samples were quantile normalized and genes were z-score normalized. Samples were clustered using Euclidean distance and average linkage clustering, whereas genes were clustered using biweight midcorrelation and average linkage clustering. Only genes in the drawn network that were also present in the dataset used are visualized in the heatmap, and only those genes inform the sample clustering.

UMAP

By applying UMAP dimensionality reduction, high-dimensional single-cell expression profiles are projected into a low-dimensional space that preserves both local and global transcriptional relationships. This approach facilitates the visualization of cellular heterogeneity and highlights context-specific gene expression within the same biological context at cell-type resolution. The UMAP visualizations were generated from a representative single-cell dataset corresponding to the same tissue type as that used for the displayed network.

DINA network similarity

To assess network similarity, we use DeltaCon (Koutra et al., 2013), a graph comparison metric that quantifies structural similarity based on node affinity and information flow. DeltaCon is well suited for comparing weighted, undirected networks and provides interpretable similarity scores that reflect both local and global structure. Because it works with network affinities, it evaluates how information flows through the whole network. One of its key strengths is that it accounts not only for the presence of edges but also for their relative importance and the overall network topology, which gives a more comprehensive comparison than simpler methods that only consider which edges are shared.

DeltaCon distances and similarities are sensitive to the size and connectivity of
the networks being analyzed. To address this issue, we constructed weighted, tissue-aware, and balanced null distributions of DeltaCon scores. These allow us to compute p values indicating whether the observed similarity between networks is higher than expected by chance relative to a biologically relevant background. The null distributions are generated from real subnetworks of matched size and connectivity, sampled from each corresponding tissue graph, and exclude comparisons between networks with a high percentage of shared samples. This approach ensures that variations in network size and connectivity are properly accounted for, leading to a fair and biologically meaningful comparison.

P value stability was evaluated across replicates with a leave-one-out method for all networks in a tissue. Accordingly, 10 observed DeltaCon similarity values were obtained within each of 16 randomly selected size and connectivity combinations (n = 160 observations). Overall, p values were stable across replicates, showing low (< 0.1) coefficients of variation.

Conditions expression chart

To evaluate potential disease associations of genes included in the network, each gene’s rank-normalized and scaled expression levels were visualized using boxplots. For each gene, expression distributions were compared between the disease conditions represented by the samples used in the displayed network. A box was drawn only if enough samples (n >= 30) were available, and only for the “general” networks, i.e., tissue-specific networks including all collected data for that tissue. This approach enables an assessment of expression specificity, providing insights into whether a gene’s expression is predominantly enriched in one of the diseases or more broadly distributed across multiple biological contexts.

Gene set functional analysis

Gene Ontology (GO) functional enrichment (The Gene Ontology Consortium, 2023) was performed to identify overrepresented gene sets, as defined by the Mavatar Discovery user’s query. The
enrichment p value was calculated using the hypergeometric distribution function, adjusting for multiple testing with the Benjamini-Hochberg (BH) procedure to control the false discovery rate (FDR). The gene background was defined as the intersection of the genes included in the network and in the GO database. To account for the directed acyclic graph (DAG) nature of GO terms, an “elim” approach was implemented, as previously described by Alexa et al., Bioinformatics, 2006. The top 10 significantly enriched GO terms for each of the biological process (BP), molecular function (MF), and cellular component (CC) categories are shown, with their respective adjusted p values and gene counts.

References

D. Koutra, J. T. Vogelstein, C. Faloutsos, DeltaCon: A principled massive-graph similarity function. Proceedings of the 2013 SIAM International Conference on Data Mining, SDM 2013, 1304–4657 (2013) (https://dl.acm.org/doi/10.1145/2824443)

The Gene Ontology Consortium, The Gene Ontology knowledgebase in 2023. Genetics 224 (2023), doi:10.1093/genetics/iyad031 (https://geneontology.org/)

A. Alexa, J. Rahnenführer, T. Lengauer, Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22, 1600–1607 (2006) (https://doi.org/10.1093/bioinformatics/btl140)
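
Worked sketch: hypergeometric enrichment with BH adjustment

The enrichment test described under gene set functional analysis can be illustrated with a small, self-contained Python sketch. It is not the Mavatar Discovery implementation: the function names (`hypergeom_sf`, `benjamini_hochberg`, `enrich`), the toy gene identifiers, and the term labels are all hypothetical, and the “elim” decorrelation of the GO graph is deliberately left out. It only shows how an upper-tail hypergeometric p value and a BH-adjusted p value could be computed for each term against a fixed gene background.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    M: background size, n: background genes annotated to the term,
    N: query size, k: query genes annotated to the term.
    """
    k = max(k, 0)
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(query_genes, background_genes, term_to_genes):
    """Test every term; return (term, overlap, p, adjusted p) sorted by adjusted p."""
    background = set(background_genes)
    query = set(query_genes) & background  # ignore genes outside the background
    rows = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(query & annotated)
        p = hypergeom_sf(overlap, len(background), len(annotated), len(query))
        rows.append((term, overlap, p))
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    results = [(t, k, p, q) for (t, k, p), q in zip(rows, adjusted)]
    return sorted(results, key=lambda r: r[3])


# Toy data (hypothetical genes and terms, for illustration only).
background = [f"g{i}" for i in range(1, 11)]
terms = {
    "term_A": ["g1", "g2", "g3", "g4"],
    "term_B": ["g5", "g6", "g7", "g8", "g9", "g10"],
}
results = enrich(["g1", "g2", "g5"], background, terms)
```

In this toy case `results` lists `term_A` first, since two of the three query genes fall among its four annotated genes. A production analysis would additionally need the GO hierarchy in order to apply the “elim” step, which this sketch does not attempt.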