Interpretation and biology
Why networks? / What are networks? / Why are networks important for my research?

Differentially expressed genes are often not sufficient to fully understand disease mechanisms. Looking at genes in isolation misses an important part of the biological context: how genes work together. In reality, genes rarely act alone; they function as part of coordinated groups or pathways. When genes are co-expressed (i.e., show similar activity patterns), it often indicates that they are involved in the same biological process. By placing genes in a network context, we can identify these coordinated patterns and better understand the underlying disease biology.

Network-based approaches also help improve reliability. Genes that appear as isolated signals, without connections to other affected genes, are more likely to represent noise or false positives than true biological effects. In contrast, groups of interconnected genes provide stronger evidence of meaningful biological changes.

Overall, while traditional gene-level analysis identifies which genes go up or down, network-based approaches reveal how genes are connected, co-regulated, and function together, providing a more complete and biologically relevant understanding of disease.

Biological function emerges from connected systems. No gene acts alone: a single gene's effect depends on the context of what it is connected to, and gene co-expression networks capture this. Genes that are highly co-expressed across samples tend to participate in the same biological processes or complexes, and modules of tightly co-expressed genes map onto coherent biological functions. This concept is widely accepted in the scientific community (1–3).

In a simple expression list, a differentially expressed gene looks the same whether it is a primary driver or a secondary consequence. In a network, you can begin to distinguish between them. Hub genes, those with many connections, are more likely to be functional regulators, and their perturbation will propagate through downstream partners. A well-cited example is IL-6 as a network hub in autoimmune disease: it is not just elevated, it is a convergence point for multiple inflammatory signaling pathways. This helps explain why tocilizumab, an anti-IL-6R antibody, has shown benefit in RA and in COVID-19 cytokine storm and has been explored in SLE (4). Network centrality also offers some insight into why some drug targets fail: targeting a peripheral node has little system-wide effect, whereas targeting a hub gene is more likely to disrupt the whole cascade.

In Mavatar Discovery, two genes are considered connected if they are significantly co-expressed in the context the user is exploring, meaning their expression levels show mutual dependence across different experimental settings. The weighted edges in the graph are undirected and represent the normalized t-value of the correlation across all the studies included in the analysis. Biologically, a connection suggests that the two genes have similar expression patterns and are likely involved in similar or related functional roles, or are associated with specific disease phenotypes.

Network analysis is only as good as the data underpinning it, which is exactly why a large-scale, well-curated, integrated transcriptomics resource changes what is possible. A network built from 50 samples is noisy and unreliable; one built from thousands of samples across multiple conditions and cohorts lets hub genes and modules emerge with statistical confidence.

What do the neighbors of my gene of interest represent?
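In graph terms, a gene's neighbors are the genes that share an edge with it, and its "top" neighbors are those with the highest edge weights. The sketch below is a hypothetical, plain-Python illustration of that ranking, not the platform's implementation: the gene names and weights are invented, and the weights merely stand in for the normalized t-values used in Mavatar Discovery.

```python
from collections import defaultdict

# Hypothetical edge list: (gene_a, gene_b, weight). The weights stand in for
# normalized t-values; all numbers are invented for illustration.
EDGES = [
    ("IL6", "CXCL8", 9.1),
    ("IL6", "CCL2", 7.4),
    ("IL6", "SOCS3", 11.2),
    ("IL6", "ACTB", 1.3),
    ("CXCL8", "CCL2", 6.0),
]

def build_graph(edges):
    """Build an undirected weighted adjacency map: gene -> {neighbor: weight}."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w  # undirected: store the edge in both directions
        graph[b][a] = w
    return graph

def top_neighbors(graph, gene, k):
    """Return the k neighbors of `gene` with the largest edge weights."""
    ranked = sorted(graph.get(gene, {}).items(),
                    key=lambda item: item[1], reverse=True)
    return [neighbor for neighbor, _ in ranked[:k]]

graph = build_graph(EDGES)
print(top_neighbors(graph, "IL6", 2))  # ['SOCS3', 'CXCL8']
```

Ranking by the raw weight mirrors the "highest normalized t-values" wording; an analysis that also wanted strong negative correlations would rank by absolute value instead.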
Neighbors are the genes with the highest mutual dependence and the most synchronized expression patterns across the experimental settings being explored. This has several biological implications:

Functional modularity and relatedness: direct neighbors are highly likely to be involved in similar or related functional roles.

Pathways: direct neighbors represent the immediate "first step" for signals or perturbations to propagate through the system. If your gene of interest is causal for a disease, its neighbors are the most likely candidates to be downstream targets or mediators of the aberrant phenotype.

Disease module expansion: genes that are direct neighbors of known disease-associated genes are primary candidates for new disease gene discovery.

In a Mavatar Discovery co-expression network, where you can specify how many neighbors to include and how proximal they must be to your query genes, the added neighbors specifically represent:

Top biological affiliates: the genes with the strongest statistical correlation (highest normalized t-values) to your gene of interest. This identifies the most reliable functional partners within the specific context you are investigating.

High-confidence targets for intervention: because Mavatar Discovery focuses on the strongest relationships, these neighbors are the most direct candidate points for therapeutic targeting or drug repurposing. Because they sit in the most proximal network vicinity, targeting a neighbor may effectively perturb the surrounding subnetwork.

Tissue-specific context: if the subnetwork is constructed from tissue-specific data, the neighbors represent tissue-specific interactions, which might explain why a global genetic mutation manifests as disease only in certain organs.

How can I find biomarkers?
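Biomarker specificity, meaning a gene that is up in your disease but not across many unrelated conditions, can be made concrete as a simple score. The sketch below is an illustrative assumption, not how the Conditions Expression Chart computes anything: it takes hypothetical per-condition log2 fold changes versus controls for one gene and reports the fraction of other conditions whose fold change is at least as large as in the target condition. Lower values suggest a more disease-specific gene.

```python
def specificity_fraction(log2fc_by_condition, target):
    """Fraction of non-target conditions whose log2FC is >= the target's.

    0.0 means no other condition shows an equal or stronger increase
    (more specific); 1.0 means every other condition does (less specific).
    """
    target_fc = log2fc_by_condition[target]
    others = [fc for cond, fc in log2fc_by_condition.items() if cond != target]
    if not others:
        raise ValueError("need at least one non-target condition")
    return sum(fc >= target_fc for fc in others) / len(others)

# Invented log2 fold changes (condition vs. healthy control) for two genes.
specific_gene = {"rheumatoid arthritis": 2.5, "osteoarthritis": 0.3,
                 "psoriasis": 0.1, "sepsis": 0.4}
broad_gene = {"rheumatoid arthritis": 2.5, "osteoarthritis": 2.8,
              "psoriasis": 2.6, "sepsis": 3.1}

print(specificity_fraction(specific_gene, "rheumatoid arthritis"))  # 0.0
print(specificity_fraction(broad_gene, "rheumatoid arthritis"))     # 1.0
```

Both genes have the same fold change in the target disease; only the comparison against the wider panel of conditions separates the specific candidate from the broadly induced one.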
Biomarker discovery requires identifying genes that show distinct, condition-specific expression patterns and that are measurable in a clinical or experimental context. Mavatar Discovery provides a workflow that moves from broad exploration of expression patterns to targeted evaluation of candidates.

You can identify candidate genes through their expression patterns using the Patient Stratification card. The patient stratification heatmap displays expression profiles for all genes in your graph across variable patient groups and conditions in your tissue-of-choice or disease-specific network. The platform performs hierarchical clustering on both samples (unsupervised) and genes within the heatmap, which groups genes that behave similarly across conditions. By selecting the cohorts most relevant to your research, you can generate a focused view that highlights which genes track closely with your condition of interest. Genes that cluster with known disease-related genes and show a similar expression pattern are a reasonable starting point for further evaluation as potential biomarkers, since their co-expression suggests a shared regulatory context or functional involvement in the disease process.

Validating candidate biomarkers across broader condition categories is also possible. Once you have a set of promising genes from the heatmap, the Conditions Expression Chart card lets you evaluate their expression profiles across a wider range of disease categories, drawing on all the data that contributes to your graph. This step is critical for biomarker specificity: a gene that appears upregulated in your disease of interest but also shows similar behavior across many unrelated conditions may be a poor biomarker candidate, whereas a gene with sharp, disease-specific expression is a much stronger lead. The Conditions Expression Chart helps you make this distinction systematically rather than relying on a single comparison.

A strong biomarker candidate needs to be detectable at the protein level, which means prioritizing protein-coding genes. You can build this filter into your workflow at the network generation stage by restricting the graph to protein-coding genes from the start. Alternatively, you can highlight protein-coding genes within an existing network using the gene type overlay and cross-reference them against your candidate list to confirm that your top hits are translatable to proteomic or immunoassay-based validation.

What would a workflow for biomarker discovery look like in Mavatar Discovery?

A biomarker discovery workflow in Mavatar Discovery might follow this path: build a network from your disease gene set or omics results; open the Patient Stratification card and select disease-relevant cohorts; identify genes that cluster with known disease markers and show condition-specific expression; confirm their profiles in the Conditions Expression Chart across broader disease categories to assess specificity; and finally, verify that your top candidates are protein-coding. The result is a shortlist of candidates that are supported by network context and co-expression evidence and are practically viable for biomarker development.

How can I study patient cohorts? Can I perform patient stratification or identify groups of patients that would respond to a drug versus those who wouldn't?
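Asking which patient cohorts drive an interaction comes down to summarizing one edge's correlation separately for each group of contributing datasets. How the platform weights datasets is not specified in this article; the sketch below assumes, purely for illustration, a sample-size-weighted mean of per-dataset Pearson correlations, with invented dataset records and cohort labels.

```python
from collections import defaultdict

# Hypothetical per-dataset results for ONE edge (e.g. IL6-SOCS3):
# (cohort label, Pearson r in that dataset, number of samples).
DATASETS = [
    ("rheumatoid arthritis", 0.82, 40),
    ("rheumatoid arthritis", 0.74, 60),
    ("osteoarthritis", 0.35, 30),
    ("healthy control", 0.10, 50),
]

def weighted_r_by_cohort(datasets):
    """Sample-size-weighted mean correlation for each cohort label."""
    num = defaultdict(float)  # running sum of r * n per cohort
    den = defaultdict(int)    # running sum of n per cohort
    for cohort, r, n in datasets:
        num[cohort] += r * n
        den[cohort] += n
    return {cohort: num[cohort] / den[cohort] for cohort in num}

# Print cohorts from strongest to weakest support for this edge.
for cohort, r in sorted(weighted_r_by_cohort(DATASETS).items(),
                        key=lambda kv: kv[1], reverse=True):
    print(f"{cohort}: {r:.3f}")
```

An edge whose support is concentrated in one cohort, as in this toy example, suggests a different hypothesis from one with similar support everywhere. Averaging Fisher z-transformed correlations is a common alternative to averaging r directly.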
Mavatar Discovery provides several tools that let you dissect cohort-level information directly in the platform. These help you understand how different patient populations contribute to the molecular patterns in your network and, ultimately, translate discoveries into clinical insights.

In the drop-down menu of the Functional Annotation card, you can group the data by disease. This displays the average weighted correlation coefficient for datasets that include specific patient cohorts, showing you which disease populations are driving the interactions in your graph. If you click an individual edge, its average weighted correlation coefficient is highlighted within the plot, helping you assess whether a particular gene–gene interaction is supported primarily by one cohort or shared across several. Beyond disease grouping, you can switch the view to data source, biopsy site, cell type, or other tags, each of which adds a different layer of granularity to your hypothesis, revealing, for example, whether an interaction is driven by samples from a specific tissue site, a particular cell population, or a specific experimental setup.

Note: some users have found it useful to look within the data source drop-down for studies with a research scope similar to their own, which lets them validate their results further in those datasets.

In the Conditions Expression Chart, the expression level of a selected gene can also be compared between patient cohorts. For any gene in your network, the chart shows its expression levels across different patient cohorts and disease categories. This is where you can start to assess whether a gene behaves differently in distinct patient populations, for instance, whether a potential drug target is consistently overexpressed in one disease subgroup but not another. Hovering over the individual boxes shows more detailed statistical information, helping you evaluate the magnitude and consistency of expression differences between groups.

The Patient Stratification card helps you visualize expression patterns for the genes in the graph across variable patient groups. The platform allows clustering on both samples and genes, which can reveal groups of patients with similar molecular profiles, an effective form of data-driven patient stratification. By selecting specific cohorts of interest, you can narrow the view to the comparisons that matter most to your research. Genes that cluster differently across these patient subgroups become candidates for stratification biomarkers or indicators of differential drug response.

Note: the first version of the heatmaps was based on one representative dataset for each network. This is being improved to include multiple datasets with a high level of metadata curation, enabling more comprehensive cohort analyses across larger and more diverse populations. If the heatmap for your network of choice is based on only one dataset, an update is in our pipeline.

While the platform does not model clinical drug response directly, combining these tools enables a powerful indirect workflow for connecting your graph to drug response hypotheses. You can identify genes in your network that are drug targets (using the drug target overlay), then use the Conditions Expression Chart and the Patient Stratification card to assess whether those targets behave differently across patient subgroups. A drug target gene that is highly expressed in one patient cluster but not another suggests that the corresponding therapy may be more effective in that subgroup. This provides a data-driven rationale for stratification hypotheses that can be tested in clinical or preclinical settings.

How can I identify disease mechanisms?
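Functional enrichment asks whether a gene subset (for instance, one cluster) overlaps an annotated gene set such as a pathway or GO term more than chance would predict. The statistics behind the platform's Functional Enrichment card are not described in this article; a common textbook choice is the one-sided hypergeometric test, sketched here with only the Python standard library and invented gene counts.

```python
from math import comb

def hypergeom_enrichment_p(k, n_subset, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated to the term
    n_subset: genes in the selected subset (e.g. one cluster)
    k: subset genes annotated to the term
    """
    upper = min(n_subset, K)
    total = comb(N, n_subset)
    # Sum the probabilities of seeing k or more annotated genes by chance.
    return sum(comb(K, i) * comb(N - K, n_subset - i)
               for i in range(k, upper + 1)) / total

# Invented example: a 20-gene cluster drawn from a 2,000-gene network,
# 8 of whose genes fall in a 100-gene "inflammatory response" term.
# By chance we would expect about 20 * 100 / 2000 = 1 such gene.
p = hypergeom_enrichment_p(k=8, n_subset=20, K=100, N=2000)
print(f"p = {p:.2e}")
```

Because many terms are usually tested at once, real enrichment tools typically also apply a multiple-testing correction such as Benjamini–Hochberg.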
Mavatar Discovery provides multiple entry points and analytical layers for uncovering the molecular mechanisms underlying a disease of interest.

Start by building your disease network. You can generate a graph in two ways: by selecting from the platform's curated list of diseases or rare diseases, which seeds the network with genes already associated with that condition based on ClinGen data, or by inputting your own gene list, for example, differentially expressed genes from your own study or known biomarkers from the literature. Either approach gives you a network in which to begin exploring mechanistic relationships.

Choose the right network context for your question. This decision shapes the kind of biology you will uncover. If your goal is to compare disease mechanisms across conditions within the same tissue (for instance, understanding how rheumatoid arthritis mechanisms differ from those of osteoarthritis in joint tissue), the general network is recommended, as it integrates data across all available conditions and allows you to contrast one disease against others using the same interaction landscape. If instead you want to retrieve the most specific correlations for your disease, a more targeted network (such as a disease-specific or a multi-tissue cell-type-specific network) will surface the interactions most tightly associated with that context, filtering out broader signals that may dilute disease-relevant connections.

Use functional enrichment to identify the biological processes at play. The Functional Enrichment card lets you run enrichment analysis across several categories of interest on different gene subsets: all genes currently visible in your graph; all genes including pruned genes (those that were part of the network but fell below display thresholds); or a manually selected subset of genes, for instance a specific cluster or a set of hub genes you want to interrogate. Clicking an individual enriched function reveals additional detail, including which genes are involved and their enrichment scores, helping you move from a broad functional label to a mechanistic understanding of which genes are driving that enrichment.

Trace the raw data behind your gene–gene correlations. The Functional Annotation card adds an important layer of interpretability by showing how different datasets and sample cohorts contributed to the edges in your graph. It tells you whether a particular gene interaction is supported primarily by a specific disease, by healthy controls, or by a mix of conditions, giving you confidence about which connections are genuinely disease-relevant and which reflect more general tissue biology.

What does this look like in practice?

Start with a curated disease gene set or your own candidate list, build the network in the appropriate context, identify the major clusters and run functional enrichment on each to label its biological role, then use the Functional Annotation card to confirm which interactions are driven by disease-relevant data. From there, you can layer on additional analyses: the Patient Stratification card and Conditions Expression Chart to compare your disease against related conditions, and drug target overlays to assess which mechanistic nodes are pharmacologically tractable, progressively building a detailed picture of the disease biology encoded in your network.

How can I identify key mechanisms in a large graph?
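The large-graph strategy in this section (filter edges on t-value and p-value, optionally drop core genes, then look at clusters) can be mimicked on an exported edge list. The sketch below is hypothetical: the thresholds, the core-gene set, and the edge values are invented, and connected components serve as a deliberately simple stand-in for clusters, since the platform's own clustering method is not described here.

```python
from collections import defaultdict

# Hypothetical edges: (gene_a, gene_b, t_value, p_value). Values are invented.
EDGES = [
    ("IL6", "SOCS3", 11.2, 1e-9),
    ("IL6", "CXCL8", 9.1, 1e-7),
    ("CXCL8", "CCL2", 6.0, 1e-5),
    ("COL1A1", "COL3A1", 12.5, 1e-10),
    ("IL6", "GAPDH", 8.0, 1e-6),   # GAPDH is treated as a "core" gene below
    ("CCL2", "MMP3", 2.1, 0.04),   # weak edge, removed by the filter
]
CORE_GENES = {"GAPDH"}  # hypothetical stand-in for a core-gene list

def filter_edges(edges, min_t, max_p, exclude=frozenset()):
    """Keep edges with |t| >= min_t and p <= max_p that avoid excluded genes."""
    return [(a, b) for a, b, t, p in edges
            if abs(t) >= min_t and p <= max_p
            and a not in exclude and b not in exclude]

def connected_components(edge_pairs):
    """Group genes into connected components with an iterative DFS."""
    adj = defaultdict(set)
    for a, b in edge_pairs:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adj[gene] - component)
        seen |= component
        components.append(sorted(component))
    return components

kept = filter_edges(EDGES, min_t=5.0, max_p=1e-4, exclude=CORE_GENES)
for cluster in connected_components(kept):
    print(cluster)
# prints ['CCL2', 'CXCL8', 'IL6', 'SOCS3'] then ['COL1A1', 'COL3A1']
```

Dropping the core gene and the weak edge is what splits the toy graph into two interpretable groups; on a real network, each group would then go through enrichment and cell type checks.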
Large networks in Mavatar Discovery can contain hundreds or even thousands of nodes and edges, which makes it challenging to pinpoint the biological mechanisms that matter most to your research question. The platform provides several strategies to manage complexity and extract meaningful insights from dense graphs.

Start by refining graph size at generation. If your initial graph is overwhelming, consider regenerating it with more focused parameters: lower the distance parameter (how far the network reaches from your queried genes) and reduce the number of neighbors (how many interacting partners are pulled in at each step). Together, these adjustments produce a tighter, more interpretable graph centered closely on your genes of interest. You can also apply gene type filters during generation, for example, restricting to protein-coding genes only or focusing on the class of genes most relevant to your study.

Explore individual clusters for biological meaning. Large graphs organize into clusters, or densely connected regions, and these often correspond to distinct functional modules. Rather than trying to interpret the entire graph at once, you can select the genes within a specific cluster by right-clicking and choosing "Select local cluster", then interrogate that subset using the platform's tools. Functional enrichment analysis reveals which biological processes, pathways, or molecular functions are overrepresented in the cluster, telling you whether a particular region of your graph is enriched for immune signaling, metabolic processes, or cell cycle regulation. The Cell Type Explorer lets you assess whether the genes in a cluster show stronger expression in specific cell types. This adds cellular resolution to your network analysis and helps confirm the cluster's biological identity and relevance.

Filter edges to surface the strongest connections. Not all interactions in a large graph carry equal weight. You can filter edges based on statistical measures (t-value and p-value) to retain only the most robust, high-confidence connections and remove weaker or noisier associations. This is especially valuable when you want to prioritize interactions that are most likely to reflect a real regulatory or functional relationship rather than incidental co-expression.

Focus on tissue-specific biology. Large networks may include core genes, as identified in the Mavatar Discovery gene lists. Core genes are highly connected, broadly expressed genes that appear across many tissues and conditions. While these genes are biologically important, they can dominate graph topology and obscure the tissue-specific interactions that are often most informative for a particular research question. Filtering out network core genes lets you focus on the connections that are unique to, or preferentially active in, your tissue of interest, revealing mechanisms that might be masked by ubiquitous interactions.

Follow-up: My network is too large, and I don't know where to start or how to interpret it.

A classic workflow for dissecting a large graph might look like this: start by filtering edges on t-value and p-value to reduce noise, then optionally remove core genes to emphasize tissue specificity. Visually identify the major clusters in the resulting graph, select each one individually, and run functional enrichment to label its biological role and the Cell Type Explorer to assign cell type context. This iterative process turns a complex, hard-to-read network into a set of clearly defined functional modules, each with an interpretable biological identity.

How can I use the platform to enhance my proteomics study?
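Expanding a short proteomics hit list into a network amounts to walking outward from the seed genes. Given the distance and number-of-neighbors parameters described in this article, one plausible mental model is a breadth-first expansion that keeps the top-k strongest partners per gene for a limited number of steps. The sketch below implements that model on an invented graph; it is an assumption about how such parameters could interact, not the platform's actual algorithm, and the gene names and weights are made up.

```python
# Hypothetical weighted adjacency map: gene -> {neighbor: edge weight}.
GRAPH = {
    "IL6":   {"SOCS3": 11.2, "CXCL8": 9.1, "STAT3": 8.7, "ACTB": 1.0},
    "CXCL8": {"IL6": 9.1, "CXCR2": 7.9, "CCL2": 6.0},
    "STAT3": {"IL6": 8.7, "JAK1": 9.5, "SOCS3": 7.0},
    "SOCS3": {"IL6": 11.2, "STAT3": 7.0},
    "CXCR2": {"CXCL8": 7.9},
    "JAK1":  {"STAT3": 9.5},
    "CCL2":  {"CXCL8": 6.0},
    "ACTB":  {"IL6": 1.0},
}

def expand_seeds(graph, seeds, n_neighbors, distance):
    """Breadth-first expansion keeping the n strongest partners per gene."""
    included = set(seeds)
    frontier = set(seeds)
    for _ in range(distance):
        next_frontier = set()
        for gene in frontier:
            partners = sorted(graph.get(gene, {}).items(),
                              key=lambda item: item[1], reverse=True)
            for partner, _weight in partners[:n_neighbors]:
                if partner not in included:
                    next_frontier.add(partner)
        included |= next_frontier
        frontier = next_frontier  # only newly added genes expand next step
    return included

panel_hits = {"IL6", "CXCL8"}  # e.g. upregulated proteins from a panel
network = expand_seeds(GRAPH, panel_hits, n_neighbors=2, distance=2)
print(sorted(network - panel_hits))  # ['CXCR2', 'SOCS3', 'STAT3']
```

Raising either parameter grows the result quickly (a third step here would also pull in JAK1), which is why lowering them is the first suggestion for taming a large graph.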
Proteomics experiments generate highly valuable data, but they come with inherent constraints that can limit the scope of your biological interpretation. Mavatar Discovery is designed to complement and extend proteomics results in two ways.

Expanding beyond panel limitations. Proteomics platforms, particularly targeted approaches like Olink panels, measure a defined set of proteins, often numbering in the hundreds to low thousands. While this provides deep, reliable quantification for those targets, it means you are only seeing a slice of the full molecular picture. Mavatar Discovery lets you take the genes identified in your proteomics data and place them into comprehensive, tissue-specific gene interaction networks built from large-scale transcriptomic data. This reveals new interacting partners, upstream regulators, and downstream targets that fall outside the scope of your panel: connections you would otherwise miss entirely. In practice, a focused Olink panel of, for example, 92 proteins can lead you into a rich network of functionally related genes, giving you new hypotheses and candidate targets to pursue.

Overcoming small sample sizes. Proteomics experiments are expensive, which typically constrains cohort sizes: a study might include tens of samples rather than hundreds or thousands. This limits statistical power and makes it difficult to generalize findings across broader patient populations or conditions. With Mavatar Discovery, you can contextualize your results within networks derived from thousands of publicly available transcriptomic samples. This places findings from a small, focused cohort into a much larger biological context, helping you assess whether the patterns you observe are robust, condition-specific, or part of broader regulatory programs that hold across larger and more diverse datasets.

Example: extending an Olink inflammation panel in rheumatoid arthritis. Imagine you have run an Olink inflammation panel (92 proteins) on synovial fluid samples from 30 rheumatoid arthritis patients and 15 healthy controls. Your analysis identifies a cluster of upregulated proteins (e.g., IL-6, CXCL8, CCL2, and MMP3) that distinguishes rheumatoid arthritis from controls. These are well-characterized inflammatory mediators, but your panel cannot tell you what else is happening beyond those 92 targets. By inputting these key hits into Mavatar Discovery and exploring them within a joint tissue or blood network, you can uncover co-expressed genes and interaction partners that were not on your panel at all: for instance, a transcription factor that may drive the coordinated upregulation of your cytokine cluster, or a receptor that links your chemokine signal to a downstream pathway you had not considered. Using the Functional Annotation card, you can verify whether these new connections are enriched in inflammatory or autoimmune conditions specifically. With the Patient Stratification card and the Conditions Expression Chart, you can compare how this extended network behaves in rheumatoid arthritis versus other inflammatory diseases versus healthy controls, adding disease specificity to findings that your original 30-patient cohort alone could not power.

The result is that a focused, expensive proteomics experiment on a small cohort becomes the seed for a much broader, network-level understanding, revealing biology that would otherwise have required a much larger panel or a much larger study to capture through proteomics alone.

How can I compare cancer versus non-cancer, or sick versus healthy?
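A sick-versus-healthy comparison of a single gene usually reduces to a fold change between group means plus a test statistic. As a rough, hypothetical illustration of that kind of comparison (not the Conditions Expression Chart's method), the sketch below computes a log2 fold change and a Welch t-statistic from invented expression values, then labels the gene up, down, or unchanged using an arbitrary fold-change cutoff.

```python
from math import log2, sqrt
from statistics import mean, variance

def compare_groups(case, control, fc_cutoff=1.0):
    """Return (log2 fold change, Welch t-statistic, label) for two groups.

    Assumes positive, linear-scale expression values; fc_cutoff is an
    arbitrary illustrative threshold on |log2 fold change|.
    """
    lfc = log2(mean(case) / mean(control))
    # Welch standard error: no equal-variance assumption between groups.
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    t = (mean(case) - mean(control)) / se
    if lfc >= fc_cutoff:
        label = "up"
    elif lfc <= -fc_cutoff:
        label = "down"
    else:
        label = "unchanged"
    return lfc, t, label

# Invented expression values (e.g. TPM) for one gene in two groups.
tumor = [40.0, 35.0, 50.0, 45.0]
healthy = [10.0, 12.0, 9.0, 11.0]
print(compare_groups(tumor, healthy))
```

A real analysis would typically work on normalized, log-transformed data and correct for testing many genes; this sketch only shows the shape of the comparison.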
I want to see healthy- or control-specific networks.

Our networks are built from comprehensive tissue-level data. Each tissue network (e.g., blood) integrates transcriptomic information across all available conditions, including cancer, viral infections, autoimmune disease, healthy controls, and more. This means the network itself is neither cancer-specific nor healthy-specific; it represents the full landscape of gene interactions observed in that tissue. Rather than needing separate "cancer" and "healthy" networks, you use the annotation and expression tools built into the platform to dissect where the signal is coming from.

Functional annotation: each edge in the network carries metadata about the conditions and samples that contribute to it. By examining the Functional Annotation card, you can see which diseases, phenotypes, or conditions are driving a particular gene–gene correlation. This lets you identify whether an edge is primarily supported by cancer samples, healthy samples, or a mix, without building a separate network from scratch.

Patient stratification: the patient stratification heatmap lets you evaluate how a gene or a set of genes in your network behaves across specific conditions. You can directly compare expression patterns in cancer (or disease) versus healthy samples, or across multiple disease states, to see where your network genes are differentially active.

Conditions Expression Chart: this tool gives a more targeted look at how a gene behaves under specific conditions. You can view conditions and assess whether the genes in your network are upregulated, downregulated, or unchanged, giving you a functional comparison without requiring a condition-restricted network.

In summary, you do not need a separate healthy-only or cancer-only network. The platform's strength is that it builds rich, tissue-level networks from all available data and then gives you the tools (functional annotation, patient stratification, and the Conditions Expression Chart) to slice into specific comparisons like cancer versus non-cancer at the edge and gene level.

How can I use this tool to find FDA-approved drugs among biomarkers detected using proteomics or other omics?

Once you have identified a set of biomarkers from your proteomics, transcriptomics, or other omics experiment, a natural next question is whether any of those targets are already druggable, or more specifically, whether FDA-approved or clinically relevant compounds exist for them. Mavatar Discovery integrates drug target information from ChEMBL, making this exploration straightforward within the platform.

Search directly by drug. You can use the search functionality to look up a specific drug by name. The platform returns the genes that are known targets of that compound, based on curated ChEMBL data. This is useful when you already have a drug of interest, for example, if you want to check whether any of your biomarkers overlap with the known targets of a particular drug, or when you want to work backwards from an approved drug to see which of its targets appear in your query.

Highlight drug targets within your network. If you have already built a network from your deregulated omics genes, you can overlay drug target information directly onto your graph. Using the Find menu and toggling the drug target option, the platform highlights which genes in your network are known drug targets. This gives you an immediate visual readout of druggability across your entire network: not just your original biomarker list, but also the novel interacting partners and regulators that the network expansion revealed. Genes that light up as drug targets become immediate candidates for repurposing hypotheses or for prioritization in downstream validation.

Why does this matter in an omics context?
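Conceptually, a drug target overlay is a set intersection between the genes in your network and a gene-to-drug mapping, and the repurposing argument adds a ranking by connectivity, with genes that were not on the original panel flagged. The sketch below illustrates that logic with an invented edge list and a hand-written placeholder mapping; in the platform the mapping comes from ChEMBL, but nothing here queries ChEMBL or reproduces its data model, and the drug names are not real compounds.

```python
from collections import Counter

# Hypothetical network edges (gene pairs) after expanding an omics hit list.
EDGES = [
    ("IL6", "STAT3"), ("IL6", "SOCS3"), ("STAT3", "JAK1"),
    ("STAT3", "SOCS3"), ("CXCL8", "CXCR2"), ("IL6", "CXCL8"),
]
# Placeholder gene -> drugs mapping (invented drug names).
DRUG_TARGETS = {"STAT3": ["DRUG_A"], "CXCR2": ["DRUG_B"], "IL6": ["DRUG_C"]}
PANEL = {"IL6", "CXCL8"}  # genes measured in the original omics experiment

def druggable_nodes(edges, drug_targets, panel):
    """Rank druggable network genes by degree, flagging genes off the panel."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    hits = [(gene, degree[gene], drug_targets[gene], gene not in panel)
            for gene in degree if gene in drug_targets]
    # Highest degree first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

for gene, deg, drugs, novel in druggable_nodes(EDGES, DRUG_TARGETS, PANEL):
    print(gene, deg, drugs, "new via network" if novel else "on panel")
```

In this toy network, STAT3 is as connected as the panel hit IL6 but was never measured, which is exactly the kind of node the repurposing argument is about.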
Your proteomics panel or omics experiment gives you a set of biologically relevant genes, but not all of them will be actionable from a therapeutic standpoint. By combining your results with the ChEMBL drug target layer in Mavatar Discovery, you can quickly filter your findings to focus on the subset for which pharmacological compounds or approved therapies already exist. This is especially powerful after network expansion: a gene that was not on your original panel but emerged as a key hub in the network, and happens to be the target of an existing drug, represents a repurposing opportunity you would never have found from your omics data alone.

What does the workflow look like?

Start with your biomarker list from proteomics or another omics approach. Upload it as a gene list using the file upload method, build a tissue-specific network in Mavatar Discovery to expand beyond your original hits, then use the drug target toggle to identify which nodes in that network are druggable. From there, you can use the Functional Annotation card, the Patient Stratification card, and the Conditions Expression Chart to assess whether those druggable targets are relevant to your specific disease context, closing the loop from biomarker discovery to actionable therapeutic insight.

How can I perform cell-type-specific studies?
I am only interested in exploring [insert cell type of choice] in my research.

If your research is focused on a particular cell type, Mavatar Discovery offers cell-type-specific networks that let you work directly within the biology of that cell population rather than inferring cell type activity from bulk tissue data. The platform includes multi-tissue networks for several key cell types: B cells, fibroblasts, and macrophages and monocytes. These networks are not computationally deconvolved from bulk tissue; they are built from cell-type-resolved data, including sequenced sorted cell populations. This means the gene interactions you see in these networks reflect co-expression and regulatory relationships operating within your cell type of interest, not confounded by signals from surrounding cells. If you are studying fibroblast-driven inflammation, for example, you can build and explore networks in the fibroblast context specifically, ensuring that every edge reflects a relationship observed within fibroblasts rather than an artifact of cellular heterogeneity.

Because these cell type networks integrate data from multiple tissues, they capture how your cell type of interest behaves across different biological environments. The B cells network, for instance, includes data from B cells isolated across various tissues and conditions, giving you a broader view of that cell type's regulatory landscape than any single-tissue or single-study dataset could provide.

How do I do this?
Select the cell-type-specific network that matches your research focus, for example, the macrophages and monocytes network if you are studying innate immune mechanisms, or the B cells network if you are investigating antibody production in adaptive immunity. Input your genes, drug targets, disease-related genes, or pathways of interest. Build your network and explore it using the same tools available in tissue networks: functional enrichment to identify pathway involvement, the Functional Annotation card to examine condition-level contributions to each edge, the Patient Stratification card and Conditions Expression Chart to compare your genes across disease states, and the drug target overlay to assess druggability. The difference is that every interaction and pattern you uncover is grounded in your cell type of choice, giving you confidence that the biology you are seeing is cell-type-relevant rather than a mixed tissue signal.

References

1. S. van Dam, U. Võsa, A. van der Graaf, L. Franke, J. P. de Magalhães, Gene co-expression analysis for functional classification and gene–disease predictions. Brief. Bioinform. 19, 575–592 (2018). https://doi.org/10.1093/bib/bbw139

2. W. Yin, L. Mendoza, J. Monzon-Sandoval, A. O. Urrutia, H. Gutierrez, Emergence of co-expression in gene regulatory networks. PLoS One 16, e0247671 (2021). https://doi.org/10.1371/journal.pone.0247671

3. M. Russell, A. Aqil, M. Saitou, O. Gokcumen, N. Masuda, Gene communities in co-expression networks across different tissues. PLoS Comput. Biol. 19, e1011616 (2023). https://doi.org/10.1371/journal.pcbi.1011616

4. M. Aliyu, F. T. Zohora, A. U. Anka, K. Ali, S. Maleknia, M. Saffarioun, G. Azizi, Interleukin-6 cytokine: An overview of the immune regulation, immune dysregulation, and therapeutic approach. Int. Immunopharmacol. 111, 109130 (2022). https://doi.org/10.1016/j.intimp.2022.109130