About
DINA Network Construction
Standard

The linear correlation between each gene pair was computed across all relevant datasets and samples and then statistically assessed, yielding a normalized t value. The normalized t value quantifies how consistently the association between two genes is conserved across all included datasets within the user-defined tissue and biological context (higher t values indicate stronger gene–gene associations). This measure incorporates dataset size, assigning greater weight to larger studies. Only gene pairs found in at least 15 datasets were included in the final displayed network (see the correlation sketch below).

Mavatar Discovery

Users can control how many strong associations (neighbors) are included in the resulting network by adjusting the distance and neighbor parameters in the Generate Graph window. Importantly, only positive correlations are considered in the analysis.

AI Autoencoder

Creating gene interaction networks with autoencoders (AEs) offers the advantage of capturing non-linear relationships between genes that linear models cannot detect. Using the latent dimensions from the bottleneck-layer representations of AEs, researchers may uncover hidden, biologically meaningful features and interactions that standard approaches miss. Networks built with AEs can therefore help users identify additional hidden interactions.

Expression matrices were pre-processed as described in docid aqwp0r sg1opiwvvltcbu. To improve the data distributions for the AE training phase, a more stringent filtering of low-variability genes was required, removing genes with a normalized expression value < 0.1 in more than 25% of the samples within the dataset (see the filtering sketch below).

Model Architecture

We used an autoencoder (H2O, version 3.44.0.3 (docid 4qgal9kyatnpnnislmfc)) to explore linear and non-linear gene associations within expression data. The method relies on a compact neural network design that reduces the data into a smaller set of features and then reconstructs the original profiles from this compressed representation. By adjusting the network structure, we identified models that captured the most relevant expression signals while minimizing noise. To ensure reliable results, the data were split into training and test sets, and model performance was evaluated on unseen samples. The most effective configurations were selected, and the resulting latent features, which represent condensed summaries of the gene expression profiles, were used for downstream network analyses. Low-variability features were filtered out so that only informative signals entered the AI network construction flow (see the autoencoder sketch below).

References

H2O.ai, H2O-3, version 3.44.0.3 (2024). Available at https://github.com/h2oai/h2o-3.
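The page does not spell out the exact statistic behind the normalized t value, so the following Python sketch is only one plausible reading of the Standard approach: a per-dataset Pearson correlation is converted to a t statistic, the per-dataset values are combined with sample-size weights, and only pairs observed in at least 15 datasets with a positive consensus value are kept. The helper names `pair_t` and `consensus_t` are hypothetical and not part of DINA.

```python
import numpy as np

# Illustrative approximation only; the exact DINA statistic may differ.

def pair_t(x, y):
    """Pearson r between two expression vectors, converted to a t statistic."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    r = np.clip(r, -0.999999, 0.999999)        # guard against division by zero
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)

def consensus_t(datasets, gene_a, gene_b, min_datasets=15):
    """Sample-size-weighted consensus t value for one gene pair.

    `datasets` is a list of pandas DataFrames (samples x genes) from the same
    tissue and biological context. Returns None if the pair is covered by fewer
    than `min_datasets` datasets or if the consensus value is not positive.
    """
    t_vals, weights = [], []
    for df in datasets:
        if gene_a in df.columns and gene_b in df.columns:
            t_vals.append(pair_t(df[gene_a].to_numpy(), df[gene_b].to_numpy()))
            weights.append(len(df))            # larger studies get more weight
    if len(t_vals) < min_datasets:
        return None
    t_norm = float(np.average(t_vals, weights=weights))
    return t_norm if t_norm > 0 else None      # only positive associations kept
```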
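The stricter pre-processing filter described for the AE training phase (dropping genes whose normalized expression is below 0.1 in more than 25% of a dataset's samples) could be expressed as below. The function name `filter_low_expression` and the samples-by-genes layout are assumptions for illustration.

```python
import pandas as pd

def filter_low_expression(expr: pd.DataFrame,
                          min_value: float = 0.1,
                          max_low_fraction: float = 0.25) -> pd.DataFrame:
    """Drop genes with normalized expression < `min_value` in more than
    `max_low_fraction` of the samples. `expr` is a samples x genes matrix."""
    low_fraction = (expr < min_value).mean(axis=0)   # per-gene fraction of low samples
    return expr.loc[:, low_fraction <= max_low_fraction]
```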
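As a rough companion to the Model Architecture description, the sketch below trains an H2O-3 (3.44.0.3) autoencoder on a filtered expression matrix, evaluates reconstruction on held-out samples, and extracts the bottleneck features. The hidden-layer sizes, activation, epoch count, and the `expr_filtered` input are illustrative assumptions; the configurations actually selected for DINA are not reported on this page.

```python
import h2o
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

h2o.init()

# expr_filtered: samples x genes pandas DataFrame after the filtering step above
frame = h2o.H2OFrame(expr_filtered)
train, test = frame.split_frame(ratios=[0.8], seed=42)   # hold out unseen samples

ae = H2OAutoEncoderEstimator(
    hidden=[128, 16, 128],   # bottleneck of 16 latent features (illustrative)
    activation="Tanh",
    epochs=100,
    seed=42,
)
ae.train(x=frame.columns, training_frame=train, validation_frame=test)

# Reconstruction error on unseen samples indicates how well the model
# captured the expression signal while discarding noise.
print(ae.model_performance(test_data=test))

# Bottleneck (latent) representations used for downstream network analysis;
# the hidden-layer index assumed here is 0-based, so layer=1 is the middle layer.
latent = ae.deepfeatures(frame, layer=1)
```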