Emerging high-throughput technologies such as next-generation sequencing (NGS) have led to a dramatic increase in descriptive and functional genetic information over the past decade, revealing gene properties such as gene family, tissue distribution, gene function and pathway membership. Processing these properties into gene similarities that go beyond sequence homology enables the unbiased exploration of inter-gene relationships. Existing computational tools that apply such gene relationships include UCSC Gene Sorter [1] and EvoCor [2]. However, these tools apply each similarity independently and do not make use of multidimensional scoring.

Term-to-Gene Search

Genehopper is a new search engine focused on human genes that allows the exploration of gene-to-gene relationships. It handles two query types. The typical use case starts with a term-to-gene search (Figure 1), i.e. an optimized full-text search for an anchor gene of interest. The web interface accepts one or more terms, including gene symbols and identifiers from Ensembl, UniProtKB, EntrezGene and RefSeq. Additionally, Genehopper can find genes by publication or SNP variant identifiers, and it also handles unspecific vocabulary.

Figure 1. Result page of the term-to-gene search with the exemplary query string TP53. Each row in the result list corresponds to a single human gene, and the top row is intended to be the gene most relevant to the user query. A gene can be selected as the anchor for the second search type by clicking on the "Similar Genes" button in the red rectangle. More detailed information about a gene can be requested by clicking on its gene name.

Gene-to-Gene Search

Once the anchor gene is defined, the user can explore its neighbourhood, which is ranked by the weighted sum of the normalized gene similarities listed in Table 1.

Similarity                                  Data Source          Measure
1. Homology (SHOM)                          Ensembl Compara      Sequence Identity
2. Normal Tissue Expression Profile (SNEX)  Human Protein Atlas  Spearman
3. InterPro Protein Domain (SIPD)           Swiss-Prot           Cosine
4. Swiss-Prot Protein Feature (SSPF)        Swiss-Prot           Cosine
5. Variant-related Publications (SVP)       Ensembl Variation    Cosine
6. GO Cellular Component (SCC)              Ensembl Core         Resnik-BMA
7. GO Molecular Function (SMF)              Ensembl Core         Resnik-BMA
8. GO Biological Process (SBP)              Ensembl Core         Resnik-BMA
9. HUGO Gene Symbol (SHGS)                  HGNC                 Prefix Distance
Table 1. Gene similarities used in the gene-to-gene search and the respective data source and measure.
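To make two of the measures in Table 1 concrete, the following is a minimal sketch of a cosine similarity and a Spearman correlation. It assumes that each gene is described by a set of annotation identifiers (a binary feature vector) for the cosine case and by a list of expression values across tissues for the Spearman case. The function names and the input encoding are illustrative assumptions and are not taken from the Genehopper implementation.

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine similarity of two sets of annotation IDs (binary vectors).

    For binary vectors this reduces to |A & B| / sqrt(|A| * |B|).
    Assumption: a gene without any annotation gets similarity 0.
    """
    if not features_a or not features_b:
        return 0.0
    shared = len(features_a & features_b)
    return shared / math.sqrt(len(features_a) * len(features_b))


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(profile_a, profile_b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ranks_a, ranks_b = average_ranks(profile_a), average_ranks(profile_b)
    n = len(ranks_a)
    mean_a, mean_b = sum(ranks_a) / n, sum(ranks_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant profile carries no rank signal
    return cov / math.sqrt(var_a * var_b)
```

Both functions return values that still need to be normalized to a common range before they can be combined with the other similarities; how Genehopper performs that normalization is not specified here.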

All gene-to-gene similarities are pre-calculated to ensure fast retrieval. Each weight can be adjusted by the user, which allows flexible customization of the gene search for specific use cases. Result genes are ranked in descending order of their overall ranking score, which is the weighted sum of the pairwise similarities between the anchor gene and the respective result gene (Figure 2).
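The weighted-sum ranking can be sketched in a few lines. The example assumes that the pre-calculated, normalized similarities to the anchor gene are available as a dictionary per candidate gene. The function name rank_genes, the short similarity keys, the placeholder gene names and all numbers are invented for illustration and do not reflect Genehopper's code or data.

```python
def rank_genes(similarities, weights, top_n=None):
    """Rank candidate genes by the weighted sum of their similarities.

    similarities: {gene: {similarity_name: normalized score}}, all scores
                  relative to one fixed anchor gene
    weights:      {similarity_name: user-chosen weight}
    Returns a list of (gene, score) in descending order of score.
    """
    scored = []
    for gene, sims in similarities.items():
        # A similarity missing for a gene is treated as 0 (assumption).
        score = sum(w * sims.get(name, 0.0) for name, w in weights.items())
        scored.append((gene, score))
    # Sort by descending score; ties are broken alphabetically by gene name.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored if top_n is None else scored[:top_n]


# Invented example: two candidates and two of the nine similarities.
example_similarities = {
    "GENE_A": {"HOM": 0.8, "NEX": 0.2},
    "GENE_B": {"HOM": 0.1, "NEX": 0.9},
}
homology_profile = {"HOM": 1.0, "NEX": 0.0}
expression_profile = {"HOM": 0.0, "NEX": 1.0}

for gene, score in rank_genes(example_similarities, homology_profile):
    print(gene, score)
```

Switching from homology_profile to expression_profile reverses the order of the two placeholder genes, which mirrors how a saved weight profile changes the result list.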

All implemented similarities show low pairwise correlation (max r² = 0.35), implying low linear dependency, i.e. a change in any single weight has an effect on the ranking. Thus, we treated them as separate dimensions of the search space.
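A check of this kind can be sketched as follows. The example assumes that r denotes the Pearson correlation coefficient and that each similarity is available as a list of scores over the same ordered set of gene pairs. Both points, as well as the function names, are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is reported as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def max_pairwise_r2(columns):
    """Return ((name_a, name_b), r2) for the most correlated similarity pair.

    columns: {similarity_name: [score per gene pair]}, with all lists
             referring to the same gene pairs in the same order.
    """
    best_pair, best_r2 = None, 0.0
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r2 = pearson_r(col_a, col_b) ** 2
        if r2 > best_r2:
            best_pair, best_r2 = (name_a, name_b), r2
    return best_pair, best_r2
```

A maximum r² well below 1 indicates that no similarity is a near-linear function of another, so no weight is redundant.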

Figure 2. Result page of the gene-to-gene search with the exemplary query gene TP53. Each row in the result panel corresponds to a gene for which nine similarity scores and a ranking score are displayed. (Red rectangle) The ranking can be adjusted by reconfiguring the weights with the dropdown fields. (Blue rectangle) Weight profiles can be saved for later searches by providing a profile name. (Orange rectangle) Additional information about each similarity can be displayed by clicking on the similarity abbreviations. To enable further analysis of the results, the top 1000 genes and their similarity values can be downloaded ("Download Top1000"). Moreover, raw data for each pair of anchor gene and result gene can be accessed by clicking on the cell of a specific similarity value or on the overall score.

  1. Kent,W.J. et al. (2005) Exploring relationships and mining data with the UCSC Gene Sorter. Genome Res., 15(5), 737-741. doi:10.1101/gr.3694705
  2. Dittmar,W.J. et al. (2014) EvoCor: a platform for predicting functionally related genes using phylogenetic and expression profiles. Nucleic Acids Res., 42(W1), W72-W75. doi:10.1093/nar/gku442