1. General Information
This page collects the test protein sets. The test sets serve two purposes:
- to demonstrate how HH-MOTiF works
- to measure and compare the performance of HH-MOTiF
In standard mode, every set produces reproducible output that matches the presented samples.
2. ELM database
The major performance tests of HH-MOTiF were conducted on the ELM database as of 26.03.2016. Only motifs with instances in at least 3 proteins were considered. Some metrics of the resulting dataset are provided below:
- 176 motifs
- 1,677 unique proteins
- 2,022 proteins gross
- 1,452,618 total residues gross
- 17,909 motif residues gross
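A few derived ratios help put these numbers in perspective. The following is a minimal sketch that only recombines the figures listed above; it assumes that "gross" means proteins are counted once per motif they appear in, and the variable names are illustrative.

```python
# Figures copied from the list above.
motifs = 176
proteins_gross = 2022
total_residues_gross = 1_452_618
motif_residues_gross = 17_909

# Share of all residues that belong to an annotated motif.
motif_fraction = motif_residues_gross / total_residues_gross

# Average number of proteins per motif (gross count).
proteins_per_motif = proteins_gross / motifs

# Average protein length and average motif residues per protein (gross).
mean_protein_length = total_residues_gross / proteins_gross
motif_residues_per_protein = motif_residues_gross / proteins_gross

print(f"motif residues: {motif_fraction:.2%} of all residues")
print(f"proteins per motif: {proteins_per_motif:.2f}")
print(f"mean protein length: {mean_protein_length:.1f}")
print(f"motif residues per protein: {motif_residues_per_protein:.2f}")
```

Motif residues make up only slightly more than 1% of the dataset, so the residue-wise classification task is strongly imbalanced.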
3. Selected protein sets
This set contains an ELM motif responsible for the sorting and internalisation signals that direct type I transmembrane proteins from the cell surface or TGN to the lysosomal-endosomal compartment (more information here). It consists of 3 unrelated proteins of approximately the same length from different organisms and represents an 'easy case'. HH-MOTiF recovers this motif almost perfectly, with a residue-wise F1 of 0.927. This set is used as the default sample set.
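The residue-wise F1 score quoted for each set is not defined on this page. The following is a minimal sketch, assuming the standard definition: the harmonic mean of precision and recall, computed over individual residue positions. The function name and the toy data are hypothetical and do not reproduce the HH-MOTiF evaluation code.

```python
def residue_f1(true_residues, predicted_residues):
    """F1 over residue positions given as (protein_id, position) pairs."""
    true_set = set(true_residues)
    pred_set = set(predicted_residues)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Toy example: the annotated motif covers positions 10-15 of protein "P1",
# the prediction covers positions 12-17, so 4 of 6 residues overlap.
truth = {("P1", i) for i in range(10, 16)}
prediction = {("P1", i) for i in range(12, 18)}
score = residue_f1(truth, prediction)
print(f"residue-wise F1: {score:.3f}")
```

Under this definition a prediction that is shifted by a few residues still receives partial credit, which is why scores between 0 and 1 are informative for short motifs.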
This set contains an ELM motif involved in protein-protein interactions mediated by SH3 domains (more information here). It consists of 12 proteins of diverse length. This motif is harder to find, because it is a low complexity region (the only conserved residues are prolines) surrounded by other low complexity regions that do not contain the motif. Nevertheless, HH-MOTiF partially recovers the motif, with a moderate residue-wise F1 of 0.279. Although no explicit low complexity filter is implemented, HH-MOTiF ignores the majority of low complexity regions that do not belong to motifs. The reason for this specificity is that HH-MOTiF also takes into account the surrounding amino acid context and the number of proteins with occurrences.
This set contains an ELM motif responsible for the interaction with gamma-ear domains (more information here). It consists of 7 proteins, some of which are close homologs. The major difficulty in identifying this motif, however, is its similarity to D/E-rich low complexity regions. Nevertheless, the filtering algorithm of HH-MOTiF makes it possible to locate this motif with fair accuracy (residue-wise F1 of 0.582).
This set contains an ELM motif responsible for the interaction with the apoptosis-linked gene 2 (ALG-2) protein (more information here). It consists of 3 proteins, 2 of which are strongly low complexity in nature. Nevertheless, the homology filtering of HH-MOTiF allows this motif to be recognised with high accuracy (residue-wise F1 of 0.865) and without false positives.
This set contains an ELM motif responsible for the interaction with actin (more information here). The motif contains a short α-helix at its N-terminus followed by a disordered region, and in several cases also an additional conserved pattern, T.[DE]...P. It is quite long and stands out from the surrounding sequence. However, this ELM set contains only 6 proteins, which form only 2 non-homologous groups, and this makes recognition more challenging. HH-MOTiF locates this motif with almost ideal accuracy (residue-wise F1 of 0.965).
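The conserved pattern T.[DE]...P is written in regular expression syntax and can be searched for directly. The following is a minimal sketch using Python's standard `re` module; the sequence is invented for illustration and is not taken from the ELM set.

```python
import re

# T, any residue, D or E, any three residues, P.
pattern = re.compile(r"T.[DE]...P")

# Hypothetical sequence, not a real protein from the test set.
sequence = "MKTADEQLPAA"

match = pattern.search(sequence)
if match:
    # Report 1-based residue positions, as is usual for protein sequences.
    print(f"{match.group()} at residues {match.start() + 1}-{match.end()}")
```

A plain pattern search like this finds every occurrence of the pattern, including chance matches, whereas the F1 scores reported on this page refer to motifs that HH-MOTiF discovers without being given a pattern.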