The genome projects have unearthed an enormous diversity of novel genes of unknown function that require biological and biochemical characterization to assess their role in the organism(s) from which they were derived. These genes, like all others, can be grouped into families based on sequence similarity.
The Pfam database (release 23.0) contains over 2200 such families, referred to as Domains of Unknown Function (DUF). In a coordinated effort, the four large-scale centers of the NIH Protein Structure Initiative (PSI) have determined the first three-dimensional structures for more than 250 of these DUF families. Analysis of the first 248, solved by October 2008, reveals that they vary widely in size (with an average of 252 proteins) and in the relative contributions from sequenced genomes and from metagenomic data (see the chart on the right).
The analysis also shows that about two thirds of the DUF families likely represent very divergent branches of already known and well-characterized families, which allows us to propose hypotheses about their biological function. The remainder can be formally categorized as new folds or topologies, although about one third of these show significant sub-structure similarity to previously characterized folds. Homology to functionally annotated protein families remains an important clue when proposing hypotheses about the functions of DUF families, but it is usually not sufficient for a reliable functional annotation. The chart below shows the overall percentages of DUF families with new folds, new folds partially similar to previously known folds, putative analogs, putative homologs and recognizable homologs. The inset pie charts show the percentage of DUF families with a proposed hypothesis about function in each of these categories.
From a more general perspective, our results suggest that, despite the enormous increase in the number and diversity of new genes being uncovered, the fold space of the proteins encoded by those genes is gradually becoming saturated. These previously unexplored sectors of the protein universe are, therefore, primarily shaped by extreme diversification of known protein families, which enables organisms to evolve new functions and adapt to particular niches and habitats.
Nevertheless, these DUF families still constitute the richest source for discovery of the remaining protein folds and topologies. Our structural analysis of the DUF families solved by the PSI centers was recently published in PLoS Biology:
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000205
| Annotation | Solved by | Fold Type | |
|---|---|---|---|
| PF01519: Pfam family PFAM:PF01519 {Protein of unknown function DUF16 } with 42 members in NR database and additional 2 members in the metagenomic datasets is represented in Archaea, and Bacteria. The first st... | BSGC | Homolog | |
| PF01796: Pfam family PFAM:PF01796 {Domain of unknown function DUF35 } with 1010 members in NR database and additional 561 members in the metagenomic datasets is represented in Archaea, and Bacteria. The first... | JCSG | Homolog | |
| PF01861: Pfam family PFAM:PF01861 {Protein of unknown function DUF43} with 30 members in NR database. The first structural representative solved (PDB Id: TOPSAN:2qm3) was subject to FATCAT structural similarit... | MCSG | Homolog | |
| PF01865: Pfam family PFAM:PF01865 {Protein of unknown function DUF47 } with 595 members in NR database and additional 352 members in the metagenomic datasets is represented in Archaea, and Bacteria. The first... | JCSG | Homolog | |
| PF01877: Pfam family PFAM:PF01877 {Protein of unknown function DUF54 } with 175 members in NR database and additional 89 members in the metagenomic datasets is represented in Archaea only. The first structural... | NYSGXRC | Putative Analog | |
| PF01883: Pfam family PFAM:PF01883 {Domain of unknown function DUF59 } with 3219 members in NR database and additional 2322 members in the metagenomic datasets is represented in Archaea, Bacteria, and Eukaryota... | JCSG | Putative Homolog | |
| PF01893: Pfam family PFAM:PF01893 {Uncharacterized protein family UPF0058 } with 41 members in NR database and additional 5 members in the metagenomic datasets is represented in Archaea. The first structural r... | NESG | Putative Analog | |
| PF01904: Pfam family PFAM:PF01904 {Protein of unknown function DUF72 } with 478 members in NR database and additional 168 members in the metagenomic datasets is represented in Archaea, Bacteria, and Eukaryota.... | JCSG | Putative Homolog | |
| PF01906: Pfam family PFAM:PF01906 {Domain of unknown function DUF74 } with 756 members in NR database and additional 468 members in the metagenomic datasets is represented in Archaea, Bacteria, and Eukaryota. ... | MCSG | Putative Analog | |
| PF01908: Pfam family PFAM:PF01908 {Protein of unknown function DUF75} with 848 members in NR database and additional 664 members in the metagenomic datasets is represented in Archaea, Bacteria, and Eukaryota. ... | MCSG | Putative Analog |
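The category percentages described in the text can be illustrated on a small scale with the ten families listed above. The following is a minimal sketch, not the analysis used in the published study: the rows are transcribed by hand from this page of the listing, and the names `FAMILIES`, `fold_type_shares` and `metagenomic_fraction` are hypothetical helpers introduced only for this example. Ten rows are far too few to say anything about the full set of 248 families.

```python
from collections import Counter

# Rows transcribed from the ten families listed above:
# (Pfam accession, PSI center, fold type, NR members, metagenomic members).
# PF01861 lists no metagenomic count in the table, so it is recorded as None.
FAMILIES = [
    ("PF01519", "BSGC", "Homolog", 42, 2),
    ("PF01796", "JCSG", "Homolog", 1010, 561),
    ("PF01861", "MCSG", "Homolog", 30, None),
    ("PF01865", "JCSG", "Homolog", 595, 352),
    ("PF01877", "NYSGXRC", "Putative Analog", 175, 89),
    ("PF01883", "JCSG", "Putative Homolog", 3219, 2322),
    ("PF01893", "NESG", "Putative Analog", 41, 5),
    ("PF01904", "JCSG", "Putative Homolog", 478, 168),
    ("PF01906", "MCSG", "Putative Analog", 756, 468),
    ("PF01908", "MCSG", "Putative Analog", 848, 664),
]


def fold_type_shares(families):
    """Return the percentage of families assigned to each fold type."""
    counts = Counter(row[2] for row in families)
    total = sum(counts.values())
    return {fold: 100.0 * n / total for fold, n in counts.items()}


def metagenomic_fraction(nr_members, meta_members):
    """Fraction of a family's members that come from metagenomic datasets.

    Returns None when the table gives no metagenomic count.
    """
    if meta_members is None:
        return None
    return meta_members / (nr_members + meta_members)


if __name__ == "__main__":
    for fold, share in sorted(fold_type_shares(FAMILIES).items()):
        print(f"{fold}: {share:.0f}%")
    for acc, _center, _fold, nr, meta in FAMILIES:
        frac = metagenomic_fraction(nr, meta)
        label = "n/a" if frac is None else f"{frac:.2f}"
        print(f"{acc}: metagenomic fraction {label}")
```

Keeping the missing metagenomic count as `None` rather than zero avoids implying that PF01861 has no metagenomic members when the listing simply does not report a number.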
| File | Size | Date | Attached by |
|---|---|---|---|
| duf_sizes.PNG | 6.42 kB | 20:45, 1 Oct 2009 | lukasz |
| homology.PNG | 12.02 kB | 21:54, 30 Sep 2009 | lukasz |