On the identification of group II introns in nucleotide sequence data.
Abstract
Four different consensus sequences (GTI, group II identifiers) have been derived from domains V of known group II introns and are used as query sequences for sensitive database screenings with the FASTA and LFASTA programs. The set of four GTI sequences identifies all domains V of the 96 known group II introns in the completely sequenced chloroplast genomes of Marchantia polymorpha, Epifagus virginiana, Oryza sativa and Nicotiana tabacum and in the completely sequenced mitochondrial genomes of Saccharomyces cerevisiae, Podospora anserina, Schizosaccharomyces pombe and Marchantia polymorpha. Seven moderately high-scoring hits can easily be rejected as false positives since they do not fulfil secondary structure requirements. Large FASTA outputs obtained after screening the entire nucleotide sequence database are evaluated in a second step by a program (D5SCAN) that allows variable selection criteria to be assigned for potential domain V secondary structures. Database searches with these routines yield evidence for several previously unrecognized group II intron sequences. These include novel intron structures in the cyanobacterium Synechocystis and in the mitochondrial genomes of Marchantia, soybean, pea, broad bean, sugar beet and a heterobasidiomycete. Potential intron remnants are found contributing to the secondary structure of rRNAs in several trypanosome species. At a sensitivity at which 95% of true domains V are positively identified, the search routine produces one false-positive hit per 10,000 kb.
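The second screening step, in which sequence-similarity hits are kept only if they can adopt a plausible secondary structure, can be illustrated with a minimal sketch. This is not the D5SCAN algorithm, whose selection criteria are not specified here. The function names, the fixed four-nucleotide loop and the `min_pairs` threshold are all hypothetical, and the stem is assumed to be uninterrupted (no bulges), which is a simplification.

```python
# Hypothetical post-filter for alignment hits; NOT the actual D5SCAN algorithm.
# Assumptions: a simple hairpin with a fixed-length loop and an unbroken stem.

# Watson-Crick pairs plus the G-U wobble pair, written for RNA.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_length(seq, loop_start, loop_len):
    """Count consecutive base pairs extending outward from a loop."""
    left = loop_start - 1           # first base 5' of the loop
    right = loop_start + loop_len   # first base 3' of the loop
    pairs = 0
    while left >= 0 and right < len(seq) and (seq[left], seq[right]) in PAIRS:
        pairs += 1
        left -= 1
        right += 1
    return pairs


def best_hairpin(seq, loop_len=4):
    """Return (stem_pairs, loop_start) for the longest uninterrupted stem."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style database entries
    best = (0, -1)
    for loop_start in range(1, len(rna) - loop_len):
        pairs = stem_length(rna, loop_start, loop_len)
        if pairs > best[0]:
            best = (pairs, loop_start)
    return best


def passes_structure_filter(seq, min_pairs=5, loop_len=4):
    """Keep a hit only if it can form a stem of at least min_pairs pairs."""
    return best_hairpin(seq, loop_len)[0] >= min_pairs
```

A hit such as `GGGCGAAAGCCC` would pass with `min_pairs=4` and be rejected with the default of 5. Making the threshold a parameter mirrors the idea of variable selection criteria, although a realistic filter would also need to tolerate bulges and mismatches.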