Curated data sets for assessment
Standard 3 (Adh.std3.gff) “more complete gene set”
- 222 gene structures (39 single-coding-exon and 183 multi-coding-exon genes)
- Criteria:
- Annotated as described in Ashburner et al.
- cDNA to genomic alignment using sim4
- Start codons predicted by ORFFinder (Frise et al., unpublished)
- ~182 genes are similar to a protein sequence from another organism or have a Drosophila EST hit
- Edge verification by partial EST/cDNA alignments
- BLASTX, TBLASTX homology results
- PFAM alignments
- Gene structure verification using GenScan (human)
- 14 genes had EST/homology hits but no gene finding predictions
- ~40 genes are supported only by “strong” GenScan predictions
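
The single- versus multi-coding-exon breakdown above could be checked against the GFF file with a short script. The following is a minimal sketch, not the method used to build the standard. It assumes that the file uses nine tab-separated GFF columns, that coding exons carry the feature type "CDS" in column 3, and that column 9 holds a group name shared by all features of one gene. None of these details are stated here, so the feature type is a parameter and the function names are hypothetical.

```python
from collections import Counter


def count_coding_exons(gff_lines, feature_type="CDS"):
    """Count coding-exon features per gene group.

    Assumptions (not confirmed by the source): nine tab-separated
    columns, feature type in column 3, gene group name in column 9.
    """
    exons_per_gene = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip malformed lines that lack the ninth (group) column.
        if len(fields) < 9:
            continue
        if fields[2] != feature_type:
            continue
        exons_per_gene[fields[8].strip()] += 1
    return exons_per_gene


def summarize(exons_per_gene):
    """Split genes into single- and multi-coding-exon classes."""
    single = sum(1 for n in exons_per_gene.values() if n == 1)
    multi = sum(1 for n in exons_per_gene.values() if n > 1)
    return {"genes": single + multi, "single": single, "multi": multi}
```

With a local copy of the file, `summarize(count_coding_exons(open("Adh.std3.gff")))` would return the three counts, which can then be compared with the 222 / 39 / 183 figures listed above. If the file labels coding exons differently, pass the appropriate `feature_type`.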