Table of Contents
The challenge of annotating a complete eukaryotic genome: A case study in Drosophila melanogaster
Abstract
Tutorial goals
Tutorial organization
What is a gene?
What are annotations?
How does an annotation differ from a gene?
Transcription and translation
Schematic gene structure
Sequence feature types
DNA transcription unit features
mRNA features
Definitions for data modeling
Annotation
Annotation process overview
Types of sequence data
Auxiliary data
Computational annotation tools
Database resources
Biological issues in annotation
Engineering issues in annotation
Engineering issues in annotation
Engineering issues in annotation
Engineering issues in annotation
Engineering issues in annotation
Drosophila melanogaster
Drosophila Genome Project
Goals of the Drosophila Genome Project
Sequencing at the BDGP
The BDGP sequence annotation process
What sequence to start with?
Which analyses need to be run?
Which analyses need to be run and how?
What public sequence data sets are needed?
Which analyses need to be run and how?
How do you achieve computational throughput?
What do you do with the results?
Is human curation needed?
Gene Skimmer
Gene Skimmer
CloneCurator
How do we annotate gene/protein function?
Ontology browser
Ontology browser: searching for terms
How do you distribute the data?
Ribbon
Ribbon
How do you manage the data?
How do you maintain annotations?
Integrated annotation systems
Integrated annotation systems: ACeDB
ACeDB
Genotator
Magpie
GAIA
TIGR Human Gene Index
Computational analysis tools
Gene finding: Prokaryotes vs. Eukaryotes
Gene finding: Prokaryotes vs. Eukaryotes
Integrated gene finding
Integrated gene finding: Dynamic programming
Integrated gene finding: Dynamic programming
Integrated gene finding: Linear and Quadratic Discriminant Analysis (LDA/QDA)
Integrated gene finding: Feed-forward neural networks
Approaches to gene finding: Hidden Markov models
Approaches to gene finding: Generalized hidden Markov models
Gene finding software
Promoter recognition
Promoter recognition (cont.)
Promoter recognition (cont.)
Promoter recognition (cont.)
Example: NNPP
Promoter recognition (cont.)
Splice site prediction
Splice site prediction (cont.)
Splice site prediction (cont.)
Start codon prediction
Poly-adenylation signal prediction
Prediction of coding potential
Prediction of coding potential (cont.)
Prediction of coding potential (cont.)
Prediction of coding potential (cont.)
Prediction of coding potential (cont.)
Prediction of coding exons
“Integrated” gene models: LDA/QDA
“Integrated” gene models: NN
“Integrated” gene models: Artificial intelligence approaches
“Integrated” gene models: Artificial intelligence approaches
“Integrated” gene models: HMMs
“Integrated” gene models: GHMMs
Example: Genie
“Integrated” gene models: GHMMs
EST/cDNA alignment for gene finding: Spliced alignments
EST/cDNA alignment
EST/cDNA alignment (cont.)
Repeat finders
Repeat finders (cont.)
Homology searching
Gene family searching
The genome annotation experiment (GASP1)
Goals of the experiment
Adh contig
Adh paper (to appear in Genetics)
Raw sequence: Adh.fa
Drosophila data sets provided to participants
Timetable
Resources for assessing predictions
Curated data sets for assessing predictions
Curated data sets for assessing predictions
Curated data sets for assessment
Submission format
Sample submission
Submissions
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submissions (cont.)
Submission classes
Submission classes (cont.)
Gene finding techniques
Measuring success
Definitions and formulae
Genes: True positives (TP)
Genes: False positives (FP)
Genes: False negatives (FN)
Toy example 1 (1)
Genes: Missing genes (MG)
Genes: Wrong genes (WG)
Toy example 1 (2)
Genes: Std 1 versus Std 3
Toy example 1 (3)
Genes: Std 1 and Std 3 versus “real” gene structure
Toy example 1 (4)
Toy example 1 (5): Exon level
Genes: Joined genes (JG)
Genes: Split genes (SG)
Definition: “Joined” and “split” genes
Toy example 2 (1)
Annotation experiment results
Results: Base level
Results: Exon level
Results: Gene level
Results: Gene level
Results (protein homology): Base level
Results (protein homology): Exon level
Results (protein homology): Gene level
Transcription Start Site (TSS): Standard 1
TSS: Standard 3
Results: TSS recognition
Interesting gene examples: bubblegum
Adh/Adhr (Alcohol dehydrogenase/Adh related)
Adh/Adhr (cont.)
osp (outspread)
cact (cactus)
kuz (kuzbanian)
beat (beaten path)
Idgf1, Idgf2, Idgf3 (Imaginal Disc Growth Factor)
Idgf1, Idgf2, Idgf3 (cont.)
Conclusion of GASP1
Conclusion GASP1 (cont.)
Discussion GASP1
Conclusions on annotating complete eukaryotic genomes
Conclusions on annotating complete eukaryotic genomes (cont.)
Discussion on annotating complete eukaryotic genomes
Acknowledgments
Authors: Martin G. Reese, Nomi L. Harris, George Hartzell, Suzanna E. Lewis
Email: mgreese@lbl.gov
Home Page: http://www.fruitfly.org/GASP1
Other information: Tutorial #3, presented at the ISMB '99 conference in Heidelberg, Germany, on August 6, 1999, including the annotation experiment GASP1