Description of the Community-Wide Experiment on the Assessment of Gene Prediction for the Drosophila melanogaster Genome: The Adh Region (2.9 Mbases)

Introduction

Methods for predicting gene structures and other functional sites in DNA sequences have been advancing rapidly. We are interested in assessing what the various methods provide, and how reliable they are.

The goal of this experiment is to obtain an in-depth and objective assessment of the current state of the art in gene and functional site prediction in genomic DNA. To this end, participants will predict as much as possible about a sample genomic region that has been studied intensively in the past. Participants are encouraged to predict genes and functional sites such as promoters, transcription start sites, transcription factor binding sites, splice sites, start codons, and stop codons, as well as any other sites of interest. Biological annotations are also welcome, as are "consensus" annotations arrived at by any method of combining other predictions (please describe your combination method).

Besides providing the sample genomic sequence for participants to annotate, we will make other relevant datasets available, including complete cDNAs for Drosophila melanogaster.

Our tutorial presentation at ISMB-99 will summarize the annotations submitted by participants in the experiment, describe the methods used, and discuss how we define "success" in the annotation process.

Goals

The main goals of the experiment are to address the following questions about the current state of the art in genome annotation:
  1. Are the gene predictions similar to the known gene structures?
  2. Are the details of the gene predictions correct (e.g., splice sites)?
  3. What other DNA features (besides gene structure) can be reliably identified?
  4. Which analysis methods are the most effective?

Participation

Participation in the experiment is open to everyone, whether or not you will be attending ISMB-99. If you plan to participate, please send us email. Predictors may form teams; each team should have a designated group leader. Each team will be issued a unique ID number, which will serve to identify its predictions. Those interested in receiving mailings concerning the progress of the experiment may also register as 'observers'. Predictions must be emailed to us in GFF format before June 30, 1999. Each team should submit a single GFF-format file that includes all features predicted on the sample 2.9 Mbase sequence.
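As an illustration of what a single-file GFF submission might look like, the sketch below writes a few predicted features as tab-separated GFF records (sequence name, source, feature, start, end, score, strand, frame, and an optional group field). The sequence name "adh_region", the source label "team_07", the group names, and all coordinates and scores are hypothetical placeholders; this page does not specify the exact field values the organizers expect.

```python
# A minimal sketch of writing predictions as GFF records.
# All names and coordinates below are made up for illustration.

def gff_line(seqname, source, feature, start, end,
             score=".", strand=".", frame=".", group=""):
    """Return one tab-separated GFF record; '.' marks an unset field."""
    if start > end:
        raise ValueError("GFF requires start <= end")
    fields = [seqname, source, feature, str(start), str(end),
              str(score), strand, str(frame)]
    if group:
        # The optional ninth field ties related features together.
        fields.append(group)
    return "\t".join(fields)

# Hypothetical predictions: (feature, start, end, score, strand, frame, group)
predictions = [
    ("exon", 1201, 1350, 0.92, "+", 0, "gene_1"),
    ("exon", 1602, 1790, 0.88, "+", 0, "gene_1"),
    ("promoter", 950, 1000, 0.61, "+", ".", ""),
]

lines = [gff_line("adh_region", "team_07", *p) for p in predictions]
gff_text = "\n".join(lines) + "\n"
print(gff_text, end="")
```

Keeping every feature type in one list and one output file mirrors the request above for a single file per team.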

Every participant in the experiment will be invited to a free dinner in Heidelberg on the evening of the ISMB tutorial.

Assessment of Predictions

The Drosophila Genome Center team will evaluate the predictions by comparing them with each other and with the current "best" annotations put together by our center. There are NO winners or losers; our interest is in seeing which annotation methods are being used and what their relative strengths and weaknesses are.
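This page does not say which measures will be used for the comparison. Purely as an illustration, the sketch below shows two measures that such a comparison could use: base-level sensitivity and precision of predicted exons against a reference annotation, and a count of exons whose boundaries match exactly. The function names and the toy coordinates are assumptions made for this example, not the organizers' evaluation procedure.

```python
# A minimal sketch of comparing predicted exons with a reference annotation.
# Intervals are (start, end) pairs with inclusive, 1-based coordinates.
# The measures chosen here are an assumption, not the official evaluation.

def covered_bases(intervals):
    """Return the set of base positions covered by the given intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases

def base_level_scores(predicted, reference):
    """Return (sensitivity, precision) computed over individual bases."""
    pred = covered_bases(predicted)
    ref = covered_bases(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision

def exact_exon_matches(predicted, reference):
    """Count predicted exons whose start and end both match a reference exon."""
    return len(set(predicted) & set(reference))

# Toy data, invented for illustration only.
reference = [(100, 199), (300, 399)]
predicted = [(100, 199), (310, 399), (500, 549)]

sensitivity, precision = base_level_scores(predicted, reference)
exact = exact_exon_matches(predicted, reference)
print(f"sensitivity={sensitivity:.3f} precision={precision:.3f} exact={exact}")
```

The base-level scores speak to whether predictions resemble the known gene structures, while the exact-match count is stricter and reflects whether details such as splice-site positions are right. The set-based approach is simple rather than fast; an interval-based implementation would suit a full 2.9 Mbase comparison better.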

Release of Results

All submitted annotations and their evaluations will be made available through this web site shortly before the ISMB meeting, at which the submissions will be discussed and compared.

Timetable

May 1, 1999 - June 30, 1999
  Distribution of the sample sequence and associated data to predictors. Collection of predictions.
June 30, 1999 - July 31, 1999
  Evaluation of the predictions by the Drosophila Genome Center.
August 6, 1999
  Tutorial #3 at the ISMB-99 conference in Heidelberg, Germany.

Organizing Committee

Martin Reese          Drosophila Genome Center, University of California, Berkeley, USA
Nomi Harris           Drosophila Genome Center, Lawrence Berkeley National Laboratory, USA
Suzanna Lewis         Drosophila Genome Center, University of California, Berkeley, USA
George Hartzell       Drosophila Genome Center, University of California, Berkeley, USA
Uwe Ohler             University of Erlangen, Germany

Queries

Please address any questions to compfly@bdgp.lbl.gov


Last modified: Mon Jun 7 17:43:41 PDT 1999