A High Throughput Screen to Identify Novel Secreted and Transmembrane Proteins Involved in Drosophila Embryogenesis
Casey C. Kopczynski¹, Jasprina N. Noordermeer¹, Thomas L. Serano, Wei-Yu Chen, John D. Pendleton, Suzanna Lewis, Corey S. Goodman and Gerald M. Rubin¹. ¹ These authors contributed equally to this work. Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720-3200, USA
ABSTRACT Secreted and transmembrane proteins play an essential role in intercellular communication during the development of multicellular organisms. As only a small number of these genes have been characterized, we developed a screen for genes encoding extracellular proteins that are differentially expressed during Drosophila embryogenesis. Our approach utilizes a new method for screening large numbers of cDNAs by whole embryo in situ hybridization. The cDNA library for the screen was prepared from rough endoplasmic reticulum-bound mRNA, and is therefore enriched in clones encoding membrane and secreted proteins. To increase the prevalence of rare cDNAs in the library, the library was normalized using a novel method based on cDNA hybridization to genomic DNA-coated beads. In total, 2518 individual cDNAs from the normalized library were screened by in situ hybridization, and 917 of these cDNAs represent genes differentially expressed during embryonic development. Sequence analysis of 1001 cDNAs indicated that 811 represent genes not previously described in Drosophila. Expression pattern photographs and partial DNA sequences have been assembled in a database publicly available at the Berkeley Drosophila Genome Project website (http://www.fruitfly.org). The identification of a large number of genes encoding proteins involved in cell-cell contact and signaling will advance our knowledge of the mechanisms by which multicellular organisms and their specialized organs develop. INTRODUCTION A major goal of developmental biology is to elucidate the molecular mechanisms that govern cell-cell interactions in higher eukaryotes. Genetic analysis of development in Drosophila has proven to be a powerful approach for studying these mechanisms. For example, most of the genes known to be involved in the hedgehog (1, 2), dpp/BMP (3), and Wnt (4) signaling pathways were identified through classical genetic screens in Drosophila. 
The characterization of these genes and their vertebrate homologs has greatly advanced our understanding of the cell signaling pathways that regulate development. Genetic screens, however, have significant limitations. Genes with subtle loss-of-function phenotypes or genes whose function can be compensated for by other genes or pathways are unlikely to be found. These two classes of genes may represent the majority of genes in Drosophila, since it is estimated that two-thirds of Drosophila genes are not required for viability (5). In addition, screens designed to identify specific phenotypic defects often do not recover genes with pleiotropic roles during development, since the requirement for gene function in one developmental process can mask its requirement in another. To identify all classes of developmentally important genes, expression-based and other molecular screens are needed to supplement classical genetic screens. In Drosophila, the most productive such screens to date have utilized P element-based enhancer traps (6-9), but P element insertion is not random and enhancer trap screens are biased towards identifying genes that are favored for insertion by P elements (10). Other expression-based screens to specifically identify extracellular proteins have involved generating monoclonal antibodies against crude membrane preparations and screening by immunostaining of embryos (11, 12). Unfortunately, antibody screens are biased towards identifying the most abundant or highly immunogenic proteins and thus typically identify only a small subset of proteins. We present a novel, large scale screen for genes encoding secreted and transmembrane proteins that are expressed in specific tissue or cell types during embryonic development in Drosophila. The approach combines a cDNA library enriched for genes encoding extracellular proteins with a high throughput whole embryo in situ hybridization procedure and subsequent sequence analysis. 
The results have been compiled in a publicly available database. MATERIALS AND METHODS All protocols used in this study are available in a more detailed form at http://www.fruitfly.org. RNA isolation from rough endoplasmic reticulum Rough endoplasmic reticulum membranes or rough microsomes (RMs) were isolated from 10 g of 8-16 hr (25°C) embryos using a sucrose gradient sedimentation procedure (13, 14) with some modifications. PolyA+ RNA was purified from the RM RNA preparation using the PolyA Select kit (Promega). cDNA library construction A directionally-cloned RM cDNA library was prepared from RM polyA+ RNA using standard techniques (15), except that the RNA was annealed with a Pst-T15 primer/adaptor (5'-CACCTTGTCTCACTGCAGT₁₅) and the first strand cDNA synthesized in the presence of 5-methyl dCTP (Pharmacia) to protect internal Pst I sites from subsequent digestion. Double-stranded cDNA was then repaired with T4 DNA polymerase, ligated with Hind III/Xmn I adaptors (New England Biolabs), digested with Pst I, size-selected to remove cDNAs smaller than 500 bp (15), and cloned into Hind III/Pst I-digested pBluescript SK(+) (Stratagene). The ligated plasmid was transformed into XL-1 Blue MRF' (Stratagene) to obtain a library of 5 × 10⁵ independent cDNA clones. The normalized RM cDNA library was prepared from single-stranded RM cDNA eluted from genomic DNA beads (see below). Single-stranded cDNA was converted to double-stranded cDNA using the Bluescript KS primer, cloned into pBluescript SK(+) and transformed into XL-1 Blue MRF' as described above. A normalized library of 4.4 × 10⁴ independent cDNA clones was obtained. Preparation of genomic DNA-coated magnetic beads and normalization of the RM cDNA library Genomic Drosophila DNA was partially digested with Sau3A and Mae III and size-fractionated, and a Klenow "fill in" reaction (15) was used to incorporate biotin-dUTP (ENZO Biochem) into the ends of the Sau3A and Mae III fragments. 
The biotin-labeled genomic DNA was immobilized on streptavidin-coated magnetic beads (Dynal) using a modification of the manufacturer's instructions. The beads were collected, washed and used immediately for cDNA hybridization. To prepare single-stranded cDNA "driver" for hybridization to the genomic DNA "target", the RM library was transcribed in vitro and the product RNA subsequently converted into single-stranded cDNA. The genomic DNA beads were resuspended in hybridization mix containing single-stranded RM cDNA as driver and free polysome polyA+ RNA as competitor to block the hybridization of free polysome cDNA to the beads. The beads were hybridized at 65°C for 16 hrs with rocking. After hybridization the beads were washed extensively and subsequently the hybridized cDNA was eluted and recovered by ethanol precipitation. The protocol used to construct the library is shown schematically in Figure 1.
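The logic of this normalization step can be illustrated with a toy calculation. The sketch below is not part of the published protocol: it assumes an idealized saturation model in which each genomic copy of a gene can bind at most a fixed amount of driver cDNA, and every name and number in it (`normalize`, `capacity_per_copy`, the gene labels and the input amounts) is hypothetical.

```python
def normalize(driver, copies, capacity_per_copy):
    """Idealized capture of driver cDNA on genomic DNA-coated beads.

    driver: amount of single-stranded cDNA offered per gene (arbitrary units)
    copies: genomic copy number per gene
    capacity_per_copy: cDNA one genomic copy can bind (assumed constant)
    Returns the fraction of the eluted cDNA contributed by each gene.
    """
    # Capture is limited either by the cDNA offered or by the bead-bound target.
    captured = {g: min(driver[g], capacity_per_copy * copies[g]) for g in driver}
    total = sum(captured.values())
    return {g: amount / total for g, amount in captured.items()}


# Made-up inputs for illustration only; none of these values come from the study.
driver = {"abundant": 1000.0, "moderate": 100.0, "rare": 10.0, "multi_copy": 200.0}
copies = {"abundant": 1, "moderate": 1, "rare": 1, "multi_copy": 10}

total_driver = sum(driver.values())
before = {g: amount / total_driver for g, amount in driver.items()}
after = normalize(driver, copies, capacity_per_copy=5.0)

for gene in driver:
    print(f"{gene:>10}: before {before[gene]:.3f}  after {after[gene]:.3f}")
```

Under these assumptions, when the driver is in excess for every gene, single-copy genes end up equally represented regardless of their starting abundance, while a multi-copy gene is captured in proportion to its copy number. Real hybridization kinetics, competitor RNA and nonspecific binding are all ignored here.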
Whole-mount RNA in situ hybridization of Drosophila embryos in 96-well plates The non-radioactive whole embryo in situ hybridization method described by Tautz and Pfeifle (16) was adapted to the use of RNA probes to achieve maximum sensitivity. To allow expedient screening with large numbers of probes, the protocol was further modified for hybridization in 96-well plates. Staging of embryos and description of expression domains were performed as described (17) using a standardized vocabulary (http://flybase.bio.indiana.edu/docs/flydocs/flybase/controlled-vocabularies.txt). Photography and digital imaging Between 10 and 15 individually staged embryos were selected for photography for each RM cDNA clone. Expression domains were examined using Nomarski optics on an Axiophot microscope (Zeiss) and photographed using standard 35 mm film. Digital images were generated and written onto compact discs (Eastman Kodak Company). DNA Sequencing and Analysis The cDNAs were sequenced using either the ABI Prism Dye Terminator Cycle Sequencing Ready Reaction Kit or the Pharmacia Autoread Sequencing Kit and the products run on an ABI Prism 373 DNA Sequencer or a Pharmacia ALF Express DNA Sequencer, respectively. The resulting DNA sequences were trimmed and edited using Sequencher 3.1 software. Edited sequences average about 350-400 nucleotides in length and contain 3% or less ambiguity. In cases where sequences from the 5' and 3' ends of the insert overlapped, contigs were constructed. Database searches were carried out using the BLASTN and TBLASTX programs (18). Database and Software We implemented the cDNA database in Illustra version 3.2, an object-oriented relational database. The network browser interface was supported by the Apache v1.2.5 HTTP server. Common Gateway Interface (CGI) scripts were written in Perl v1.0.5. Assemblies of the cDNA sequences are publicly viewable using a Java applet. 
The applet was compiled with Java 1.0.3 and utilized the BDGP/Neomorphic Software Inc. widget set. The cDNA sequences were analyzed using gapped WU-BLAST v2.0 (Warren Gish). Consensus sequences from multiple cDNAs (tentatively the same gene) were assembled using PHRAP (P. Green, in preparation). RESULTS Isolation of mRNA from rough microsomes Most mRNAs that encode membrane and secreted proteins are bound to the rough endoplasmic reticulum through ribosomes engaged in cotranslational secretion of their nascent polypeptides. We isolated rough endoplasmic reticulum membranes, or rough microsomes (RMs), from embryos as a source of mRNAs encoding membrane and secreted proteins. We found that only a small fraction of polysomal mRNA (<10%) is present in the RM preparation; the vast majority of embryonic mRNA appears to be translated on "free" polysomes encoding cytosolic proteins. This result is consistent with sequencing data obtained from an embryo cDNA library prepared from unfractionated mRNA, which revealed that 94% of clones with matches to known proteins encoded intracellularly-localized proteins (see below). Northern blot analysis was used to determine the extent to which mRNAs encoding membrane and secreted proteins are enriched in the RM RNA preparation (Figure 2A and B). The results show that the mRNA encoding the membrane protein Fasciclin II (Fas II) is approximately 10-fold enriched in the RM RNA preparation relative to the mRNA encoding the cytosolic protein rp 49. Similar results were obtained using probes representing other membrane and cytosolic proteins (data not shown). Although these results confirm that the RM RNA preparation is enriched for mRNAs encoding membrane and secreted proteins, they also reveal that the RM preparation was contaminated with significant amounts of free polysomes. The low yield of RMs obtained from embryos and the RNA degradation suffered on sucrose gradients precluded further purification of the RM preparation.
Preparation of a normalized cDNA library PolyA+ RNA was prepared from RM RNA and used to generate a directionally cloned RM cDNA library (Materials and Methods). To increase the chances of identifying genes that encode low abundance mRNAs, it was important to normalize the representation of cDNAs in this library. A method of normalization was needed that would increase the prevalence of rare cDNAs encoding membrane and secreted proteins without increasing the prevalence of cDNAs encoding cytosolic proteins. The normalization procedure we developed is based upon hybridizing a large excess of single-stranded cDNA to a limiting amount of genomic DNA that is attached to magnetic beads (Figure 1). To prevent cDNAs encoding cytosolic proteins from hybridizing to the genomic DNA-coated beads, free polysome polyA+ RNA was added as a competitor. Once the hybridization was complete, the unbound cDNA was discarded and the normalized library was prepared from the cDNA that hybridized to the genomic DNA. Thus the representation of cDNAs in the normalized library should reflect gene copy number, rather than mRNA abundance. The effectiveness of this method was determined by colony blot hybridization using probes to a moderately abundant RM-bound mRNA (Fas II), a low abundance RM-bound mRNA (connectin) and a cytosolic mRNA (Ras 1). As expected, normalization had the greatest effect on the frequency of clones representing the low abundance connectin mRNA, which showed a 13-fold increase from an initial frequency of 1 in 90,000 clones to 1 in 6900. By comparison, the frequency of Fas II clones in the normalized library increased only 2-fold from an initial frequency of 1 in 10,000 clones to 1 in 4300. Unexpectedly, the frequency of Ras 1 clones in the library also increased substantially (6-fold from an initial frequency of 1 in 130,000 clones to 1 in 21,000). 
This suggests that the addition of free polysome RNA as a competitor in the hybridization mix was only partially effective at preventing normalization of cDNAs encoding cytosolic proteins. Given that typical embryo cDNA libraries contain similar numbers of Fas II and Ras 1 clones (data not shown), the results suggest that the normalized RM cDNA library is approximately 5-fold enriched for clones encoding membrane and secreted proteins. Since normalization of the RM library resulted in an increase in the representation of cDNAs encoding cytosolic proteins, we devised a rapid Northern blot assay to determine if a cDNA of interest is likely to encode a membrane or secreted protein or a cytosolic protein (Figure 2C and D). Specifically, the cDNA is hybridized to a blot containing one lane of unfractionated mRNA and one lane of free polysome mRNA: if the hybridization signal is decreased in the free polysome lane, this suggests that the mRNA was bound to rough microsomes and thus encodes a membrane or secreted protein. To date, this assay has produced accurate predictions for 11/12 cDNAs tested (data not shown). RNA in situ hybridization of cDNA clones to Drosophila embryos. Spatial and temporal embryonic expression profiles of the genes represented by RM cDNAs were determined by RNA in situ hybridization to whole mount Drosophila embryos. To evaluate large numbers of cDNA probes, we developed an RNA in situ hybridization protocol that allows the simultaneous screening of 96 different RNA probes in a single multi-well plate. A total of 2518 RNA probes prepared from individual, randomly picked cDNA clones were screened on 0 to 24 hours old, whole mount embryos. Of these clones, 917 (36%) were expressed in specific patterns during embryogenesis, while 1206 (48%) of the cDNAs showed apparent uniform expression throughout the embryo. The remaining 395 clones (16%) did not produce detectable levels of staining in the embryo. 
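The colony blot frequencies and in situ screen tallies reported above can be checked with a few lines of arithmetic. The following sketch uses only numbers stated in the text; the function and variable names are illustrative, and the enrichment estimate relies on the same assumption the text makes, namely that Fas II and Ras 1 clones are about equally frequent in a typical embryo cDNA library.

```python
def fold_change(before_one_in, after_one_in):
    """Fold increase in clone frequency when '1 in N' changes from before to after."""
    return before_one_in / after_one_in


# Colony blot frequencies from the text, as (1 in N before, 1 in N after normalization).
colony_blot = {
    "connectin": (90_000, 6_900),
    "Fas II": (10_000, 4_300),
    "Ras 1": (130_000, 21_000),
}
folds = {name: fold_change(b, a) for name, (b, a) in colony_blot.items()}

# Enrichment of the membrane marker (Fas II) over the cytosolic marker (Ras 1)
# in the normalized library, assuming equal frequencies in an unfractionated library.
enrichment = colony_blot["Ras 1"][1] / colony_blot["Fas II"][1]

# Outcome of the in situ screen, as clone counts from the text.
screen = {"specific pattern": 917, "uniform": 1206, "no staining": 395}
total = sum(screen.values())
percent = {label: round(100 * n / total) for label, n in screen.items()}

for name, value in folds.items():
    print(f"{name}: {value:.1f}-fold")
print(f"Fas II vs Ras 1 enrichment: {enrichment:.1f}-fold")
print(total, percent)
```

The computed values round to the 13-fold, 2-fold and 6-fold increases quoted for connectin, Fas II and Ras 1, to an enrichment of roughly 5-fold, and to the 36%, 48% and 16% split of the 2518 screened clones.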
For every cDNA clone with a specific expression pattern, 10 to 15 embryos covering a range of embryological stages (from the fertilized egg to stage 16) were evaluated and photographed. As expected, a wide variety of temporal and spatial expression patterns was observed (examples in Figure 3).
The frequency with which cDNAs were found to be expressed in various embryonic organs is summarized in Table I (ubiquitously expressed cDNAs are not included). The numbers shown in Table I are adjusted for multiple occurrences of cDNAs representing a single gene. A disproportionately large number of cDNAs are expressed in the embryonic gut, the CNS and the muscle, while only a small percentage of cDNAs are found in tissues such as the amnioserosa, glands, trachea, imaginal discs and gonads. A possible explanation for this observation is that expression in a tissue such as the gut is more easily scored than, for example, expression in the embryonic imaginal discs, which consist of only 10-25 cells and are considerably more difficult to identify. Only a small percentage of the clones were found to be expressed during early zygotic stages of development (blastoderm, gastrula and segmented germband stages). The vast majority are expressed during stages when the internal organs, such as the gut, the central nervous system and the muscles, are formed. As the embryos that were used to make the cDNA library were taken from an 8- to 16-hour collection, the period when these tissues are developing, the bias towards cDNAs expressed in the internal organs is not unexpected. In addition, a large number of cDNAs show hybridization to early stage embryos prior to the onset of zygotic gene expression. This hybridization presumably represents maternal contribution of the cognate mRNAs.
Table I: Expression domains of RM clones during embryogenesis

| Spatial Expression Domain | Number of RM clones\* | %¹ |
| --- | --- | --- |
| fertilized egg | 167 (282) | 7 |
| blastoderm | 13 (18) | <1 |
| gastrula | 9 (9) | <1 |
| segmented germ band | 4 (5) | <1 |
| epidermis | 86 (134) | 4 |
| mesoderm | 379 (638) | 16 |
| - **somatic mesoderm** | 87 (160) | 4 |
| - **visceral mesoderm** | 228 (329) | 9 |
| - **head mesoderm** | 28 (84) | 1 |
| - **muscle** | 36 (65) | 2 |
| nervous system | 210 (317) | 9 |
| - **stomatogastric nervous system** | 6 (8) | <1 |
| - **peripheral nervous system** | 13 (27) | <1 |
| - **central nervous system** | 191 (282) | 8 |
| embryonic gut | 418 (642) | 17 |
| - **foregut** | 99 (129) | 4 |
| - **midgut** | 169 (284) | 7 |
| - **hindgut** | 94 (136) | 4 |
| - **malpighian tubule** | 38 (72) | 2 |
| - **gastric caecum** | 18 (21) | <1 |
| amnioserosa | 28 (41) | 1 |
| embryonic glands | 69 (95) | 3 |
| embryonic tracheal system | 25 (32) | 1 |
| reproductive system | 24 (43) | 1 |
| imaginal disc | 3 (6) | <1 |
These clone-gene combinations show TBLASTX values between e-18 and e-59. For each mammalian gene, the GenBank accession number is shown in parentheses. Data Availability over the Internet A database describing the expression patterns and DNA sequences of the cDNAs compiled in this study that were expressed in specific tissues is accessible at http://www.fruitfly.org. The web page describing each EST shows the sequence, accession numbers, and a summary of gene expression data, together with a low resolution expression image and a summary of similarity to other sequences. A high resolution digital image is available for downloading. Several types of searches are available to query this information: 1) Expression Domain Keyword Search: Every expression image has been annotated using the standardized set of terms developed by Flybase for the description of Drosophila anatomy (http://flybase.bio.indiana.edu). Therefore, keyword searches for cDNAs that are expressed in a particular embryonic organ, or combination of organs, may be performed; 2) Sequence Keyword Search: A BLAST similarity search was performed on each EST and the results stored in the database, including the accession number of the GenBank entries of similar sequences. cDNAs that show similarity to a particular class of gene may be found by searching for words or phrases that are likely to be found in the gene's GenBank description; 3) Clone Identifier Search: unique identifiers, such as the clone name (CK number) or accession number, can be used to retrieve an individual cDNA record; 4) Sequence Similarity Search: Using a public BLAST server available at the same site as the database, searches for ESTs similar to any query sequence can be performed. 
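The first two search types can be pictured as simple filters over annotated clone records. The sketch below is purely illustrative and makes no claim about the actual Illustra schema or CGI scripts: the `CloneRecord` structure and both function names are hypothetical, and the sample records are paraphrased from the Figure 3 legend rather than taken from the database itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneRecord:
    clone_id: str          # CK number
    domains: frozenset     # expression domain terms annotated for the clone
    similarity: str        # description of the best database match, "" if none


# Sample records paraphrased from the Figure 3 legend (illustration only).
RECORDS = [
    CloneRecord("CK02262", frozenset({"ventral nerve cord", "brain"}),
                "B. taurus Na/Ca,K-exchanger protein"),
    CloneRecord("CK01670", frozenset({"tracheal system"}), ""),
    CloneRecord("CK01209", frozenset({"brain"}), "human serine/threonine kinase"),
    CloneRecord("CK02229",
                frozenset({"epidermis", "visceral mesoderm", "tracheal system",
                           "foregut", "hindgut"}),
                "human laminin"),
]


def search_by_domains(records, *terms):
    """Expression domain keyword search: clones annotated with ALL given terms."""
    wanted = set(terms)
    return [r.clone_id for r in records if wanted <= r.domains]


def search_by_similarity(records, phrase):
    """Sequence keyword search: clones whose match description contains the phrase."""
    phrase = phrase.lower()
    return [r.clone_id for r in records if phrase in r.similarity.lower()]


print(search_by_domains(RECORDS, "brain"))
print(search_by_domains(RECORDS, "tracheal system", "epidermis"))
print(search_by_similarity(RECORDS, "kinase"))
```

Requiring every term to match mirrors the "combination of organs" query described above; a real implementation would also need to handle the hierarchy of the controlled vocabulary, which this sketch ignores.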
DISCUSSION We have used high-throughput whole embryo in situ hybridization and a normalized cDNA library prepared from RM-bound mRNA to identify membrane and secreted proteins whose expression is associated with specific developmental processes during embryogenesis. The expression patterns of 1003 individual cDNAs and sequence information for 1298 cDNAs are available in a public database (http://www.fruitfly.org). This database makes it possible to rapidly identify new developmentally regulated genes and, based on the sequence and expression pattern, formulate testable hypotheses for the function of the genes. For example, based on a motoneuron-specific expression pattern in the developing nerve cord, we identified the first Drosophila member of the tetraspanin family of transmembrane proteins, late bloomer (19). Through subsequent genetic analysis, we determined that late bloomer function facilitates neuromuscular synapse formation in the embryo (19). Similarly, characterization of a cDNA expressed specifically in muscle led to the identification of a new Drosophila glutamate receptor (20). Although the RM cDNA library is 4- to 5-fold enriched for membrane and secreted proteins, this library also contains a large fraction of cDNAs encoding cytosolic and nuclear proteins. This is due in part to the fact that embryonic mRNAs encoding membrane and secreted proteins appear to be much less abundant than mRNAs encoding cytosolic and nuclear proteins. In addition, normalization of the RM library decreased the enrichment for membrane and secreted proteins by partially restoring the prevalence of clones encoding cytosolic and nuclear proteins. In spite of this drawback to normalization, we chose to screen the normalized RM cDNA library to reduce the number of recurrent cDNAs and thereby increase the chances of identifying less abundant mRNAs whose expression is limited to a small number of cells in the embryo. 
The normalization method we describe has both advantages and disadvantages relative to the more standard methods of normalizing by limited cDNA self-hybridization (21). The main advantage of normalizing by hybridization to genomic DNA is that the method requires no optimization of hybridization times or titration of hydroxyapatite elution conditions. However, genomic DNA hybridization normalizes on the basis of gene copy number, which means that high copy number genes are overrepresented in the cDNA library. We found mitochondrial genes were particularly problematic; approximately 15% of the clones in the library represent mitochondrial genes. This could be resolved by further purification of the genomic DNA to ensure that mitochondrial DNA is not present on the magnetic beads. Another limitation of the technique is the need for relatively large amounts of genomic DNA target in the hybridization to capture enough cDNA to prepare a library. Genomes of higher complexity than Drosophila would necessitate a much larger amount of genomic DNA-coated beads, which would increase the amount of contamination in the library due to nonspecific hybridization. Also, the larger amount of interspersed repetitive DNA in vertebrate genomes would cause rapid annealing of the genomic DNA and could cause vast overrepresentation of mRNAs containing repetitive elements in their untranslated regions. For these reasons, this normalization technique may not be appropriate for vertebrate genomes. Subcellular fractionation of RM-bound mRNA is a convenient way to prepare mRNA enriched for membrane and secreted proteins. However, it requires a relatively large amount of tissue in order to isolate enough mRNA to generate a library that does not require amplification by PCR. It is also difficult to normalize an RM library without increasing the prevalence of mRNAs encoding cytosolic and nuclear proteins. 
In the course of this work, two alternative methods for identifying cDNAs encoding membrane and secreted proteins were described that have some advantages over subcellular fractionation (22, 23). These methods are based on transforming tissue culture cells (22) or yeast (23) with a vector that will express an assayable reporter protein only when a cDNA encoding a signal sequence is cloned into the vector. This approach allows cDNA libraries to be prepared from small amounts of unfractionated mRNA, and the library of positive cDNAs that is generated is highly specific for membrane and secreted proteins. The Drosophila genome is estimated to contain approximately 12,000 genes (5). The fact that we were able to carry out in situ hybridization to embryos for over 2,500 different cDNA clones in this study argues that the methodology we describe could be used to collect similar data for all Drosophila genes. Suitable probes could be derived by using PCR to amplify segments of sequenced genomic DNA or cDNA clones as templates. The highly sensitive and rapid in situ hybridization method employed here allows the detailed visualization of gene expression and provides a level of spatial and temporal resolution that is not currently obtainable by methods that require RNA isolation and hybridization to clone (24) or oligonucleotide (25) arrays. Such expression data, along with the more quantitative data provided by hybridization to arrays, will be essential for deciphering gene regulatory networks. ACKNOWLEDGMENTS We thank Fred Wolf for his help with the initial RNA in situ screens, Rick Fetter and Lee Fradkin for helping prepare the figures and Lee Fradkin and the members of the Rubin and Goodman laboratories for critical review of the manuscript. C. C. K. was supported as a Jane Coffin Childs postdoctoral fellow and a Howard Hughes Medical Institute (HHMI) postdoctoral associate. T. L. S. is a Jane Coffin Childs postdoctoral fellow. J. N. N. is a postdoctoral associate and C. 
S. G. and G. M. R. are investigators with the HHMI. This work was supported in part by NIH grant HG00750. REFERENCES
Figure Legends Figure 1 Schematic representation of the cDNA normalization procedure. The normalization method is described in detail in the text. Figure 2 mRNAs encoding transmembrane proteins are selectively enriched in the rough microsome RNA fraction and decreased in the free polysome fraction. (A, B) Northern blots containing 20 µg RNA from the total (T) or rough microsome (M) fractions were hybridized with the genes encoding the transmembrane protein Fas II (A) (4500 nucleotide transcript) or the rp 49 ribosomal protein (B) (600 nucleotide transcript). (C, D) Northern blots containing 10 µg polyA+ RNA from the total (T) or free polysome (F) fractions were hybridized with genes encoding the transmembrane protein late bloomer (Lbm) (C) (1300 nucleotide transcript) or the cytosolic protein actin 57B (D) (2000 nucleotide transcript). Figure 3 Expression domains of a subset of RM clones. The RNA expression patterns of selected RM clones in distinct parts of the Drosophila embryo are shown. A typical image assigned to each RM clone in the database is shown in A, while panels B through L show a detail of these images. In panels B through L, anterior is to the left. (A) Expression of CK02213 in the anterior and posterior midgut primordium (arrows), the midgut (arrowhead) and the visceral mesoderm. This clone shows homology to the human NMDA receptor glutamate-binding subunit. (B) Expression of CK02262 in the ventral nerve cord and brain. This clone shows homology to the B. taurus gene for Na/Ca,K-exchanger protein. (C) Expression of CK02467 in the proventriculus, a part of the stomodeum. This clone does not show homology to any genes in the existing gene databases. (D) Expression of CK01670 in the developing tracheal system. This clone does not show homology to any genes in the existing gene databases. (E) Expression of CK01209 in the brain. This clone shows homology to human serine/threonine kinase. (F) Expression of CK02623 in the salivary glands and proventriculus. 
This clone shows homology to the rat Na+-dependent inorganic phosphate cotransporter. (G) Expression of CK00246 in the central nervous system, ventral nerve cord and brain. This clone shows homology to mouse and human ESTs. (H) Expression of CK01174 in the reproductive system (gonads). This clone does not show homology to any genes in the existing gene databases. (I) Expression of CK00490 in the anterior and posterior midgut primordium. This clone shows homology to several human ESTs. (J) Expression of CK01593 in the dorsal vessel and lymph gland. This clone does not show homology to any genes in the existing gene databases. (K) Expression of CK02229 in the epidermis, the visceral mesoderm, the tracheal system and the fore and hindgut. This clone shows homology to human laminin. (L) Uniform expression of CK02318 throughout the epidermis. This clone shows homology to a C. elegans EST.