GBREL.TXT          Genetic Sequence Data Bank
                         15 June 1990

                      GenBank(R) Release 64.0

                  Distribution Tape Release Notes

  35100 loci, 42495893 bases, from 43028 reported sequences

This document describes the data written on GenBank distribution tapes. The examples used are from the current release. If you have any questions or comments about the data bank, the distribution tape, or this document, please call (415)962-7364 or write to:

          GenBank
          c/o IntelliGenetics Inc.
          700 East El Camino Real
          Mountain View, California 94040
          USA

The electronic mail address is: GENBANK@GENBANK.BIO.NET

1. INTRODUCTION

1.1 Release 64.0

Release 64.0 has 35,100 loci representing 42,495,893 bases. Release 63.0 had 33,377 loci with 40,127,752 bases. Release 64.0 thus is 5.9% larger in bases than Release 63.0. A statistical summary of Release 64.0 is presented in Appendix A.

1.2 Organization of This Document

This introduction notes the changes to GenBank since the last release. The next section describes the contents of the tape files. The third section illustrates the formats of the tape files. The fourth section describes the proposed changes planned for future releases. The last section contains notes about the administration of GenBank.

1.3 Recent Changes in the Data Bank

1.3.1 Changes in This Release

1.3.1.1 International Feature Table Format

The EMBL Data Library and GenBank, with the assistance of the DNA Data Bank of Japan, have developed a standard feature table to be implemented by all three data banks. The new feature table is designed to be more understandable and useful. The common feature table will also make software development easier and allow simpler data conversion between data banks. The new feature table will be implemented in Release 64.0 (June 1990). Details of the new format are described in a document entitled: 'The DDBJ/EMBL/GenBank feature table: Definition, Version 1.01, September 10, 1988.'
Copies of this document are available on request at the address given on the first page of these release notes. Section 3.5.11 provides further information on the new feature table format.

1.3.1.2 Minor Differences in GenBank Format

Beginning in the second quarter of 1990, the GenBank data bank will be maintained in relational format. The data bank will continue to be distributed in the standard flatfile format described in this document. Starting with Release 64.0, the standard flatfile GenBank data bank (generated from the relational data base tables) will contain a few minor format differences from previous releases. These differences are described in the remainder of this section. Information about the relational data base format can be requested by calling (505) 665-2177 or by writing to:

          GenBank
          Group T-10, Mail Stop K710
          Los Alamos National Laboratories
          Los Alamos, NM 87545

1.3.1.2.1 Keyword Order

The order of keyword phrases on the KEYWORDS line will be alphabetical.

1.3.1.2.2 Accession Number Order

The order of non-primary accession numbers on the ACCESSION line is not necessarily preserved from one release to the next.

1.3.1.2.3 Reference Order

The order of the references in an entry is not necessarily preserved from one release to the next.

1.3.1.2.4 Changes in Taxonomy

The taxonomic classification of many organisms will be changed to ensure uniformity throughout the data bank. A few organisms will be listed as unclassified as a result of inconsistencies in the data. The annotation staff is addressing these inconsistencies.

1.3.1.2.5 Inconsistencies in Reference Information

Inconsistencies in reference information will not be preserved. All occurrences of a single reference will be identical, usually matching the first occurrence in the data bank.

1.3.1.2.6 Formatting Changes

Some text fields (for example, definition, comment, and title) will, in some cases, be reformatted slightly. In most cases, the actual text will remain unchanged.
1.3.1.2.7 Comment Field Formatting Any special formatting in the comment field may not be preserved. This will be corrected in future releases. 1.3.1.2.8 Reference Line Changes The reference lines for sites and review papers will be slightly modified. All of these papers will be classified as 'sites', with the original text describing the citation appearing in the comment field. 1.3.1.3 Data from EMBL and the DNA Data Bank of Japan New sequence data from EMBL Release 21 have been incorporated into this release of GenBank. Sequence data from Release 6.0 of the DNA Data Bank of Japan (DDBJ) have also been incorporated into this release. Entries with accession numbers beginning with the letter 'D' have been created and annotated by DDBJ. Release 64 contains approximately 408 entries from DDBJ. 1.3.1.4 Changes in Locus Names Several locus names have been changed to make the application of the organism codes consistent throughout the data bank. These changes are listed in Appendix B. 1.3.2 Changes in Earlier Releases The following changes in GenBank format were installed in previous releases and described in their Release Notes. These changes are described here again for those users who may not have received those releases. 1.3.2.1 VAX/VMS Backup Saveset Files Are Compressed (Release 62) Starting with Release 62.0, the GenBank distribution files in the VAX/VMS Backup Saveset will be compressed before writing them to tape. This will save approximately 65% in magnetic media required for these distributions, saving you money and enabling us to duplicate the tapes more rapidly. The saveset includes a public domain executable image for the VMS Uncompress utility (DCOMPRESS.EXE) and a command procedure (DECMPRESS.COM) to uncompress all the distribution files and remove the compressed versions of those files. When all of the files are uncompressed, the final total amount of disk space required will be the same as if you had loaded a tape containing the uncompressed files. 
During the uncompression phase some additional disk space is required. For example, the files in Release 61 required approximately 189,000 disk blocks; the uncompression phase required an additional 2,450 blocks. These blocks are freed after installation. Section 2.3 contains instructions for uncompressing files. 2. ORGANIZATION OF TAPE FILES 2.1 Tape Formats The GenBank data bank is now available in three formats on three different physical media (see Section 5.4 for further details on which formats are available on each medium). GenBank is available on 9-track, unlabelled, industry-standard, ASCII magnetic tapes. These tapes have been written in fixed-length records of 80 characters, each with no carriage-return or line-feed characters. Each record corresponds to one line in the data bank; trailing blanks have been added to the lines to make them all exactly 80 characters long. (A completely blank line is therefore represented by 80 blanks.) The label affixed to the tape reel indicates its block size and density. If no specifications are received from you, the tape is written with a fixed block size of 160 records (12,800 characters) and a density of 6250 bpi (bits per inch). We also offer tapes written at a density of 1600 bpi and a block size of 40 records (3200 characters). GenBank is also available as a VAX/VMS Backup saveset (on 9-track tapes or TK-50 cartridges) or as compressed Unix tar archives (on 9 track tapes and Sun 1/4" QIC 24 format tape cartridges). The data on the tapes have both uppercase and lowercase characters. Upon special request, the unlabelled, 9 track tapes can be written using uppercase characters only (Section 6.4 specifies which formats are available in uppercase only). 2.2 Files GenBank consists of twenty-two files in all magnetic tape distributions. The list which follows describes each of the files included in the distribution. 
In the following sections there are additional lists indicating the breakdown of files on the various media and formats. 2.2.1 File Descriptions 1. GBREL.TXT - Release notes (this document). 2. GBSDR.TXT - Short directory of the data bank. 3. GBNEW.TXT - List of new or substantially revised entries. 4. GBACC.IDX - Index of the entries according to accession number. 5. GBKEY.IDX - Index of the entries according to keyword phrase. 6. GBAUT.IDX - Index of the entries according to author. 7. GBJOU.IDX - Index of the entries according to journal citation. 8. GBHGM.IDX - Index of the entries according to gene symbol. 9. GBDAT.FRM - Forms for submitting sequences or corrections to GenBank. 10. GBPRI.SEQ - Primate sequence entries. 11. GBROD.SEQ - Rodent sequence entries. 12. GBMAM.SEQ - Other mammalian sequence entries. 13. GBVRT.SEQ - Other vertebrate sequence entries. 14. GBINV.SEQ - Invertebrate sequence entries. 15. GBPLN.SEQ - Plant sequence entries (including fungi and algae). 16. GBORG.SEQ - Eukaryotic organelle sequence entries. 17. GBBCT.SEQ - Bacterial sequence entries. 18. GBRNA.SEQ - Structural RNA sequence entries. 19. GBVRL.SEQ - Viral sequence entries. 20. GBPHG.SEQ - Phage sequence entries. 21. GBSYN.SEQ - Synthetic and chimeric sequence entries. 22. GBUNA.SEQ - Unannotated sequence entries. 2.2.2 Fixed Length Records Approximately 162 MB of disk space is required for the Release 64.0 files in fixed-length record format. All the files fit on two 6250 bpi tapes and are divided between the tapes as follows. 
     Tape 1              Tape 2

     GBREL.TXT           GBVRL.SEQ
     GBSDR.TXT           GBPHG.SEQ
     GBNEW.TXT           GBSYN.SEQ
     GBACC.IDX           GBUNA.SEQ
     GBKEY.IDX
     GBAUT.IDX
     GBJOU.IDX
     GBHGM.IDX
     GBDAT.FRM
     GBPRI.SEQ
     GBROD.SEQ
     GBMAM.SEQ
     GBVRT.SEQ
     GBINV.SEQ
     GBPLN.SEQ
     GBORG.SEQ
     GBBCT.SEQ
     GBRNA.SEQ

At 1600 bpi, six tapes are required and the files are divided among the tapes as follows:

     Tape 1              Tape 4

     GBREL.TXT           GBINV.SEQ
     GBSDR.TXT           GBORG.SEQ
     GBNEW.TXT           GBBCT.SEQ
     GBACC.IDX
     GBKEY.IDX
     GBAUT.IDX           Tape 5
     GBJOU.IDX
     GBHGM.IDX           GBPLN.SEQ
     GBDAT.FRM           GBRNA.SEQ
                         GBUNA.SEQ
                         GBPHG.SEQ

     Tape 2              Tape 6

     GBPRI.SEQ           GBVRL.SEQ
     GBMAM.SEQ           GBSYN.SEQ

     Tape 3

     GBROD.SEQ
     GBVRT.SEQ

2.2.3 VAX/VMS Backup Saveset

Saveset files are in directory order rather than in the order shown for the formats above. The files are in compressed format (see Section 1.3.2.1 for details). Approximately 116 MB of disk space is required for Release 64.0 files in VAX/VMS Backup Saveset format. The files archived in the Backup Saveset use variable-length records, not the 80-character fixed-length records described above.

All files fit on one 6250 bpi tape. At 1600 bpi, two tapes are required, and the files are divided between the tapes as follows:

     Tape 1              Tape 2

     AAAREADME.TXT       GBVRL_SEQ.Z
     DCOMPRESS.CLD       GBVRT_SEQ.Z
     DCOMPRESS.EXE
     DECMPRESS.COM
     GBACC_IDX.Z
     GBAUT_IDX.Z
     GBBCT_SEQ.Z
     GBDAT_FRM.Z
     GBHGM_IDX.Z
     GBINV_SEQ.Z
     GBJOU_IDX.Z
     GBKEY_IDX.Z
     GBMAM_SEQ.Z
     GBNEW_TXT.Z
     GBORG_SEQ.Z
     GBPHG_SEQ.Z
     GBPLN_SEQ.Z
     GBPRI_SEQ.Z
     GBREL_TXT.Z
     GBRNA_SEQ.Z
     GBROD_SEQ.Z
     GBSDR_TXT.Z
     GBSYN_SEQ.Z
     GBUNA_SEQ.Z

NOTE: When the files are uncompressed (as instructed in Section 2.3) the '.Z' will be removed from the end of the file name and the characters after the underscore will become the file extension. For example, 'GBACC_IDX.Z' will be named 'GBACC.IDX'.

One TK-50 cartridge is required; the files are in directory order and are compressed as described above.
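
The renaming rule stated in the NOTE above can be sketched in Python as follows. This is only an illustration of the naming convention. The function name vms_uncompressed_name is an assumption made for this example and is not part of the distribution; the actual renaming is performed by the DECMPRESS command procedure.

```python
def vms_uncompressed_name(compressed_name: str) -> str:
    """Map a saveset file name such as 'GBACC_IDX.Z' to 'GBACC.IDX'.

    Hypothetical helper: it only mirrors the rule in the release notes
    (drop the trailing '.Z', turn the last underscore into the dot that
    separates the file extension).
    """
    if not compressed_name.upper().endswith(".Z"):
        raise ValueError(f"not a compressed file name: {compressed_name!r}")
    stem = compressed_name[:-2]                 # remove the '.Z' suffix
    base, underscore, extension = stem.rpartition("_")
    if not underscore or not base or not extension:
        raise ValueError(f"no '_' separator in: {compressed_name!r}")
    return f"{base}.{extension}"


if __name__ == "__main__":
    for name in ("GBACC_IDX.Z", "GBVRT_SEQ.Z", "GBREL_TXT.Z"):
        print(name, "->", vms_uncompressed_name(name))
```

Files without a '.Z' suffix, such as AAAREADME.TXT, are not compressed and are rejected by this sketch.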
2.2.4 Unix tar Format The files are compressed with the Unix compress utility before the tar command is executed; they must therefore be uncompressed before use (see Section 2.4 below for details). Approximately 38 MB of disk space is required for the Release 64.0 files when in the compressed format; the uncompressed files require approximately 116 MB. All files fit on one 6250 bpi tape, 1600 bpi tape, or Sun cartridge. The files are in directory order rather than in the order shown for the fixed-length record formats. In addition, the file names are in lowercase letters. The tar file uses variable length records; the records are not padded to 80 characters with space characters. To get fixed- length, 80-character records, first uncompress the .Z files. Then use dd with the conv=block and cbs=80 options set to filter the file. If you pad the records, it adds approximately 46 MB of disk space. Unix Tar File Order: gbacc.idx.Z gbaut.idx.Z gbbct.seq.Z gbdat.frm.Z gbhgm.idx.Z gbinv.seq.Z gbjou.idx.Z gbkey.idx.Z gbmam.seq.Z gbnew.txt.Z gborg.seq.Z gbphg.seq.Z gbpln.seq.Z gbpri.seq.Z gbrel.txt.Z gbrna.seq.Z gbrod.seq.Z gbsdr.txt.Z gbsyn.seq.Z gbuna.seq.Z gbvrl.seq.Z gbvrt.seq.Z NOTE: When the files are uncompressed the '.Z' extension is removed from the file names. 2.2.5 File Sizes The following table indicates the approximate sizes of the individual files in this release. Since minor changes to some of the files may occur after the release notes are printed, these sizes should not be used to determine file integrity. They are provided as an aid to planning only. 
The columns in the table have the following meanings:

     (1) - Sizes (in bytes) of the fixed-length record files (described in Section 2.2.2)
     (2) - Sizes (in bytes) of the compressed files included in the Unix tarfile (Section 2.2.4)
     (3) - Sizes (in bytes) of the files in the Unix tarfile after uncompression (Section 2.2.4)
     (4) - Sizes (in blocks) of the compressed files included in the VMS Backup saveset (Sections 2.2.3 and 2.3)
     (5) - Sizes (in blocks) of the files in the VMS Backup saveset after decompression (Sections 2.2.3 and 2.3)

File                  (1)        (2)         (3)     (4)      (5)

GBACC.IDX         3102000     475791     1407199     939     2825
GBAUT.IDX         8564880    1691029     4784035    3462     9591
GBBCT.SEQ        16225360    4067267    12241265    8549    24585
GBDAT.FRM           30240       7194       17518      19       43
GBHGM.IDX           93360      21096       60634      44      121
GBINV.SEQ        10865600    2582469     7996733    5452    16069
GBJOU.IDX         3905360     641187     2123574    1319     4264
GBKEY.IDX         3147600     742585     2152089    1503     4291
GBMAM.SEQ         5267600    1240711     3819640    2613     7676
GBNEW.TXT          728800     102363      443471     221      893
GBORG.SEQ         4920640    1195380     3696115    2514     7425
GBPHG.SEQ         2072080     492669     1480348    1051     2977
GBPLN.SEQ        10868880    2681845     8170950    5644    16411
GBPRI.SEQ        26923920    6246011    19464061   13178    39100
GBREL.TXT          353840      90749      276859     192      557
GBRNA.SEQ         3944160     688692     2367750    1513     4774
GBROD.SEQ        24304480    5476559    17291154   11579    34757
GBSDR.TXT         2812880     920822     2808799    1870     5624
GBSYN.SEQ         2351120     500435     1526006    1073     3071
GBUNA.SEQ         9638800    2385611     7368550    4953    14804
GBVRL.SEQ        15268400    3775449    11589403    7917    23278
GBVRT.SEQ         6137440    1437604     4419642    3045     8884
AAAREADME.TXT                                          2        2
DCOMPRESS.CLD                                          4        4
DCOMPRESS.EXE                                        150      150
DECMPRESS.COM                                          2        2

Totals          161527440   37463518   115505795   78808   232178

2.3 Loading Data Bank Files in VAX/VMS Backup Format

In order to use the VAX/VMS Backup Saveset format, you must be running release 5.0 or greater of the VMS operating system. If you are not running release 5.0 or greater, you should order the unlabelled ASCII format instead of VAX/VMS Backup.
The following command should be used to load the saveset into the current directory on your disk:

     BACKUP/LOG MSA0:GENBANK []

(NOTE: Replace 'MSA0' with the identifier for your disk.)

The following command should be used to uncompress the files. NOTE: If you do not want to keep all of the files, delete those you do not want before you run the uncompress procedure. The uncompress routine works on all the files in the directory that have a '.Z' extension.

     @DECMPRESS

The following commands were used to create the VAX/VMS Backup Saveset. NOTE: The '...' indicates that the following line is a continuation and should be typed without a break.

For 6250 bpi tape:

     BACKUP/DENSITY=6250/BUFFER=5/VERIFY/INTERCHANGE/...
     LIST=GB.LST GB1:[GENBANK.PROD]GB*.* TAPE:GENBANK

For 1600 bpi tape:

     BACKUP/DENSITY=1600/BUFFER=5/VERIFY/INTERCHANGE/...
     LIST=GB.LST GB1:[GENBANK.PROD]GB*.* TAPE:GENBANK

For TK-50 cartridge:

     BACKUP/BUFFER=5/VERIFY/INTERCHANGE/...
     LIST=GB.LST GB1:[GENBANK.PROD]GB*.* TAPE:GENBANK

2.4 Loading Data Bank Files in Unix tar Format

The following commands should be used to load the Unix tar files into the current directory on your disk:

     tar xvfb /dev/rmt8 126 gb*.Z
     uncompress gb*.Z

(NOTE: Replace 'rmt8' with the identifier for your device.)

The following commands were used to write the tarfile on the distribution tape:

For 6250 and 1600 bpi tapes:

     tar cvfb /dev/rmt8 20 gb*.Z

For Sun cartridge:

     tar cvfb /dev/rst8 126 gb*.Z

3. FILE FORMATS

3.1 File Header Information

Each of the twenty-two files on the distribution tape begins with the same header, except for the first line, which contains the file name, and the sixth line, which contains the title of the file. The first line of the file contains the file name in character positions 1 to 9 and the full data bank name (Genetic Sequence Data Bank) starting in column 20. The brief names of the files in this release are listed in section 2.2.
The second line contains the date of the current release in the form 'day month year', beginning in position 26. The fourth line contains the current GenBank release number. The release number appears in positions 41 to 45 and consists of two numbers separated by a decimal point. The number to the left of the decimal is the major release number. The digit to the right of the decimal indicates the version of the major release; it is zero for the first version. The sixth line contains a title for the file. The eighth line lists the number of entries (loci), number of bases (or base pairs), and number of reports of sequences in this release of GenBank. These numbers are right-justified at fixed positions. The number of entries appears in positions 1 to 7, the number of bases in positions 15 to 22, and the number of reports in positions 36 to 40. (There are more reports of sequences than entries since reported sequences that overlap or duplicate each other are combined into single entries.) The third, fifth, seventh, and ninth lines are blank. 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- GBACC.IDX Genetic Sequence Data Bank 15 June 1990 GenBank(R) Release 64.0 Accession Number Index 35100 loci, 42495893 bases, from 43028 reported sequences ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 1. Sample File Header 3.2 Directory Files 3.2.1 Short Directory File The short directory file contains brief descriptions of all of the sequence entries contained in this release. These descriptions are in thirteen groups, one group for each of the thirteen sequence entry data files. The first record at the beginning of a group of entries contains the name of the group in uppercase characters, beginning in position 21. 
The organism groups are PRIMATE, RODENT, OTHER MAMMAL, OTHER VERTEBRATE, INVERTEBRATE, PLANT, ORGANELLE, BACTERIAL, STRUCTURAL RNA, VIRAL, PHAGE, SYNTHETIC, or UNANNOTATED. The second record is blank. Each record in the short directory contains the sequence entry name (LOCUS) in the first 12 positions, followed by a brief definition of the sequence beginning in column 13. The definition is truncated (at the end of a word) to leave room at the right margin for at least one space, the sequence length, and the letters 'bp'. The length of the sequence is printed right- justified to column 77, followed by the letters 'bp' in columns 78 and 79. The next-to-last record for a group has 'ZZZZZZZZZZ' in its first ten positions (where the entry name would normally appear). The last record is a blank line. An example of the short directory file format, showing the descriptions of the last entries in the Other Vertebrate sequence data file and the first entries of the Invertebrate sequence data file, is reproduced below: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- ZEFHOX21 Zebrafish Hox-2.1 gene homologue (ZF-21). 291bp ZEFRZF21 Zebrafish mRNA for homeotic protein ZF-21. 2073bp ZEFZF54 Zebrafish homeotic gene ZF-54. 246bp ZEFZFEN Zebrafish engrailed-like homeobox sequence. 327bp ZZZZZZZZZZ INVERTEBRATE ACAACTI Amoeba (A. castellanii) actin gene-i. 1571bp ACAMHCA A.castellanii non-muscle myosin heavy chain gene, partial 5894bp ACAMYHCIB Myosin IB heavy chain gene, complete cds. 6504bp ACAMYOIHC A.castellanii myosin I heavy chain (MIL) gene, complete cds. 5705bp ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 2. Short Directory File 3.2.2 New and Updated Entry File The directory of new and updated entries is a list of those entries that have been newly added or that have undergone substantive revision in this release. 
These entries are listed in the same order in which they appear in the actual data files; they are divided into thirteen groups, one group for each of the thirteen sequence entry data files. The first record at the beginning of a group of entries designates that group, beginning in position 21. The second record is blank and the third record has asterisks in its first ten positions. Within each group, the entries are listed alphabetically. For each entry, the new and updated entry file gives the information included under the LOCUS and DEFINITION keywords in the same format in which they appear in the actual sequence entry; these categories are described in section 3.5.2. After the last record of an entry comes a record containing asterisks in its first ten positions. At the end of each group, a dummy entry contains only a LOCUS line with the entry name 'ZZZZZZZZZZ'. Therefore, the next-to-last record has ten asterisks in its first ten positions; the last record of the group is blank. The following excerpt from the current release shows the last new or revised entry from the Other Vertebrate sequence data file, followed by the first new or revised entry from the Invertebrate sequence data file: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- ********** LOCUS XETUG6 977 bp ds-DNA VRT 15-JUN-1990 DEFINITION X.tropicalis U6 uRNA gene. ********** LOCUS ZZZZZZZZZZ ********** INVERTEBRATE ********** LOCUS AMFHOMA 258 bp ds-DNA INV 15-JUN-1990 DEFINITION Bee homeobox-containing gene, partial cds, clone H55. ********** ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 3. 
New and Updated Entry File 3.3 Index Files There are five files containing indices to the entries in this release: - Accession number index file - Keyword phrase index file - Author name index file - Journal citation index file - Gene symbol index file The accession numbers, keywords, authors, journals, or gene symbols (the index keys) of an index are sorted alphabetically. (The index keys for the keyword phrases and author names appear in uppercase characters even though they appear in mixed case in the sequence entries.) Under each index key, the names of the sequence entries containing that index key are listed alphabetically. Each sequence name is also followed by its data file division and primary accession number. The following codes are used to designate the data file divisions: 1. PRI - primates 2. ROD - rodents 3. MAM - other mammals 4. VRT - other vertebrates 5. INV - invertebrates 6. PLN - plants, fungi, and algae 7. ORG - organelles 8. BCT - bacteria 9. RNA - structural RNAs 10. VRL - viruses 11. PHG - bacteriophage 12. SYN - synthetic sequences 13. UNA - unannotated sequences The index key begins in column 1 of a record. An 11-character field for the sequence entry name starts in position 14 of a record, followed by a 3-character field for the data file division, starting at position 25 and ending at position 27, and a 6-character field for the primary accession number, starting at position 29 and ending at position 34. All entries in the fields are left-justified. Beginning at positions 36 and 58, the three fields repeat, so three sets of sequence information can appear in one record. If there are more than three entry names, the next records are used; the index key is not repeated. For the accession number and human gene symbol index files, the entry names begin in the same record as the index key, since the key is always less than 12 characters. In the other index files, the entry names begin on the record following the index key record. 
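
The record layout described above can be read with a short routine such as the following Python sketch. It is an illustration only, written from the column positions given in this section. The names IndexHit, parse_hits, parse_record, and the key_shares_record flag are assumptions introduced for this example and are not part of any GenBank software.

```python
from typing import List, NamedTuple, Optional, Tuple


class IndexHit(NamedTuple):
    entry_name: str   # sequence entry (LOCUS) name, 11-character field
    division: str     # data file division code, e.g. PRI
    accession: str    # primary accession number, 6-character field


# 1-based starting positions of the three repeating field groups.
GROUP_STARTS = (14, 36, 58)


def parse_hits(record: str) -> List[IndexHit]:
    """Extract up to three (entry name, division, accession) groups."""
    line = record.rstrip("\r\n").ljust(80)    # tolerate unpadded records
    hits = []
    for start in GROUP_STARTS:
        base = start - 1                      # convert to 0-based index
        name = line[base:base + 11].strip()
        if not name:
            continue                          # unused group in this record
        division = line[base + 11:base + 14].strip()
        accession = line[base + 15:base + 21].strip()
        hits.append(IndexHit(name, division, accession))
    return hits


def parse_record(record: str,
                 key_shares_record: bool) -> Tuple[Optional[str], List[IndexHit]]:
    """Return (index key or None, hits) for one index file record.

    key_shares_record should be True for the accession number and gene
    symbol index files, where entry names start in the key record, and
    False for the keyword, author, and journal index files.
    """
    line = record.rstrip("\r\n")
    if line[:1].strip():                      # column 1 non-blank: key record
        if key_shares_record:
            return line[:13].strip(), parse_hits(line)
        return line.strip(), []
    return None, parse_hits(line)             # continuation record


if __name__ == "__main__":
    record = "J00320".ljust(13) + "HUMVIPMR1  PRI L00154 HUMVIPMR2  PRI L00155"
    print(parse_record(record, key_shares_record=True))
```

A caller would keep the most recent key and attach the hits of each following continuation record (key of None) to it.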
3.3.1 Accession Number Index File Accession numbers consist of a single letter followed by five digits. They provide an unchanging designation for the data with which they are associated, and we encourage you to cite accession numbers whenever you refer to data from the data bank. The primary accession number is the first accession number of an entry. It is unique to that entry. Citation of that number will enable other investigators to locate the data no matter what entry name changes or other data bank reorganizations may occur. The accession numbers, however, carry no intrinsic information about the data. In addition to the primary accession number, some entries have secondary accession numbers. Secondary accession numbers arise for a number of reasons. For example, a single accession number may initially be assigned to the sequence in an article. If it is later discovered that the sequence must be entered into the data bank as multiple entries, each entry would receive a new primary accession number; the previous accession number would appear as the secondary accession number in each entry. The following excerpt from the accession number index file illustrates the format of the index: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- J00316 HUMTBB11P PRI J00316 J00317 HUMTBB46P PRI J00317 J00318 HUMUG1 PRI J00318 J00319 HUMUG1PA PRI J00319 J00320 HUMVIPMR1 PRI L00154 HUMVIPMR2 PRI L00155 HUMVIPMR3 PRI L00156 HUMVIPMR4 PRI L00157 HUMVIPMR5 PRI L00158 J00321 BABA1AT PRI J00321 J00322 CHPRSA PRI J00322 J00323 AGMRSASPC PRI J00323 J00324 BABATIII PRI J00324 ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 4. 
Accession Number Index File If the same accession number is found in more than one entry (a result of the infrequent occasions when a single entry is split into two or more separate entries), then the additional entries and groups in which the number appears are also given. 3.3.2 Keyword Phrase Index File Keyword phrases consist of names for gene products and other characteristics of sequence entries. There are approximately 10,170 keyword phrases. An excerpt from the keyword phrase index file is shown below: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- DNA HELICASE ECOHELIV BCT J04726 ECOUVRD BCT X00738 DNA INVERTASE ECOPIN BCT K00676 ECOPINP BCT K03521 PMUGINMOM PHG V01463 DNA LIGASE ECOLIG BCT M24278 ECOLIGA BCT M30255 PT4G30 PHG X00039 PT7CG PHG J02518 YSCCDC9 PLN X03246 YSPCDC17 PLN X05107 DNA MATURATION HS1CAS VRL M22962 DNA METHYLASE HEHMTS BCT J02677 DNA METHYLATION HEHMTS BCT J02677 HUMSPM1 PRI X06585 HUMSPM2 PRI X06586 HUMSPM3 PRI X06587 HUMSPM4 PRI X06588 HUMSPM5 PRI X07490 HUMSPM6 PRI X07491 HUMSPM7 PRI X07492 HUMSPM8 PRI X07493 HUMSPM9 PRI X07494 DNA NUCLEOTIDYLEXOTRANSFERASE MUSTDTR ROD X04123 DNA PACKAGING P29PRO PHG X05973 ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 5. Keyword Phrase Index File 3.3.3 Author Name Index File The author name index file lists all of the author names that appear in the citations. An excerpt from the author name index file is shown below: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- LANDSMAN,D. CHKHMG14 VRT M20817 CHKHMG17 VRT Y00416 CHKHMG17A VRT J03229 HUMHMG14 PRI J02621 HUMHMG14A PRI M21339 HUMHMG17 PRI M12623 MUSHMG17 ROD X12944 X06353 UNA X06353 X06444 UNA X06444 X13546 UNA X13546 X13929 UNA X13929 X13930 UNA X13930 LANDSMANN,J. LAMCG PHG J02459 TRTHB PLN Y00296 LANDY,A. 
ECOLAMATT BCT J01638 ECOP80ATB BCT M10892 ECOTGTUFB BCT J01717 ECOTGY1 BCT K01197 ECOTRY1 RNA K00266 ECOTRY2 RNA K00267 ECOTRY3 RNA M10878 LAMCG PHG J02459 LAMECOGAL PHG M11151 LAMPRCA PHG M12458 LAMPRCB PHG M12459 P22ATTP PHG M10893 P22INT PHG X04052 P80ATTP PHG M10891 STYP22ATB BCT M10894 ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 6. Author Name Index File 3.3.4 Journal Citation Index File The journal citation index file lists all of the citations that appear in the references. All citations are truncated to 80 characters. An excerpt from the citation index file is shown below: 1 10 20 30 40 50 60 70 79 ---------+---------+---------+---------+---------+---------+---------+--------- (IN) THE CELL NUCLEUS, VOLUME VIII: 261-305; ACADEMIC PRESS, NEW YORK (1981). RATUR5A RNA K00783 (IN) THE LENS: TRANSPARANCY AND CATARACT: 171-179; EURAGE, RIJSWIJK (1986) RANCRYG2A VRT K02264 RANCRYG4A VRT K02266 RANCRYG5A VRT M22529 RANCRYG6A VRT M22530 RANCRYR VRT X00659 (IN) VIRUS RESEARCH. PROCEEDINGS OF 1973 ICN-UCLA SYMPOSIUM: 533-544; ACADEMIC P LAMCG PHG J02459 ABUSE RES. MONOGRAPH SER. 70, 43-65 (1986) M28263 UNA M28263 ACTA BIOCHIM. POL. 24, 301-318 (1977) LUPTRFJ RNA K00345 LUPTRFN RNA K00346 ---------+---------+---------+---------+---------+---------+---------+--------- 1 10 20 30 40 50 60 70 79 Example 7. Journal Citation Index File 3.3.5 Cross-Reference To Gene Symbol Libraries The gene symbol file contains the gene symbols used in the Howard Hughes Medical Institute Human Gene Mapping Library and other gene symbols, such as those for the E. coli genes. The gene symbols are found in the feature table and have the form: /gene="gene symbol"; an example is found in section 3.5.11.5. 
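
As an illustration of the qualifier form given above, the following Python sketch collects gene symbols from feature table text. It assumes that each qualifier appears on a single line and that the symbol contains no embedded double quotes. The name gene_symbols and the sample feature lines are hypothetical and are used only for this example.

```python
import re
from typing import List

# Matches the /gene="gene symbol" form described in this section.
GENE_QUALIFIER = re.compile(r'/gene="([^"]*)"')


def gene_symbols(feature_text: str) -> List[str]:
    """Return the gene symbols found in feature table text, in order."""
    return GENE_QUALIFIER.findall(feature_text)


if __name__ == "__main__":
    # Hypothetical feature lines, used only to exercise the pattern.
    sample = (
        '     CDS             1..300\n'
        '                     /gene="HPRT"\n'
        '                     /note="example only"\n'
    )
    print(gene_symbols(sample))
```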
An example of the format of the gene symbol index file follows:

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
HP           HUMHPA1B   PRI K01763 HUMHPA1S   PRI X00637 HUMHPA2B   PRI N00026
             HUMHPA2BR  PRI X05209 HUMHPA2R   PRI M13908 HUMHPAB    PRI K00422
             HUMHPABA   PRI M13192 HUMHPABX   PRI M12387
HPR          HUMHPARS2  PRI K03431
HPRT         HUMHPRT1A  PRI M31642 HUMHPRTA   PRI M12452
HPX          HUMHXM     PRI X02537 HUMHXMA    PRI J03048
HRAS         HUMHRASA   PRI M19990
HRG          HUMBHRPA   PRI M18372 HUMHRGA    PRI M13149
HSDS         ECOHSDSK   BCT J01632
HSPA1L       HUMHSP70   PRI M11236 HUMHSP70A  PRI X04676 HUMHSP70B  PRI X04677
             HUMHSP70D  PRI M11717
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 8. Gene Symbol Index File

3.4 GenBank Data Submission Form and Error/Suggestion Report Form

The ninth file on the distribution tape contains a data submission form. Due to the large volume of new sequence data, we encourage authors to complete this form and return it to the address listed on the form. This will enable data to be entered more quickly into the data bank. You can complete the form with any text editor. You can send the completed form to GenBank on tape or floppy diskette, or electronically via INTERNET or BITNET (the electronic mail address is: gb-sub%life@lanl.gov). We can use information saved on any computer medium from any computer system. You can also print the form, fill it in by hand, and send it to the mailing address given at the beginning of the form.

The second form in this file is the GenBank Error/Suggestion Report Form. It is separated from the Data Submission Form by a form-feed character (^L, ASCII octal value 014, ASCII decimal value 12). We encourage all GenBank users to report any errors to the data bank staff using this form. Like the GenBank Data Submission Form, it may be printed and filled in by hand and sent by mail to the address given at the beginning of the form.
It may also be filled out using a text editor and sent to GenBank by electronic mail at the address given at the top of the form. 3.5 Sequence Entry Files The distribution tape contains thirteen sequence entry data files, one for each division of GenBank. Each file contains the entries for one group of organisms. 3.5.1 File Organization Each of these files has the same format and consists of two parts: header information (described in section 3.1) and sequence entries for that division (described in the following sections). 3.5.2 Entry Organization In the second portion of a sequence entry file (containing the sequence entries for that division), each record (line) consists of two parts. The first part is found in positions 1 to 10 and may contain: 1. A keyword, beginning in column 1 of the record (e.g., REFERENCE is a keyword). 2. A subkeyword beginning in column 3, with columns 1 and 2 blank (e.g., AUTHORS is a subkeyword of REFERENCE). 3. Blank characters, indicating that this record is a continuation of the information under the keyword or subkeyword above it. 4. A code, beginning in column 5, indicating the nature of an entry (feature key) in the FEATURES table; these codes are described in Section 3.5.11.1 below. 5. A number, ending in column 9 of the record. This number occurs in the portion of the entry describing the actual nucleotide sequence and designates the numbering of sequence positions. 6. Two slashes (//) in positions 1 and 2, marking the end of an entry. The second part of each sequence entry record contains the information appropriate to its keyword, in positions 13 to 80 for keywords and positions 11 to 80 for the sequence. The following is a brief description of each entry field. Detailed information about each field may be found in Sections 3.5.4 to 3.5.13. LOCUS A short unique name for the entry, chosen to suggest the sequence's definition. Mandatory keyword/exactly one record. DEFINITION A concise description of the sequence. 
            Mandatory keyword/one or more records.

ACCESSION   The primary accession number is a unique, unchanging code
            assigned to each entry. (Please use this code when citing
            information from GenBank.) Mandatory keyword/one or more
            records.

KEYWORDS    Short phrases describing gene products and other
            information about an entry. Mandatory keyword in all
            annotated entries/one or more records.

SEGMENT     Information on the order in which this entry appears in a
            series of discontinuous sequences from the same molecule.
            Optional keyword (only in segmented entries)/exactly one
            record.

SOURCE      Common name of the organism or the name most frequently
            used in the literature. Mandatory keyword in all annotated
            entries/one or more records/includes one subkeyword.

  ORGANISM  Formal scientific name of the organism (first line) and
            taxonomic classification levels (second and subsequent
            lines). Mandatory subkeyword in all annotated entries/two
            or more records.

REFERENCE   Citations for all articles containing data reported in
            this entry. Includes four subkeywords and may repeat.
            Mandatory keyword/one or more records.

  AUTHORS   Lists the authors of the citation. Mandatory
            subkeyword/one or more records.

  TITLE     Full title of citation. Optional subkeyword (present in
            all but unpublished citations)/one or more records.

  JOURNAL   Lists the journal name, volume, year, and page numbers of
            the citation. Mandatory subkeyword/one or more records.

  STANDARD  Lists information about the degree to which the entry has
            been annotated and the level of review to which it has
            been subjected. Mandatory subkeyword/exactly one record.

COMMENT     Cross-references to other sequence entries, comparisons to
            other collections, notes of changes in LOCUS names, and
            other remarks. Optional keyword/one or more records/may
            include blank records.

FEATURES    Table containing information on portions of the sequence
            that code for proteins and RNA molecules and information
            on experimentally determined sites of biological
            significance.
            Optional keyword/one or more records.

BASE COUNT  Summary of the number of occurrences of each base code in
            the sequence. Mandatory keyword/exactly one record.

ORIGIN      Specification of how the first base of the reported
            sequence is operationally located within the genome. Where
            possible, this includes its location within a larger
            genetic map. Mandatory keyword/exactly one record. The
            ORIGIN line is followed by sequence data (multiple
            records).

//          Entry termination symbol. Mandatory at the end of an
            entry/exactly one record.

3.5.3 Sample Sequence Data File

An example of a complete sequence entry file follows. (This example
has only two entries.) Note that in this example, as throughout the
data bank, numbers in square brackets indicate items in the REFERENCE
list. For example, in ACARR58S, [1] refers to the paper by Mackay, et
al.

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
GBSMP.SEQ            Genetic Sequence Data Bank
                            15 June 1990

                      GenBank(R) Release 64.0

                     Structural Rna Sequences

           2 loci, 280 bases, from 2 reported sequences

LOCUS       AAURRA         118 bp ss-rRNA            RNA       16-JUN-1986
DEFINITION  A.auricula-judae (mushroom) 5S ribosomal RNA.
ACCESSION   K03160
KEYWORDS    5S ribosomal RNA; ribosomal RNA.
SOURCE      A.auricula-judae (mushroom) ribosomal RNA.
  ORGANISM  Auricularia auricula-judae
            Eukaryota; Plantae; Thallobionta; Basidiomycotina;
            Phragmobasidiomycetes; Heterobasidiomycetidae; Eutremellales;
            Auriculariaceae; Auricularia; auricula-judae.
REFERENCE   1  (bases 1 to 118)
  AUTHORS   Huysmans,E., Dams,E., Vandenberghe,A. and De Wachter,R.
  TITLE     The nucleotide sequences of the 5S rRNAs of four mushrooms and
            their use in studying the phylogenetic position of
            basidiomycetes among the eukaryotes
  JOURNAL   Nucleic Acids Res. 11, 2871-2880 (1983)
  STANDARD  full staff_review
FEATURES             Location/Qualifiers
     rRNA            1..118
                     /note="5S ribosomal RNA"
BASE COUNT       27 a     34 c     34 g     23 t
ORIGIN      5' end of mature rRNA.
        1 atccacggcc ataggactct gaaagcactg catcccgtcc gatctgcaaa gttaaccaga
       61 gtaccgccca gttagtacca cggtggggga ccacgcggga atcctgggtg ctgtggtt
//
LOCUS       ACARR58S       162 bp ss-rRNA            RNA       15-MAR-1989
DEFINITION  A.castellanii (amoeba) 5.8S ribosomal RNA.
ACCESSION   K00471
KEYWORDS    5.8S ribosomal RNA; ribosomal RNA.
SOURCE      A.castellani (amoeba; strain ATCC 30010) rRNA.
  ORGANISM  Acanthamoeba castellanii
            Eukaryota; Animalia; Protozoa; Sarcomastigophora; Sarcodina;
            Rhizopoda; Lobosa; Gymnamoeba; Amoebida; Acanthopodina;
            Acanthamoebidae; Acanthamoeba; castellanii.
REFERENCE   1  (bases 1 to 162)
  AUTHORS   Mackay,R.M. and Doolittle,W.F.
  TITLE     Nucleotide sequences of AcanthamoebA.castellanii 5S and 5.8S
            ribosomal ribonucleic acids: Phylogenetic and comparative
            structural analyses
  JOURNAL   Nucleic Acids Res. 9, 3321-3334 (1981)
  STANDARD  simple staff_review
COMMENT     [1] also sequenced A.castellanii 5S rRNA .
FEATURES             Location/Qualifiers
     rRNA            1..162
                     /note="5.8S rRNA"
BASE COUNT       40 a     39 c     44 g     39 t
ORIGIN      5' end of mature rRNA.
        1 aactcctaac aacggatatc ttggttctcg cgaggatgaa gaacgcagcg aaatgcgata
       61 cgtagtgtga atcgcaggga tcagtgaatc atcgaatctt tgaacgcaag ttgcgctctc
      121 gtggtttaac cccccgggag cacgttcgct tgagtgccgc tt
//
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 9. Sample Sequence Data File

3.5.4 LOCUS Format

The pieces of information contained in the LOCUS record are always
found in fixed positions. The locus name (or entry name), which is
always ten characters or less, begins in position 13. The locus name
is designed to help group entries with similar sequences: the first
three characters usually designate the organism; the fourth and fifth
characters can be used to show other group designations, such as gene
product; for segmented entries the last character is one of a series
of sequential integers.

The number of bases or base pairs in the sequence ends in position 29.
The letters 'bp' are in positions 31 to 32.
Positions 34 to 36 give the number of strands of the sequence.
Positions 37 to 40 give the type of molecule sequenced. If the
sequence is of a special type, a notation (such as 'circular' or
'tandem') is included in positions 43 to 52.

GenBank sequence entries are divided among thirteen taxonomic
divisions. Each entry's division is identified by a three-letter code
in positions 53 to 55. See Section 3.3 for the division codes.

Positions 63 to 73 of the record contain the date the entry was
entered or underwent any substantial revisions, such as the addition
of newly published data, in the form dd-MMM-yyyy.

The detailed format for the LOCUS record is as follows:

Positions  Contents

 1-12      LOCUS
13-22      Locus name
23-29      Length of sequence, right-justified
31-32      bp
34-36      Blank, ss- (single-stranded), ds- (double-stranded), or
           ms- (mixed-stranded)
37-40      Blank, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA),
           mRNA (messenger RNA), or uRNA (small nuclear RNA)
43-52      Blank (implies linear), circular, or tandem
53-55      The division code (see Section 3.3)
63-73      Date, in the form dd-MMM-yyyy (e.g., 15-JUN-1989)

3.5.5 DEFINITION Format

The DEFINITION record gives a brief description of the sequence,
proceeding from general to specific. It starts with the common name of
the source organism, then gives the criteria by which this sequence is
distinguished from the remainder of the source genome, such as the
gene name and what it codes for, or the protein name and mRNA, or some
description of the sequence's function (if the sequence is
non-coding). If the sequence has a coding region, the description may
be followed by a completeness qualifier, such as cds (complete coding
sequence). The length is limited to three lines and the last line must
end with a period.

3.5.6 ACCESSION Format

This field contains a series of six-character identifiers (accession
numbers: first character a letter, the remainder digits).
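The fixed positions listed for the LOCUS record in Section 3.5.4 map
directly onto string slices. The Python sketch below is an
illustration only and is not part of the GenBank distribution: the
names LocusRecord, parse_locus, and SAMPLE are hypothetical, and
reporting a blank topology field as 'linear' simply follows the
'Blank (implies linear)' note in the table.

```python
from typing import NamedTuple


class LocusRecord(NamedTuple):
    """Fields of a LOCUS record (hypothetical container for this sketch)."""
    name: str
    length: int
    strands: str
    molecule: str
    topology: str
    division: str
    date: str


def parse_locus(record: str) -> LocusRecord:
    """Slice a LOCUS record using the 1-based positions of Section 3.5.4."""
    if not record.startswith("LOCUS"):
        raise ValueError("not a LOCUS record")
    line = record.ljust(80)  # pad so that short records slice safely

    def field(first: int, last: int) -> str:
        # Convert 1-based inclusive positions to a Python slice.
        return line[first - 1:last].strip()

    return LocusRecord(
        name=field(13, 22),
        length=int(field(23, 29)),
        strands=field(34, 36),
        molecule=field(37, 40),
        topology=field(43, 52) or "linear",  # blank implies linear
        division=field(53, 55),
        date=field(63, 73),
    )


# A record assembled piece by piece so that the column positions are explicit.
SAMPLE = (
    "LOCUS".ljust(12)       # positions 1-12
    + "AAURRA".ljust(10)    # positions 13-22
    + "118".rjust(7)        # positions 23-29
    + " bp "                # 'bp' in positions 31-32
    + "ss-" + "rRNA"        # positions 34-36 and 37-40
    + " " * 12              # positions 41-52 (blank topology)
    + "RNA"                 # positions 53-55
    + " " * 7               # positions 56-62
    + "16-JUN-1986"         # positions 63-73
)

if __name__ == "__main__":
    print(parse_locus(SAMPLE))
```

Slicing by position, rather than splitting on whitespace, keeps blank
fields (such as an empty topology) from shifting the fields that
follow them.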
The primary (first) accession number occupies positions 13 to 18;
subsequent accession numbers occupy positions 20 to 25, 27 to 32, 34
to 39, 41 to 46, 48 to 53, 55 to 60, 62 to 67, and 69 to 74. No
punctuation occurs between accession numbers or after the final
accession number; accession numbers are separated only by one space.

3.5.7 KEYWORDS Format

The KEYWORDS field does not appear in unannotated entries, but is
required in all annotated entries. Keywords are separated by
semicolons; a keyword may be a single word or a phrase consisting of
several words. Each line in the keywords field ends in a semicolon;
the last line ends with a period. If no keywords are included in the
entry, the KEYWORDS record contains only a period.

3.5.8 SEGMENT Format

The SEGMENT keyword is used when two (or more) entries of known
relative orientation are separated by a short (<10 kb) stretch of DNA.
It is limited to one line of the form 'n of m', where 'n' is the
segment number of the current entry and 'm' is the total number of
segments.

3.5.9 SOURCE Format

The SOURCE field consists of two parts. The first part is found after
the SOURCE keyword and contains free-format information including an
abbreviated form of the organism name followed by a molecule type;
multiple lines are allowed, but the last line must end with a period.
The second part consists of information found after the ORGANISM
subkeyword. The formal scientific name for the source organism (genus
and species, where appropriate) is found on the same line as ORGANISM.
The records following the ORGANISM line list the taxonomic
classification levels, separated by semicolons and ending with a
period.

3.5.10 REFERENCE Format

The REFERENCE field consists of five parts: the keyword REFERENCE, and
the subkeywords AUTHORS, TITLE (optional), JOURNAL and STANDARD.

The REFERENCE line contains the number of the particular reference and
(in parentheses) the range of bases in the sequence entry reported in
this citation.
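As an aside on the ACCESSION, KEYWORDS, and SEGMENT formats of
Sections 3.5.6 to 3.5.8, the rules given there are simple enough to
check mechanically. The Python sketch below is an illustration under
stated assumptions, not a reference implementation: the function names
are hypothetical, the data are assumed to start at position 13 as
described in Section 3.5.2, and accession letters are assumed to be
uppercase because every example in these notes is.

```python
import re
from typing import List, Tuple

# One letter followed by five digits (uppercase letter is an assumption).
ACCESSION_RE = re.compile(r"[A-Z][0-9]{5}")


def parse_accessions(record: str) -> List[str]:
    """Return the accession numbers of an ACCESSION record, primary first."""
    numbers = record[12:].split()
    for number in numbers:
        if not ACCESSION_RE.fullmatch(number):
            raise ValueError(f"malformed accession number: {number!r}")
    return numbers


def parse_keywords(records: List[str]) -> List[str]:
    """Join the KEYWORDS records and split them at semicolons."""
    text = " ".join(record[12:].strip() for record in records)
    if not text.endswith("."):
        raise ValueError("KEYWORDS field must end with a period")
    # Drop only the final period; keywords such as '5.8S' contain periods.
    text = text[:-1]
    return [phrase.strip() for phrase in text.split(";") if phrase.strip()]


def parse_segment(record: str) -> Tuple[int, int]:
    """Return (n, m) from a SEGMENT record of the form 'n of m'."""
    match = re.fullmatch(r"([0-9]+) of ([0-9]+)", record[12:].strip())
    if match is None:
        raise ValueError("SEGMENT record is not of the form 'n of m'")
    return int(match.group(1)), int(match.group(2))
```

A KEYWORDS record that contains only a period yields an empty list,
which matches the rule for entries without keywords.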
Additional prose notes may also be found within the parentheses. The
numbering of the references does not reflect publication dates or
priorities.

The AUTHORS line lists the authors in the order in which they appear
in the cited article. Last names are separated from initials by a
comma (no space); there is no comma before the final 'and'. The list
of authors ends with a period.

The TITLE line is an optional field, although it appears in the
majority of entries. It does not appear in unpublished sequence data
entries that have been deposited directly into the GenBank data bank,
the EMBL Nucleotide Sequence Data Library, or the DNA Data Bank of
Japan. The TITLE field does not end with a period.

The JOURNAL line gives the appropriate literature citation for the
sequence in the entry. The word 'Unpublished' will appear after the
JOURNAL subkeyword if the data did not appear in the scientific
literature, but was directly deposited into the data bank. For
published sequences the JOURNAL line gives the Thesis, Journal, or
Book citation, including the year of publication, the specific
citation, or In press.

The STANDARD line contains information about:

  The degree to which the entry has been annotated:

    'unannotated'   for unannotated entries which include citation
                    and sequence only
    'simple'        for entries which include the organism name and
                    protein coding regions as well as the citation
                    and sequence
    'full'          for fully annotated entries which include all the
                    data items that were described by the author

  The level of modification and review:

    'automatic'     for data subjected only to automated (i.e.,
                    software) checks
    'staff_entry'   for data that passed both automated and annotator
                    checks
    'staff_review'  for data that passed previous review levels as
                    well as a review by senior annotators and/or
                    outside experts

The format for the STANDARD line is:

    annotation degree   review level

3.5.11 FEATURES Format

With this release, GenBank introduces the new feature table format.
This format has been designed jointly by GenBank, the EMBL Nucleotide
Sequence Data Library, and the DNA Data Bank of Japan, and will be
common to all three data banks. The feature table contains information
about genes and gene products, including the regions of the sequence
that code for proteins and RNA molecules, as well as regions of
biological significance reported in the sequence. It also enumerates
differences between different reports of the same sequence, and
provides cross-references to other data collections, as described in
more detail below.

The first line of the feature table is a header that includes the
keyword 'FEATURES' and the column header 'Location/Qualifiers'. Each
feature consists of a descriptor line containing a feature key and a
location (see sections below for details). If the location does not
fit on this line, a continuation line may follow. If further
information about the sequence is required, one or more lines
containing feature qualifiers may follow the descriptor line.

The feature key begins in column 6 and may be no more than 15
characters in length. The location begins in column 22. Feature
qualifiers begin on subsequent lines at column 22. Location,
qualifier, and continuation lines may extend from column 22 to 80.

Feature tables are optional. However, a feature table must include one
header line and at least one feature descriptor line. The sections
below provide a brief introduction to the new feature table format.

For a thorough description of the new feature table format, see the
document 'The DDBJ/EMBL/GenBank Feature Table: Definition.' If you
would like a copy of this publication, contact GenBank at the address
shown on the front page of these release notes.

3.5.11.1 Feature Key Names

The first column of the feature descriptor line contains the feature
key. It starts at column 6 and can continue to column 20. The list of
valid feature keys is shown below.

allele           Related strain contains alternative gene form
attenuator       Sequence related to transcription termination
CAAT_signal      'CAAT box' in eukaryotic promoters
CDS              Sequence coding for amino acids in protein (includes
                 stop codon)
cellular         Region of cellular DNA
conflict         Independent determinations differ
D-loop           Displacement loop
enhancer         Cis-acting enhancer of promoter function
exon             Region that codes for part of spliced mRNA
GC_signal        'GC box' in eukaryotic promoters
iDNA             Intervening DNA eliminated by recombination
insertion_seq    Insertion sequence (IS), a small transposon
intron           Transcribed region excised by mRNA splicing
LTR              Long terminal repeat
mat_peptide      Mature peptide coding region (does not include stop
                 codon)
misc_binding     Miscellaneous binding site
misc_difference  Miscellaneous difference feature
misc_feature     Region of biological significance that cannot be
                 described by any other feature
misc_recomb      Miscellaneous recombination feature
misc_RNA         Miscellaneous transcript feature not defined by other
                 RNA keys
misc_signal      Miscellaneous signal
misc_structure   Miscellaneous DNA or RNA structure
modified_base    The indicated base is a modified nucleotide
mRNA             Messenger RNA
mutation         A mutation alters the sequence here
old_sequence     Presented sequence revises a previous version
polyA_signal     Signal for cleavage & polyadenylation
polyA_site       Site at which polyadenine is added to mRNA
precursor_RNA    Any RNA species that is not yet the mature RNA product
prim_transcript  Primary (unprocessed) transcript
primer_bind      Non-covalent primer binding site
promoter         A region involved in transcription initiation
protein_bind     Non-covalent protein binding site on DNA or RNA
provirus         Proviral sequence
RBS              Ribosome binding site
rep_origin       Replication origin for duplex DNA
repeat_region    Sequence containing repeated subsequences
repeat_unit      One repeated unit of a repeat_region
rRNA             Ribosomal RNA
satellite        Satellite repeated sequence
scRNA            Small cytoplasmic RNA
sig_peptide      Signal peptide coding region
snRNA            Small nuclear RNA
stem_loop        Hair-pin loop structure in DNA or RNA
TATA_signal      'TATA box' in eukaryotic promoters
terminator       Sequence causing transcription termination
transit_peptide  Transit peptide coding region
transposon       Transposable element (TN)
tRNA             Transfer RNA
unsure           Authors are unsure about the sequence in this region
variation        A related population contains stable mutation
virion           Virion (encapsidated) viral sequence
- (hyphen)       Placeholder
-10_signal       'Pribnow box' in prokaryotic promoters
-35_signal       '-35 box' in prokaryotic promoters
3'clip           3'-most region of a precursor transcript removed in
                 processing
3'UTR            3' untranslated region (trailer)
5'clip           5'-most region of a precursor transcript removed in
                 processing
5'UTR            5' untranslated region (leader)

3.5.11.2 Feature Location

The second column of the feature descriptor line designates the
location of the feature in the sequence. The location descriptor
begins at position 22. Several conventions are used to indicate
sequence location.

Base numbers in location descriptors refer to numbering in the entry,
which is not necessarily the same as the numbering scheme used in the
published report. The first base in the presented sequence is numbered
base 1. Sequences are presented in the 5' to 3' direction.

Location descriptors can be one of the following:

  1. A single base;
  2. A contiguous span of bases;
  3. A site between two bases;
  4. A single base chosen from a range of bases;
  5. A single base chosen from among two or more specified bases;
  6. A joining of sequence spans;
  7. A reference to an entry other than the one to which the feature
     belongs (i.e., a remote entry), followed by a location descriptor
     referring to the remote sequence;
  8. A literal sequence (a string of bases enclosed in quotation
     marks).

A site between two residues, such as an endonuclease cleavage site, is
indicated by listing the two bases separated by a caret (e.g., 23^24).
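The simpler location descriptors (forms 1 to 4 in the list above) can
be recognised with a single regular expression. The Python sketch
below is illustrative only: SimpleLocation and parse_simple_location
are hypothetical names, the span (a..b) and range (a.b) notations and
the < and > symbols are the ones defined in the remainder of this
section, and operators, remote entries, and literal sequences are
deliberately not handled.

```python
import re
from typing import NamedTuple


class SimpleLocation(NamedTuple):
    """A parsed location descriptor (hypothetical container)."""
    kind: str         # "base", "span", "site", or "one_in_range"
    start: int
    end: int
    start_open: bool  # True when the first base is written with '<'
    end_open: bool    # True when the last base is written with '>'


# Optional '<', a base number, then optionally a separator, an optional
# '>', and a second base number. '..' is tried before '.'.
_LOCATION_RE = re.compile(r"(<?)([0-9]+)(?:(\.\.|\.|\^)(>?)([0-9]+))?")

_KINDS = {None: "base", "..": "span", ".": "one_in_range", "^": "site"}


def parse_simple_location(text: str) -> SimpleLocation:
    """Parse a single base, span, between-base site, or base-in-range."""
    match = _LOCATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported location descriptor: {text!r}")
    lt, first, separator, gt, last = match.groups()
    start = int(first)
    # A single base is reported with identical start and end positions.
    end = int(last) if last is not None else start
    return SimpleLocation(_KINDS[separator], start, end, lt == "<", gt == ">")
```

Descriptors that use operators such as join or complement fall outside
this pattern and raise ValueError, so a caller can route them to a
fuller parser.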
A single residue chosen from a range of residues is indicated by the
number of the first and last bases in the range separated by a single
period (e.g., 23.79). A contiguous span of bases is indicated by the
number of the first and last bases in the range separated by two
periods (e.g., 23..79). In both notations, the symbols < and >
indicate that the end point of the range is beyond the specified base
number. Starting and ending positions can be indicated by base number
or by one of the operators described below.

Operators are prefixes that specify what must be done to the indicated
sequence to locate the feature. The following are the operators
available, along with their most common format and a description.

complement (location)
    The feature is complementary to the location indicated.
    Complementary strands are read 5' to 3'.

join (location, location, .. location)
    The indicated elements should be placed end to end to form one
    contiguous sequence.

order (location, location, .. location)
    The elements are found in the specified order in the 5' to 3'
    direction, but nothing is implied about whether it is reasonable
    to join them.

group (location, location, .. location)
    The elements are related and should be grouped together, but no
    order is implied.

one-of (location, location, .. location)
    The element can be any one, but only one, of the items listed.

replace (location, location)
    The first location indicated should be replaced by the sequence
    from the second location; used for insertions, deletions, and
    variants.

3.5.11.3 Feature Qualifiers

Qualifiers provide additional information about features. They take
the form of a slash (/) followed by a qualifier name and, if
applicable, an equal sign (=) and a qualifier value. Feature
qualifiers begin at column 22.

Qualifiers convey many types of information. Their values can,
therefore, take several forms:

  1. Free text;
  2.
     Controlled vocabulary or enumerated values;
  3. Citations or reference numbers;
  4. Sequences;
  5. Feature labels.

Text qualifier values must be enclosed in double quotation marks. The
text can consist of any printable characters (ASCII values 32-126
decimal). If the text string includes double quotation marks, each one
must be 'escaped' by placing a double quotation mark in front of it
(e.g., /note="This is an example of ""escaped"" quotation marks").

Some qualifiers require values selected from a limited set of choices.
For example, the '/direction' qualifier has only three values: 'left',
'right', or 'both'. These are called controlled vocabulary qualifier
values. Controlled qualifier values are not case sensitive; they can
be entered in any combination of upper- and lowercase without changing
their meaning.

Citation or published reference numbers for the entry should be
enclosed in square brackets ([]) to distinguish them from other
numbers. Multiple citations are separated by commas (e.g.,
[1],[2],[3]).

A literal sequence of bases (e.g., "atgcatt") should be enclosed in
quotation marks. Literal sequences are distinguished from free text by
context. Qualifiers that take free text as their values do not take
literal sequences, and vice versa.

The '/label=' qualifier takes a feature label as its value. Although
feature labels are optional, they allow unambiguous references to the
feature. The feature label identifies a feature within an entry; when
combined with the accession number and the name of the data bank from
which it came, it is a unique tag for that feature. Feature labels
must be unique within an entry, but can be the same as a feature label
in another entry. Feature labels are not case sensitive; they can be
entered in any combination of upper- and lowercase without changing
their meaning.

The following is a list of valid feature qualifiers.

/anticodon      Location of the anticodon of tRNA and the amino acid
                for which it codes
/bound_moiety   Moiety bound
/citation       Reference to a citation providing the claim of or
                evidence for a feature
/codon          Specifies a codon that is different from any found in
                the reference genetic code
/codon_start    Indicates the reading frame of a protein coding region
/cons_splice    Identifies intron splice sites that do not conform to
                the 5'-GT ... AG-3' splice site consensus
/direction      Direction of DNA replication
/EC_number      Enzyme Commission number for the enzyme product of the
                sequence
/evidence       Value indicating the nature of supporting evidence
/frequency      Frequency of the occurrence of a feature
/function       Function attributed to a sequence
/gene           Symbol of the gene corresponding to a sequence region
/label          A label used to permanently identify a feature
/mod_base       Abbreviation for a modified nucleotide base
/note           Any comment or additional information
/number         A number indicating the order of genetic elements
                (e.g., exons or introns) in the 5' to 3' direction
/organism       Name of organism if different from that contained in
                the entry's ORGANISM field
/partial        Differentiates between complete regions and partial
                ones
/phenotype      Phenotype conferred by the feature
/product        Name of a product encoded by the sequence
/pseudo         Indicates that this feature is a non-functional
                version of the element named by the feature key
/rpt_family     Type of repeated sequence; 'Alu' or 'Kpn', for example
/rpt_type       Organization of repeated sequence
/rpt_unit       Identity of repeat unit that constitutes a
                repeat_region
/standard_name  Accepted standard name for this feature
/transl_except  Translational exception: single codon, the translation
                of which does not conform to the reference genetic
                code
/type           Name of a strain if different from that in the SOURCE
                field
/usedin         Indicates that feature is used in a compound feature
                in another entry

3.5.11.4 Cross-Reference Information

One type of information in the feature table lists
cross-references to the annual compilation of transfer RNA sequences
in Nucleic Acids Research, which has kindly been sent to us on tape by
Dr. Sprinzl. Each tRNA entry of the feature table contains a /note=
qualifier that includes a reference such as '(NAR: 1234)' to identify
code 1234 in the NAR compilation. When such a cross-reference appears
in an entry that contains a gene coding for a transfer RNA molecule,
it refers to the code in the tRNA gene compilation. Similar
cross-references in entries containing mature transfer RNA sequences
refer to the companion compilation of tRNA sequences published by D.H.
Gauss and M. Sprinzl in Nucleic Acids Research. See section 3.5.11.5
for an example.

The feature tables of human entries contain cross-references to the
HHMI Human Gene Mapping Library (HGML) data bank in New Haven,
Connecticut as well as other collections of gene symbols. HGML
includes information on mapped genes, probes, and restriction fragment
length polymorphisms. Each entry in that data bank contains the
official Human Gene Map Workshop symbol for the gene or locus. HGML
assigns each gene a unique identifier that remains associated with
that gene, regardless of changes in gene names. In entries that
contain sequences for mapped genes, a /note= qualifier includes this
identifier placed within single quotes following the term
'/hgml_locus_uid='. The /note= qualifier also includes the map
location in single quotes following the term '/map='. The gene symbol
formerly designated '/nomgen=' is contained in the /gene qualifier.
See section 3.5.11.5 for an example.

For more information about the HHMI Human Gene Mapping Library, you
can send electronic mail to GENELIB@YALEVM.BITNET or
GENESIC@YALEVM.BITNET or contact:

Iva H.
Cohen
Human Gene Mapping Library
25 Science Park
New Haven, CT 06511
Telephone: (203) 786-5515

3.5.11.5 Feature Table Examples

In the first example a number of key names, feature locations, and
qualifiers are illustrated, taken from different sequences. The first
table entry is a coding region consisting of a simple span of bases
and including a /gene qualifier. In the second table entry, an NAR
cross-reference is given. The third and fourth table entries use the
symbols '<' and '>' to indicate that the beginning or end of the
feature is beyond the range of the presented sequence. In the fifth
table entry, the symbol '^' indicates that the feature is between
bases. In the sixth table entry, the replace operator is shown.

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
     CDS             5..1261
                     /note="alpha-1-antitrypsin precursor
                     /map='14q32.1' /hgml_locus_uid='LX0081X'"
                     /gene="PI"
     tRNA            1..87
                     /note="Leu-tRNA-CAA (NAR: 1057)"
                     /anticodon=(pos:35..37,aa:Leu)
     mRNA            1..>66
                     /note="alpha-1-acid glycoprotein mRNA"
     transposon      <1..267
                     /note="insertion element IS5"
     misc_recomb     105^106
                     /note="B.subtilis DNA end/IS5 DNA start"
     conflict        replace(258..258,"t")
                     /citation=[2]
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 10. Feature Table Entries

The next example shows the representation for a CDS that spans more
than one entry.

1       10        20        30        40        50        60        70       79
---------+---------+---------+---------+---------+---------+---------+---------
LOCUS       HUMAPOB1       840 bp ds-DNA             PRI       15-JUN-1989
DEFINITION  Human apolipoprotein B-100 gene, exons 1 and 2.
ACCESSION   M15053
KEYWORDS    apolipoprotein B-100.
SEGMENT     1 of 2
  . . .
FEATURES             Location/Qualifiers
     sig_peptide     283..354
                     /note="apolipoprotein B-100 signal peptide"
     precursor_RNA   155..>840
                     /note="apoB100 mRNA"
     intron          356..669
                     /note="apoB100 intron A"
     intron          709..>840
                     /note="apoB100 intron B"
  . . .
//
LOCUS       HUMAPOB2     13872 bp ss-mRNA            PRI       15-JUN-1989
DEFINITION  Human apolipoprotein B-100 mRNA, starting at exon 3.
ACCESSION   M15051 M15054
KEYWORDS    apolipoprotein B-100.
SEGMENT     2 of 2
  . . .
FEATURES             Location/Qualifiers
     precursor_RNA   <1..13872
                     /note="apoB100 mRNA"
     variation       3204
                     /note="g in lambda-B25; c in lambda B1"
     CDS             join(M15053:283..355,M15053:670..708,
                     1..13571)
                     /note="apolipoprotein B-100 precursor"
     mat_peptide     join(M15053:355..355,M15053:670..708,
                     1..13568)
                     /note="apolipoprotein B-100"
  . . .
//
---------+---------+---------+---------+---------+---------+---------+---------
1       10        20        30        40        50        60        70       79

Example 11. Joining Sequences

3.5.12 ORIGIN Format

The ORIGIN record may be left blank, may appear as 'Unreported.' or
may give a local pointer to the sequence start, usually involving an
experimentally determined restriction cleavage site or the genetic
locus (if available). The ORIGIN record ends in a period if it
contains data, but does not include the period if the record is left
empty (in contrast to the KEYWORDS field which contains a period
rather than being left blank).

3.5.13 SEQUENCE Format

The nucleotide sequence for an entry is found in the records following
the ORIGIN record. The sequence is reported in the 5' to 3' direction.
There are sixty bases per record, listed in groups of ten bases
followed by a blank, starting at position 11 of each record. The
number of the first nucleotide in the record is given in columns 4 to
9 (right justified) of the record.

4. FUTURE RELEASES

4.1 Changes Planned for Release 65.0

No changes are planned for Release 65.0.

5. GENBANK ADMINISTRATION

IntelliGenetics Inc., a developer and distributor of molecular biology
computer programs, is the primary contractor for the GenBank data
bank. IntelliGenetics maintains the computerized data center and
oversees data distribution on all media.
Under an arrangement with IntelliGenetics, Los Alamos National
Laboratory (LANL) gathers, annotates, and organizes sequence data and
transmits it to IntelliGenetics. LANL is operated by the University of
California for the Department of Energy.

The electronic mail address of LANL is GENBANK@LANL.GOV; their
telephone number is (505) 665-2177. The IntelliGenetics address is on
the front page of these release notes.

5.1 Registered Trademark Notice

GenBank (R) is a registered trademark of the U.S. Department of Health
and Human Services for the Genetic Sequence Data Bank operated by
IntelliGenetics and Los Alamos National Laboratory under contract with
the National Institutes of Health.

5.2 GenBank Sponsorship

GenBank is sponsored by the National Institute of General Medical
Sciences, NIH; The National Library of Medicine, NIH; and the U.S.
Department of Energy.

5.3 Citing GenBank

If you have used GenBank in your research, we would appreciate it if
you would include a reference to GenBank in all publications related
to that research. You may also wish to note that the GenBank data bank
is publicly available from IntelliGenetics.

When citing data in GenBank, it is appropriate to give the sequence
name, primary accession number, and the publication in which the
sequence first appeared. If the data are unpublished, we urge you to
contact the group which submitted the data to GenBank to see if there
is a recent publication or if they have determined any revisions or
extensions of the data.

It is also appropriate to list a reference for GenBank itself. The
following publication, which describes the GenBank data bank, should
be cited:

    Bilofsky, H.S. and Burks, C. The GenBank (R) Genetic Sequence Data
    Bank. Nucl. Acids Res. 16: 1861-1864 (1988)

The following statement is an example of how you may cite GenBank
data. It cites the sequence, its primary accession number, the group
who determined the sequence, and GenBank.
The numbers in parentheses refer to the GenBank citation above and to
the REFERENCE in the GenBank sequence entry.

    'We scanned the GenBank (1) data bank for sequence similarities
    and found one sequence (2), GenBank accession number J01016, which
    showed significant similarity. . .'

    (1) Bilofsky, H.S. and Burks, C. Nucl. Acids Res. 16: 1861-1864
        (1988)
    (2) Nellen, W. and Gallwitz, D. J. Mol. Biol. 159, 1-18 (1982)

5.4 GenBank Distribution Formats and Media

The GenBank data bank is available in three formats on three physical
media. The three formats are fixed-length 80-character records,
VAX/VMS Backup saveset, and compressed Unix tar archive format. The
three media are industry-standard 9-track magnetic tapes, Sun 1/4" QIC
24 format cartridges, and TK-50 cartridges. The following chart
specifies which formats are available in each medium. To request a
change in the format, media, or density of the tapes you receive,
write to the address (or call the telephone number) on the first page
of these release notes.

                                            FILE FORMATS

TAPE MEDIA                     Unlabelled ASCII        VAX/VMS         Unix
                               (fixed-length records)  Backup Saveset  tar tarfile

9-track, 2400' reel
    1600 bpi                   MU                      M               M
    6250 bpi                   MU                      M               M
TK-50 cartridge (DEC)          NA                      M               NA
1/4" QIC 24 cartridge (Sun)    NA                      NA              M

MU  tapes are available in both mixed-case and uppercase-only formats
M   tapes are available only in mixed-case characters
NA  not available

Table 1. Tape Media and Formats

5.5 Request for Direct Submission of Sequence Data

The growth of nucleotide sequence data is close to exponential. Both
the proposed Human Genome sequencing project and the increasing
automation of sequencing make it clear that GenBank is going to
continue to grow rapidly. The data bank may contain anywhere from
seven-fold to sixty-fold more nucleotides in 1995 than it did in 1985.
A successful GenBank requires that the data enter the data bank as soon as possible after publication, that the annotations be as complete as possible, and that the sequence and annotation data be accurate. All three of these requirements are best met if authors of sequence data submit their data directly to GenBank in a usable form. It is especially important that these submissions be in computer-readable form.

GenBank must rely on direct author submission of data to ensure that it achieves its goals of complete, accurate, and timely data. For many years, GenBank has had a printed data submission form. This form is now standardized among EMBL, DDBJ, GenBank, PIR, MIPS, and JIPID. GenBank also provides a corresponding computer-readable data submission form that can be used for electronic mail and floppy disk submissions.

Please use the GenBank Data Submission Form (located in the file GBDAT.FRM) to submit your sequence and annotations. Electronic mail submissions should go to the address "GB-SUB%LIFE@LANL.GOV"; direct mail should go to our postal address in Los Alamos, which is on the data submission form.

To assist researchers in entering their own sequence data, GenBank has developed AUTHORIN, an easy-to-use program that enables authors to enter a sequence, annotate it, and submit it to GenBank or any of the other data banks. The IBM PC compatible version of AUTHORIN may be obtained by completing the enclosed AUTHORIN request card or by contacting GenBank at the address shown on the front of these release notes. Versions for the Macintosh, VAX, and Sun workstations are also planned and will be announced in future release notes as they become available.

5.6 Request for Corrections and Comments

We welcome your suggestions for improvements to GenBank. We are especially interested to learn of errors or inconsistencies in the data.
Please use the GenBank Error/Suggestion Report Form, which is part of this distribution of GenBank (located in the file GBDAT.FRM), to send your suggestions and corrections to the address on the first page of these release notes. Please be certain to indicate the GenBank release number (e.g., Release 64.0) and the primary accession number of the entry to which your comments apply; it is helpful if you also give the entry name and the current contents of any data field for which you are recommending a change.

5.7 Disclaimer

IntelliGenetics Inc., Los Alamos National Laboratory, and the United States Government make no representations or warranties regarding the content or accuracy of the information. IntelliGenetics Inc., Los Alamos National Laboratory, and the United States Government also make no representations or warranties of merchantability or fitness for a particular purpose and accept no responsibility for any consequences of the receipt or use of the information.

APPENDIX A. Statistical Summary

Division                 Entries       Bases     Reports

PRIMATE                     6401     7718869        8030
RODENT                      6346     6391473        7595
OTHER MAMMALIAN             1300     1588171        1502
OTHER VERTEBRATE            1573     1797660        1881
INVERTEBRATE                2648     3313426        3146
PLANT                       2443     3697029        2953
ORGANELLE                   1110     1594294        1368
BACTERIAL                   3508     5669333        4502
STRUCTURAL RNA              1374      357521        1646
VIRAL                       3216     5363016        4175
PHAGE                        509      608720         781
SYNTHETIC                    927      367496        1011
UNANNOTATED                 3745     4028885        4438

Total (13 divisions)       35100    42495893       43028

Sequences longer than 30,000 bp

Locus     Div   Accession      Length

ADBCG     VRL   J01917        35937bp
CHKMYHE   VRT   J02714        31111bp
HS11UL    VRL   D00317       108360bp
HS4       VRL   V01555       172282bp
HS5HCMVU  VRL   X04650        43275bp
HUMADAG   PRI   M13792        36741bp
HUMFIXG   PRI   K02402        38059bp
HUMGHCSA  PRI   J03071        66495bp
HUMHBB    PRI   J00179        73326bp
HUMHPRTB  PRI   M26434        56736bp
HUMTPA    PRI   K03021        36594bp
LAMCG     PHG   J02459        48502bp
MPOCPCG   ORG   X04465       121024bp
PT7CG     PHG   J02518        39936bp
RABBGLOB  MAM   M18818        44594bp
RATCRYG   ROD   M19359        54670bp
TOBCPCG   ORG   Z00044       155844bp
VAZCG     VRL   X04370       124884bp
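The totals row of the Appendix A statistical summary can be cross-checked mechanically against the thirteen division rows. The following is a minimal sketch in Python, assuming the division rows have been transcribed by hand into a string. The names SUMMARY and parse_summary are hypothetical and are not part of the GenBank distribution.

```python
# Hypothetical cross-check of the Appendix A division totals.
# SUMMARY is a hand transcription of the division rows
# (name, entries, bases, reports); it is not a file on the tape.
SUMMARY = """\
PRIMATE 6401 7718869 8030
RODENT 6346 6391473 7595
OTHER MAMMALIAN 1300 1588171 1502
OTHER VERTEBRATE 1573 1797660 1881
INVERTEBRATE 2648 3313426 3146
PLANT 2443 3697029 2953
ORGANELLE 1110 1594294 1368
BACTERIAL 3508 5669333 4502
STRUCTURAL RNA 1374 357521 1646
VIRAL 3216 5363016 4175
PHAGE 509 608720 781
SYNTHETIC 927 367496 1011
UNANNOTATED 3745 4028885 4438
"""


def parse_summary(text):
    """Map each division name to an (entries, bases, reports) tuple.

    The last three whitespace-separated fields are the numbers;
    everything before them is the (possibly multi-word) division name.
    """
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        name = " ".join(parts[:-3])
        rows[name] = tuple(int(p) for p in parts[-3:])
    return rows


rows = parse_summary(SUMMARY)
# Sum each numeric column across all divisions.
totals = tuple(sum(column) for column in zip(*rows.values()))

if __name__ == "__main__":
    print(len(rows), "divisions; entries, bases, reports =", totals)
```

The computed column sums can then be compared with the "Total (13 divisions)" row of the summary.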
APPENDIX B. Entries with a change in locus name

Accession   Rel 63.0      Rel 64.0
---------   ----------    ---------
D00441      PVY           PVYAAA
J00914      CHKU1RNA      CHKSRU1R
J01569      CE1PROR       ECOPROR
J01760      PS1           PS1AAA
J01820      TIPCTS        ATUCTS
J02017      ACSPLTR       AREPLTR
J02195      PCG           PCGAAA
J02336      MLR           RMLAAA
J02447      D18A          D18AAA
J03912      HUMRBPI       HUMIRBPM
J04425      CHKCOLA2      CHKCOLAI2
J05011      HUMHIVEP1     HIVEP1
K01220      SV4RATI3      RATSV4I3
K01264      AEBHEX        ADZHEX
K03325      AESLS         ADGLS
M11373      SIVENV        STLENV
M12466      ADBHAMJN      HAMADBJN
M14082      AESVARNA      ADGVARNA
M19044      RATATPSNA     RATMTATPSA
M21958      TBSCPP        TBSCG
M22245      AEM1AB        ADX1AB
M22382      HUMP1P        HUMPMMPP1
M24466      STYFLIH       STYFLGH
M24916      PSENTRA       PSENTRAA
M26697      HUMB23A       HUMNUMB23
M27280      HEILICL       HEILIC1
M29040      BSUSPOIVCA    BSUCISAB
M30594      AEM1EARL      ADX1EARL
M31976      MUST66CG      MUSTCPC
V00035      AESRS         ADGRS
X01027      AESSA7PE1     ADGSA7PE1
X01162      CE3IMM        CECCOLE3IM
X05739      SUSACT1       SUSACTA01
X07755      TBEEV1        TBECGA
Y00269      HIVHTLV4A     SIVHTLV4A

APPENDIX C. Number of entries, reports, and bases by organism

PRIMATE Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. AGM Cercopithecus aethiops 37 36 24356 2. ATR Aotus trivirgatus 7 7 7557 3. BAB Papio anubis 3 3 2653 4. BAB Papio doguera 1 1 2000 5. BAB Papio hamadryas 4 4 8576 6. BAB Papio papio 1 1 343 7. BAB Papio sp. 3 3 1601 8. CEB Cebus sp. 2 2 190 9. CEP Cebus apella 7 7 1819 10. CHP Pan paniscus 1 1 1683 11. CHP Pan troglodytes 66 52 71218 12. COL Colobus polykomos 2 2 1494 13. GCR Galago crassicaudatus 35 35 11381 14. GIB Hylobates lar 7 5 15455 15. GOR Gorilla gorilla 16 10 17953 16. GSE Galago senegalensis 1 1 369 17. HUM Homo sapiens 7748 6157 7436306 18. LEM Cheirogaleus medius 1 1 1899 19. LEM Lemur albifrons 1 1 1786 20. LEM Lemur macaco 3 3 5590 21. LEM Lemur sp. 1 1 1380 22. MAC Macaca fascicularis 8 7 6355 23. MAC Macaca mulatta 31 20 31531 24. MAC Macaca nemestrina 1 1 1115 25. MAC Macaca radiata 2 2 342 26. MAC Macaca sp. 7 6 7502 27. MNK Ateles geoffroyi 3 3 12261 28.
MNK Monkey 9 9 1423 29. ORA Pongo pygmaeus 18 17 36667 30. SOE Sanguinus oedipus 1 1 660 31. TAR Tarsius sp. 3 2 5404 Total 8030 6401 7718869 RODENT Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. DIP Dipodomys ordii 1 1 3318 2. GPI Cavia cobaya 1 1 491 3. GPI Cavia cutleri 3 2 4959 4. GPI Cavia porcellus 16 15 19158 5. HAM Cricetulus griseus 16 14 17912 6. HAM Cricetulus longicaudatus 31 25 16035 7. HAM Cricetulus sp. 22 18 20335 8. HAM Cricetus cricetus 6 6 11446 9. HAM Mesocricetus auratus 78 55 71411 10. HAM Mesocricetus sp. 88 56 40625 11. MAR Marmota monax 5 3 7012 12. MUS Mus caroli 11 10 6026 13. MUS Mus domesticus 22 19 12678 14. MUS Mus muscaris 56 56 26376 15. MUS Mus musculus 4774 4002 3545428 16. MUS Mus pahari 6 6 6154 17. MUS Mus platythrix 1 1 315 18. MUS Mus sp. 3 3 4982 19. MUS Mus spretus 8 6 6660 20. PER Peromyscus leucopus 3 3 3640 21. PER Peromyscus maniculatus 11 11 1791 22. RAT Rattus colletti 4 4 7849 23. RAT Rattus fuscipes 1 1 1161 24. RAT Rattus leucopus 3 3 3481 25. RAT Rattus norvegicus 2222 1849 2280317 26. RAT Rattus rattus 141 125 200303 27. RAT Rattus sordidus 1 1 1161 28. RAT Rattus sp. 51 42 55507 29. RAT Rattus tunneyi 1 1 1161 30. RAT Rattus villosissimus 2 2 3369 31. SEH Spalax ehrenbergi 7 5 10412 Total 7595 6346 6391473 OTHER MAMMALIAN Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. AXI Axis axis 2 2 1758 2. BOV Bos taurus 636 539 617877 3. CAT Felis catus 40 40 23293 4. CAT Felis domesticus 3 3 4748 5. CAT Felis silvestris 8 6 9516 6. CAT Felis sp. 1 1 3534 7. DAV Dasyurus viverrinus 2 2 939 8. DOG Canis familiaris 26 17 29528 9. DOG Canis lupus 6 6 6169 10. DOG Canis sp. 12 11 14350 11. GOT Capra hircus 43 40 34381 12. HRS Equus caballus 20 15 26834 13. LEE Lepus capensis 1 1 434 14. LEE Lepus europaeus 1 1 3646 15. MMU Muntiacus muntjak 1 1 807 16. MVI Mustela vison 1 1 585 17. 
OPO Didelphis virginiana 9 9 7139 18. PIG Sus scrofa 158 131 200269 19. RAB Basilea sp. 1 1 377 20. RAB Oryctolagus cuniculus 386 336 440557 21. RAB Oryctolagus sp. 45 45 60173 22. RAB Sylvilagus floridanus 1 1 1065 23. SEA Halichoerus grypus 3 3 2288 24. SHP Ovis aries 48 45 60593 25. SHP Ovis sp. 35 33 27086 26. SUN Suncus murinus 8 6 6231 27. VMP Desmodus rotundus 2 1 1725 28. WAL Macropus eugenii 1 1 754 29. WAL Macropus robustus 1 1 1465 30. WAL Macropus rufus 1 1 50 Total 1502 1300 1588171 OTHER VERTEBRATE Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. APT Ascaphus truei 2 1 1897 2. BUJ Bufo japonicus 2 2 1116 3. CHK Gallus domesticus 129 95 139366 4. CHK Gallus gallus 896 735 902543 5. CPL Carcharhinus plumbeus 4 4 1821 6. CRC Caiman crocodylus 4 4 2420 7. DUK Aix sp. 1 1 165 8. DUK Anas platyrhynchos 13 12 18150 9. DUK Cairina moschata 27 19 18528 10. FAL Falco columbarius 1 1 174 11. FSA Myxine glutinosa 1 1 915 12. FSA Petromyzon marinus 5 5 9661 13. FSB Anarhichas lupus 1 1 3395 14. FSB Carassius auratus 7 6 6765 15. FSB Catostomus commersoni 3 3 1936 16. FSB Ctenopharyngobon idella 1 1 4243 17. FSB Cyprinus carpio 15 12 12379 18. FSB Electrophorus electricus 3 3 8176 19. FSB Elops saurus 9 5 3870 20. FSB Ictalurus punctatus 7 6 5679 21. FSB Limanda ferruginea 1 1 416 22. FSB Lophius americanus 8 5 2942 23. FSB Macrozoarces americanus 3 3 2657 24. FSB Misgurnus fossilis 2 2 1697 25. FSB Oncorhynchus keta 27 23 19621 26. FSB Oncorhynchus kisutch 2 2 3221 27. FSB Oncorhynchus tschawytscha 3 3 1862 28. FSB Paralichthys olivaceus 5 3 2223 29. FSB Pseudopleuronectus americanus 7 5 2680 30. FSB Salmo gairdneri 47 43 36617 31. FSB Salmo irideus 1 1 1278 32. FSB Salmo salar 4 4 5039 33. FSB Thunnus thynnus 1 1 911 34. FSC Torpedo californica 20 12 21607 35. FSC Torpedo marmorata 3 3 4033 36. GOO Anser anser 2 2 4906 37. GRE Geoclemys reevessi 1 1 239 38. HEF Heterodontus francisci 32 28 11813 39. 
LSE Laticauda semifasciata 1 1 483 40. LSL Laticauda laticaudata 2 1 632 41. NEW Cynops pyrrhogaster 1 1 629 42. NVI Notophthalmus viridescens 10 10 1458 43. ORN Oreochromis niloticus 1 1 847 44. PAG Pagrus major 2 1 906 45. PGN Columba sp. 2 2 1665 46. PHS Phasianus colchicus 1 1 739 47. PHU Phyllomedusa sauvagei 2 2 1315 48. PLS Phylloscopus trochilus 6 3 2593 49. PYU Pyura stolonifera 7 7 1029 50. QUL Coturnix coturnix 21 19 14969 51. QUL Coturnix japonica 1 1 311 52. RAN Rana catesbeiana 8 7 5775 53. RAN Rana pipiens 4 3 1625 54. RAN Rana temporaria 18 13 7156 55. SCC Scyliorhinus caniculus 2 2 667 56. SKT Raja erinacea 12 8 10209 57. SMD Triturus vulgaris 1 1 310 58. SMR Pleurodeles waltlii 5 4 2305 59. SNK Aipysurus laevis 6 3 1332 60. SNK Bothrops atrox 8 7 11423 61. SNK Crotalus durissus 3 2 1263 62. SNK Elaphe radiata 1 1 2483 63. SNK Naja naja 1 1 312 64. SNK Natrix tessellata 1 1 312 65. SNK Notechis scutatus 2 1 621 66. SRA Hemitripterus americanus 2 2 3294 67. TKY Meleagris gallopavo 14 12 5288 68. XEB Xenopus borealis 21 20 15140 69. XEC Xenopus clivii 6 6 1406 70. XEL Xenopus laevis 390 357 413699 71. XET Xenopus tropicalis 9 6 13706 72. XIP Xiphophorus maculatus 2 1 533 73. ZEF Brachydanio rerio 8 6 4264 Total 1881 1573 1797660 INVERTEBRATE Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. ACA Acanthamoeba castellanii 10 6 22430 2. ACP Acropora formosa 2 2 236 3. ACP Acropora latistella 3 3 354 4. ADO Acheta domesticus 1 1 541 5. AEI Aequipecten irradians 2 2 1253 6. AEV Aequorea victoria 4 4 2595 7. AME Apis melifica 2 2 897 8. AMF Apis mellifera 7 7 1818 9. APL Aplysia californica 21 18 16598 10. APL Aplysia sp. 9 9 9086 11. APO Antheraea polyphemus 19 19 8010 12. APY Antheraea yamamai 1 1 1200 13. ARB Arbacia punctulata 3 2 5078 14. BBO Babesia bovis 4 4 2482 15. BBO Babesia rodhaini 1 1 3238 16. BMO Bombyx mandarina 1 1 2454 17. BMO Bombyx mori 109 92 76785 18. 
BPL Brachionus plicatilis 1 1 120 19. BRP Brugia malayi 8 7 8668 20. BRP Brugia pahangi 1 1 322 21. BUG Bugula neritina 1 1 120 22. CAF Calanus finmarchicus 1 1 1487 23. CAR Carcinoscorpius rotundicauda 1 1 983 24. CAV Calliphora vicina 11 7 16088 25. CBL Trichoplusia ni 1 1 2475 26. CCA Caledia captiva 12 12 11713 27. CEL Caenorhabditis briggsae 3 2 3657 28. CEL Caenorhabditis elegans 109 82 194202 29. CER Calliphora erythrocephala 6 6 1677 30. CHI Chironomus pallidivittatus 21 16 5529 31. CHI Chironomus tentans 21 20 20059 32. CHI Chironomus thummi 29 28 29934 33. CLM Clam sp. 1 1 2163 34. CLM Spisula solidissima 1 1 806 35. CRB Cardisoma guanhumi 3 3 1992 36. CRB Gecarcinus lateralis 4 4 6714 37. CRB Geryon quinquedens 1 1 85 38. CRB Limulus polyphemus 4 4 4886 39. CRB Paguras pollicaris 4 4 419 40. CUL Culex pipiens 1 1 3105 41. DDI Dictyostelium discoideum 246 188 188931 42. DDI Dictyostelium sp. 2 1 4439 43. DEM Dermasterias imbricata 2 2 1016 44. DEP Dermatophagoides pteronyssinus 2 1 824 45. DIA Diadromus pulchellus 1 1 324 46. DIC Dicyema misakiense 1 1 116 47. DIR Dirofilaria immitis 1 1 246 48. DRE Drosophila erecta 4 4 4847 49. DRF Drosophila funebris 2 1 3002 50. DRG Drosophila gymnobasis 10 10 1793 51. DRH Drosophila hydei 12 11 17221 52. DRI Drosophila grimshawi 2 2 378 53. DRL Drosophila silvarentis 2 2 356 54. DRM Drosophila mauritiana 6 4 8528 55. DRN Drosophila nebulosa 2 2 2991 56. DRO Drosophila melanogaster 1072 864 1367722 57. DRO Drosophila subobscura 1 1 1593 58. DRP Drosophila pseudoobscura 10 5 21972 59. DRQ Drosophila sechellia 2 1 4572 60. DRR Drosophila orena 6 3 5972 61. DRS Drosophila simulans 17 14 15853 62. DRT Drosophila teissieri 1 1 346 63. DRU Drosophila mulleri 1 1 6778 64. DRV Drosophila virilis 36 30 35650 65. DRW Drosophila mojavensis 5 4 12829 66. DRY Drosophila yakuba 1 1 1853 67. ECC Echinococcus granulosus 2 2 1230 68. EIM Eimeria acervulina 2 1 807 69. EIM Eimeria tenella 4 3 3907 70. 
ENH Entamoeba histolytica 13 12 8825 71. EUC Eurypelma californicum 1 1 1579 72. EWA Euplotes aediculatus 1 1 1882 73. EWC Euplotes crassus 1 1 770 74. EWE Euplotes eurystomus 2 1 930 75. EWR Euplotes raikovi 1 1 593 76. FFL Luciola cruciata 1 1 1985 77. FHE Fasciola hepatica 1 1 894 78. GCH Glaucoma chattoni 2 2 488 79. GLA Giardia lamblia 10 7 3801 80. GLY Glycera dibranchiata 1 1 745 81. GMO Glossina austeni 1 1 653 82. GMO Glossina fuscipes 1 1 239 83. GMO Glossina morsitans 7 7 3244 84. GMO Glossina palpalis 1 1 236 85. HAE Haemonchus contortus 3 3 2422 86. HCE Hyalophora cecropia 7 7 4501 87. HEL Heliothis virescens 1 1 2977 88. HIR Hirudo medicinalis 1 1 379 89. HOL Holothuria polii 5 5 1964 90. HOL Holothuria tubulosa 1 1 441 91. HYD Hydra sp. 2 2 4555 92. LAN Lingula anatina 1 1 120 93. LEI Leishmania donovani 7 5 9997 94. LEI Leishmania enriettae 4 4 1562 95. LEI Leishmania enriettii 3 3 4153 96. LEI Leishmania major 9 7 7531 97. LEI Leishmania sp. 2 2 4472 98. LEI Leishmania tropica 1 1 1851 99. LIT Litomosoides carinii 2 2 214 100. LMI Locusta migratoria 4 4 2101 101. LUM Lumbricus terrestris 2 2 5061 102. LYM Lymnaea stagnalis 1 1 482 103. MDO Musca domestica 3 2 2916 104. MOT Manduca sexta 23 20 36451 105. MSQ Aedes aegypti 2 2 3491 106. NEM Ascaris lumbricoides 32 32 12741 107. NEM Ascaris suum 2 2 6079 108. NGR Naegleria gruberi 4 4 6389 109. OCT Octopus dofleini 1 1 1315 110. OCT Paroctopus defleini 1 1 1675 111. OFA Oxytricha fallax 34 13 9421 112. ONG Onchocerca sp. 2 2 214 113. ONG Onchocerca volvulus 16 15 12676 114. ONO Oxytricha nova 18 18 13838 115. OWE Owenia fusiformis 1 1 1548 116. PAA Parascaris sp. 2 2 215 117. PAL Paracentrotus lividus 4 3 7058 118. PAR Paramecium primaurelia 6 6 11862 119. PAR Paramecium tetraurelia 16 16 7831 120. PBA Plasmodium gallinaceum 1 1 799 121. PBE Plasmodium berghei 5 5 8498 122. PBS Plasmodium brasilianum 2 2 3010 123. PCH Plasmodium chabaudi 5 5 7006 124. PCR Philosamia cynthia ricini 1 1 120 125. 
PCY Plasmodium cynomolgi 6 6 7875 126. PFA Plasmodium falciparum 148 129 189006 127. PIO Pisaster ochraceus 5 5 9699 128. PKN Plasmodium knowlesi 7 5 5169 129. PLM Plasmodium malariae 1 1 1545 130. PLO Plasmodium lophurae 6 5 6087 131. PMC Pneumocystis carinii 8 5 5190 132. PMI Prorocentrum micans 2 1 2451 133. PPY Photinus pyralis 1 1 2387 134. PVI Plasmodium vivax 7 5 8010 135. PYO Plasmodium yoelii 10 8 20238 136. SCA Schistocerca americana 2 2 3416 137. SCA Schistocerca nitans 2 2 711 138. SCI Sciara coprophila 2 2 822 139. SCM Schistosoma japonicum 9 8 8326 140. SCM Schistosoma mansoni 51 45 39157 141. SCR Androctonus australis 7 7 2563 142. SEM Parastichopus parvimensis 1 1 1458 143. SHR Artemia salina 20 19 13823 144. SHR Artemia sp. 3 3 2045 145. SLE Stylonychia lemnae 4 4 7237 146. SLU Stylonychia pustulata 5 5 2820 147. SPE Sarcophaga peregrina 7 7 6405 148. SPF Spodoptera frugiperda 1 1 228 149. SUL Lytechinus pictus 10 10 9731 150. SUL Lytechinus variegatus 15 15 11452 151. SUP Psammechinus miliaris 44 32 25482 152. SUS Strongylocentrotus drobachiensis 1 1 229 153. SUS Strongylocentrotus franciscanus 2 2 3832 154. SUS Strongylocentrotus purpuratus 137 128 96532 155. SUT Tripneustes gratilla 11 6 7146 156. SUU Sea urchin 12 12 2632 157. TAE Taenia solium 3 2 3737 158. TAT Tachypleus tridentatus 1 1 946 159. TCA Tribolium castaneum 1 1 707 160. TCK Boophilus microplus 1 1 2225 161. TCS Trichostrongylus colubriformis 2 2 1987 162. TEC Tetrahymena cosmopolitanis 1 1 511 163. TEH Tetrahymena hyperangularis 2 2 767 164. TEM Tetrahymena malaccensis 1 1 507 165. TEN Tenebrio molitor 23 23 5207 166. TEP Tetrahymena pigmentosa 4 4 3072 167. TES Tetrahymena sonneborni 1 1 511 168. TET Tetrahymena thermophila 63 60 48324 169. TEY Tetrahymena pyriformis 9 7 4343 170. THE Theileria annulata 2 2 2859 171. THE Theileria parva 2 2 5207 172. TOX Toxoplasma gondii 8 8 10596 173. TRB Trypanosoma brucei 217 193 228884 174. TRC Trypanosoma cruzi 31 30 25702 175. 
TRE Trypanosoma equiperdum 6 4 3816 176. TRF Crithidia fasciculata 15 11 16299 177. TRL Leptomonas collosoma 1 1 154 178. TRL Leptomonas seymouri 6 4 2130 179. TRO Trypanosoma congolense 7 7 7982 180. TRV Trypanosoma vivax 2 2 857 181. TRWKP Kinetoplast Trypanosoma lewisi 2 2 2036 182. TRY Trypanosoma rangeli 1 1 153 183. TSR Trichinella spiralis 2 1 1613 184. UCA Urechis caupo 1 1 718 185. VAI Vairimorpha necatrix 1 1 1244 186. VUI Eupelmus vuilleti 1 1 106 187. WSP Dolichovespula maculata 2 2 1367 Total 3146 2648 3313426 PLANT Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. ABG Absidia glauca 2 1 1011 2. ACK Achlya ambisexualis 1 1 1121 3. ACK Achlya bisexualis 1 1 1809 4. ACK Achlya klebsiana 1 1 1254 5. ACT Actinidia chinensis 6 4 3861 6. AEG Aegilops tauschii 1 1 421 7. ALC Allium cepa 3 3 1135 8. ALF Medicago sativa 23 13 21111 9. AMA Antirrhinum majus 11 7 11860 10. APE Acremonium chrysogenum 1 1 1393 11. ASA Aspergillus awamori 3 2 6109 12. ASG Aspergillus niger 9 7 13043 13. ASL Aessosporon salmonicolor 1 1 119 14. ASN Aspergillus nidulans 40 33 71255 15. ASO Aspergillus oryzae 8 5 11066 16. AST Avena sativa 5 5 16994 17. ATH Arabidopsis thaliana 67 54 90653 18. AVO Persea americana 1 1 2021 19. BJE Bjerkandera adusta 1 1 118 20. BLY Hordeum vulgare 58 48 64465 21. BNA Brassica napus 20 16 13795 22. BOL Brassica campestris 6 3 3852 23. BOL Brassica oleracea 12 9 3859 24. BRM Bremia lactucae 1 1 2869 25. BRN Bertholletia excelsa 1 1 621 26. CAG Canavalia gladiata 3 2 4797 27. CCI Coprinus cinereus 2 2 6091 28. CEN Canavalia ensiformis 1 1 1027 29. CFU Caldariomyces fumago 2 1 2787 30. CHE Chenopodium rubrum 2 1 689 31. CHL Chlorella sp. 6 6 1019 32. CIP Mesembryanthemum crystallinum 4 3 5478 33. CLI Citrus limon 2 2 4868 34. CLR Clarkia unguiculata 2 1 2040 35. COA Convolvulus arvensis 4 4 4549 36. COC Cochliobolus heterostrophus 2 2 2634 37. COG Colletotrichum capsici 1 1 2557 38. 
COG Colletotrichum gloeosporioides 1 1 1749 39. COT Gossypium hirsutum 9 9 13403 40. CPA Carica papaya 3 3 2037 41. CRE Chlamydomonas reinhardtii 24 22 30254 42. CTR Catharanthus roseus 1 1 1740 43. CUC Cucurbita maxima 5 3 15360 44. CUC Cucurbita moschata 2 1 1781 45. CUC Cucurbita pepo 7 7 5679 46. CUS Cucumis sativus 11 11 11305 47. DAR Daucus carota 6 6 8976 48. DBI Dolichos biflorus 1 1 1005 49. DCG Dictyoglomus thermophilum 1 1 2649 50. DUN Dunaliella salina 3 2 2541 51. EGR Euglena gracilis 5 4 5700 52. EPA Endothia parasitica 5 3 3809 53. EPK Ephedra kokanica 2 1 120 54. ERG Erysiphe graminis 1 1 2474 55. FIL Filobasidium capsuligenum 1 1 118 56. FIL Filobasidium floriforme 1 1 118 57. FLX Linum usitatissimum 5 5 1726 58. FSO Fusarium oxysporum 1 1 132 59. FSO Fusarium solani 2 2 3652 60. FSO Fusarium sporotrichioides 2 1 1908 61. FTR Flaveria trinervia 1 1 752 62. GNG Gnetum gnemon 2 1 120 63. HNN Helianthus annuus 6 4 5752 64. IPB Ipomoea batatas 8 6 10824 65. LGI Lemna gibba 5 5 4646 66. LGI Lemna minor 1 1 119 67. LIL Lilium henryi 1 1 9345 68. LUP Lupinus luteus 20 15 5280 69. MIN Matthiola incana 1 1 509 70. MRA Mucor racemosus 8 7 6601 71. MRM Mucor miehei 2 2 3316 72. MRP Mucor pusillus 1 1 1965 73. MZE Zea mays 170 143 213171 74. NAN Nanochlorum eucaryotum 2 1 1796 75. NEU Neurospora crassa 97 88 115769 76. OCH Ochromonas danica 1 1 1789 77. PAN Podospora anserina 3 3 1901 78. PBL Phycomyces blakesleeanus 3 3 545 79. PCP Physcomitrella patens 1 1 2544 80. PEA Pisum sativum 66 60 71485 81. PEC Penicillium chrysogenum 6 5 8394 82. PET Petunia hybrida 30 28 22430 83. PET Petunia sp. 25 24 13154 84. PHA Phanerochaete chrysosporium 10 8 11455 85. PHN Pharbitis nil 10 5 534 86. PHO Petroselinum hortense 1 1 1431 87. PHV Phaseolus lunatus 1 1 926 88. PHV Phaseolus vulgaris 46 36 41910 89. PIN Pinus thunbergii 4 2 1889 90. POM Polystichum munitum 4 4 4645 91. POP Populus sp. 17 10 3776 92. POT Solanum tuberosum 58 49 85195 93. 
PTE Porphyra umbilicalis 2 1 121 94. PUM Petroselinum crispum 8 8 5448 95. PYL Pylaiella littoralis 2 1 1644 96. RAD Raphanus sativus 7 5 1952 97. RCC Ricinus communis 6 6 10485 98. RCH Rhizopus chinensis 1 1 1133 99. RCH Rhizopus niveus 2 2 3448 100. RCH Rhizopus oryzae 1 1 2290 101. RDT Rhodotorula rubra 2 1 3586 102. RHD Rhodosporidium toruloides 2 2 3181 103. RIC Oryza sativa 67 44 66291 104. RYE Secale cereale 2 2 1363 105. SAL Sinapis alba 5 4 2302 106. SCO Schizophyllum commune 5 5 4836 107. SES Sesbania rostrata 4 4 2075 108. SIP Silene pratensis 4 4 3165 109. SLM Physarum polycephalum 57 47 49961 110. SOY Glycine max 139 106 159802 111. SPI Spinacia oleracea 29 24 28509 112. SRG Sorghum bicolor 1 1 160 113. SRG Sorghum sp. 1 1 4638 114. SSI Scilla siberica 4 4 204 115. TDA Thaumatococcus daniellii 1 1 931 116. TLA Thermomyces lanuginosus 5 4 3671 117. TOB Nicotiana alata 1 1 804 118. TOB Nicotiana plumbaginifolia 7 7 11855 119. TOB Nicotiana rustica 2 2 593 120. TOB Nicotiana sylvestris 4 4 1382 121. TOB Nicotiana tabacum 55 42 49187 122. TOM Lycopersicon esculentum 58 50 58245 123. TOM Lycopersicon peruvianum 1 1 480 124. TRD Tripsacum dactyloides 2 2 812 125. TRR Trichoderma reesei 3 3 5569 126. TRT Trema tomentosa 2 1 1727 127. URO Uromyces appendiculatus 1 1 1449 128. USM Ustilago maydis 3 2 2656 129. VFA Vicia faba 37 31 28061 130. VIR Vigna radiata 3 3 3421 131. VVC Volvox carteri 7 5 7657 132. WHT Triticum aestivum 83 69 99737 133. WHT Triticum durum 1 1 898 134. WHT Triticum sp. 3 3 7370 135. WHT Triticum vulgare 1 1 965 136. YS1 Zygosaccharomyces fermentati 1 1 5416 137. YS2 Saccharomycopsis fibuligera 3 3 9339 138. YS4 Candida boidinii 2 2 1863 139. YS5 Candida glabrata 3 3 2758 140. YSA Candida albicans 8 6 11668 141. YSB Candida tropicalis 15 12 20761 142. YSC Saccharomyces cerevisiae 976 807 1420238 143. YSCTY Transposable element TY1 41 36 43719 144. YSD Saccharomyces diastaticus 4 4 4319 145. YSE Candida pelliculosa 1 1 5327 146. 
YSF Candida maltosa 5 4 8167 147. YSG Saccharomyces carlsbergensis 22 19 36227 148. YSH Hansenula wingei 3 3 720 149. YSI Saccharomyces fibuligera 2 2 6761 150. YSJ Yarrowia lipolytica 5 4 11065 151. YSK Kluyveromyces lactis 36 27 83929 152. YSM Hansenula polymorpha 3 3 8018 153. YSN Kluyveromyces fragilis 1 1 4193 154. YSO Zygosaccharomyces rouxii 5 3 15025 155. YSP Schizosaccharomyces pombe 110 92 154243 156. YSQ Pichia pastoris 3 3 899 157. YSS Cephalosporium acremonium 4 4 2093 158. YST Yeast sp. 33 32 15660 159. YSU Candida utilis 4 4 7578 160. YSV Saccharomyces uvarum 1 1 2001 161. YSW Kluyveromyces drosophilarum 1 1 4757 162. YSX Saccharomyces rosei 1 1 278 163. YSY Saccharomyces kluyveri 3 2 2160 164. YSZ Zygosaccharomyces bailii 1 1 5415 165. ZAM Zamia pumila 1 1 1813 Total 2953 2443 3697029 ORGANELLE Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. ABGMT Mitochondrion Absidia glauca 1 1 596 2. AEGCP Chloroplast Aegilops crassa 2 1 1436 3. AEGCP Chloroplast Aegilops squarrosa 1 1 203 4. ALFCP Chloroplast Medicago sativa 3 3 3460 5. AMDCP Chloroplast Acetabularia mediterranea 1 1 1175 6. AMFMT Mitochondrion Apis mellifera 1 1 2949 7. AMHCP Chloroplast Amaranthus hybridus 1 1 1187 8. AMTMT Mitochondrion Ambystoma tigrinum 2 1 225 9. ASNMT Mitochondrion Aspergillus amstelodami 1 1 624 10. ASNMT Mitochondrion Aspergillus nidulans 17 15 30144 11. ASTCP Chloroplast Avena sativa 1 1 1623 12. ATHCP Chloroplast Arabidopsis thaliana 3 2 1499 13. ATPCP Chloroplast Atriplex patula 1 1 1786 14. ATPCP Chloroplast Atriplex rosea 1 1 1790 15. BETMT Mitochondrion Beta vulgaris 3 3 4368 16. BLYCP Chloroplast Hordeum vulgare 14 9 18910 17. BOLCP Chloroplast Brassica oleracea 1 1 543 18. BOLMT Mitochondrion Brassica oleracea 1 1 549 19. BOLMT Mitochondrion Brassica sp. 2 2 770 20. BOVMT Mitochondrion Bos taurus 6 5 19563 21. CHECP Chloroplast Chenopodium album 1 1 207 22. 
CHKMT Mitochondrion Gallus gallus 2 1 225 23. CHLCP Chloroplast Chlorella ellipsoidea 11 8 11704 24. CHPMT Mitochondrion Pan troglodytes 1 1 896 25. CNAMT Mitochondrion Citrullus lanatus 1 1 4512 26. CODCP Chloroplast Codium fragile 4 4 304 27. CPACP Chloroplast Cyanophora paradoxa 9 7 1298 28. CPACY Cyanelle Cyanophora paradoxa 3 3 2964 29. CRECP Chloroplast Chlamydomonas moewusii 3 2 8107 30. CRECP Chloroplast Chlamydomonas reinhardtii 38 31 34870 31. CRECP Chloroplast Chlamydomonas sp. 3 2 2346 32. CREMT Mitochondrion Chlamydomonas reinhardtii 25 21 17665 33. CRRMT Mitochondrion Corcorax melanorhamphos 1 1 239 34. DARMT Mitochondrion Daucus carota 1 1 690 35. DIPMT Mitochondrion Dipodomys californicus 1 1 239 36. DIPMT Mitochondrion Dipodomys heermanni 1 1 239 37. DIPMT Mitochondrion Dipodomys panamintinus 2 2 478 38. DRMMT Mitochondrion Drosophila mauritania 1 1 976 39. DROMT Mitochondrion Drosophila melanogaster 8 5 7782 40. DRSMT Mitochondrion Drosophila simulans 1 1 975 41. DRVMT Mitochondrion Drosophila virilis 1 1 191 42. DRYMT Mitochondrion Drosophila yakuba 9 3 19938 43. EGRCP Chloroplast Euglena gracilis 33 22 48472 44. EQQMT Mitochondrion Equus quagga 2 2 229 45. FHEMT Mitochondrion Fasciola hepatica 2 1 708 46. FRGMT Mitochondrion Rana catesbeiana 5 3 7752 47. FSBMT Mitochondrion Acipenser transmontano 2 1 156 48. FSBMT Mitochondrion Cichlosoma centrarchus 1 1 239 49. FSBMT Mitochondrion Cichlosoma citrinellum 1 1 239 50. FSBMT Mitochondrion Cichlosoma labiatum 1 1 239 51. FSBMT Mitochondrion Cichlosoma nicaraguense 1 1 239 52. FSBMT Mitochondrion Cyprinus carpio 3 3 873 53. FSBMT Mitochondrion Julidochromis regani 1 1 239 54. FTRCP Chloroplast Flaveria bidentis 1 1 1839 55. FTRCP Chloroplast Flaveria pringlei 1 1 1842 56. GCOCP Chloroplast Gracilaria tenuistipitata 1 1 1930 57. GIBMT Mitochondrion Hylobates lar 1 1 896 58. GORMT Mitochondrion Gorilla gorilla 1 1 896 59. HAMMT Mitochondrion Cricetulus sp. 1 1 880 60. 
HUMMT Mitochondrion Homo sapiens 40 34 35919 61. IPBCP Chloroplast Ipomoea batatas 1 1 2004 62. LEIKP Kinetoplast Leishmania mexicana 3 3 2134 63. LEIKP Kinetoplast Leishmania tarentolae 21 15 27605 64. LEIMT Mitochondrion Leishmania tarentolae 1 1 189 65. LEMMT Mitochondrion Lemur catta 1 1 895 66. LMIMT Mitochondrion Locusta migratoria 2 2 2467 67. LUAMT Mitochondrion Lupinus angustifolius 2 2 1330 68. LUPMT Mitochondrion Lupinus luteus 2 1 630 69. MACMT Mitochondrion Macaca fascicularis 2 2 1598 70. MACMT Mitochondrion Macaca fuscata 1 1 896 71. MACMT Mitochondrion Macaca mulatta 1 1 896 72. MACMT Mitochondrion Macaca sylvanus 1 1 896 73. MPOCP Chloroplast Marchantia polymorpha 10 1 121024 74. MSQMT Mitochondrion Aedes albopictus 9 9 3448 75. MUSMT Mitochondrion Mus musculus 18 11 20366 76. MZECP Chloroplast Zea mays 68 52 49118 77. MZECP Chloroplast Zea perennis 2 2 1456 78. MZEMT Mitochondrion Zea mays 46 40 73673 79. NEUMT Mitochondrion Neurospora crassa 41 36 48308 80. NEUMT Mitochondrion Neurospora intermedia 3 2 6248 81. NRACP Chloroplast Neurachne munroi 1 1 1990 82. NRACP Chloroplast Neurachne tenuifolia 1 1 2010 83. OBECP Chloroplast Oenothera berteriana 6 4 1813 84. OBEMT Mitochondrion Oenothera berteriana 17 14 24163 85. OBOCP Chloroplast Oenothera odorata 2 2 964 86. OHOCP Chloroplast Oenothera hookeri 2 2 2132 87. OLICP Chloroplast Olisthodiscus luteus 1 1 714 88. ORAMT Mitochondrion Pongo pygmaeus 1 1 895 89. OSPMT Mitochondrion Oenothera sp. 2 2 1635 90. PALMT Mitochondrion Paracentrotus lividus 16 16 18381 91. PANMT Mitochondrion Podospora anserina 14 13 34888 92. PARMT Mitochondrion Paramecium aurelia 9 9 8110 93. PARMT Mitochondrion Paramecium primaurelia 4 3 5645 94. PARMT Mitochondrion Paramecium sp. 34 17 12563 95. PARMT Mitochondrion Paramecium tetraurelia 4 4 5844 96. PEACP Chloroplast Pisum sativum 31 26 42065 97. PEAMT Mitochondrion Pisum sativum 6 5 5429 98. PENCP Chloroplast Pennisetum americanum 2 1 325 99. 
PETCP Chloroplast Petunia hybrida 5 5 5461 100. PETMT Mitochondrion Petunia hybrida 4 3 2954 101. PETMT Mitochondrion Petunia parodii 1 1 1774 102. PIGMT Mitochondrion Sus scrofa 1 1 449 103. PILCP Chloroplast Pilayella littoralis 1 1 353 104. PMGMT Mitochondrion Placopecten magellanicus 2 2 2195 105. POGMT Mitochondrion Thomomys townsendi 1 1 239 106. PZOCP Chloroplast Pelargonium zonale 2 2 463 107. RADMT Mitochondrion Raphanus sativus 2 2 5752 108. RATMT Mitochondrion Rattus norvegicus 31 23 20938 109. RATMT Mitochondrion Rattus rattus 3 3 4217 110. RICCP Chloroplast Oryza sativa 10 8 11111 111. RICMT Mitochondrion Oryza sativa 5 5 8084 112. RYECP Chloroplast Secale cereale 9 7 9269 113. SAIMT Mitochondrion Saimiri sciureus 1 1 893 114. SALCP Chloroplast Sinapis alba 8 6 9874 115. SAOCP Chloroplast Saponaria officinalis 1 1 1252 116. SHRMT Mitochondrion Artemia salina 2 1 1137 117. SHRMT Mitochondrion Artemia sp. 2 1 1129 118. SLMMT Mitochondrion Physarum polycephalum 1 1 1536 119. SNICP Chloroplast Solanum nigrum 1 1 1501 120. SOLCP Chloroplast Spirodela oligorhiza 9 9 6538 121. SOYCP Chloroplast Glycine max 13 8 13023 122. SOYMT Mitochondrion Glycine max 5 5 8683 123. SPFMT Mitochondrion Spodoptera frugiperda 1 1 446 124. SPICP Chloroplast Spinacia oleracea 40 33 60853 125. SRGCP Chloroplast Sorghum bicolor 1 1 862 126. SRGMT Mitochondrion Sorghum sp. 4 2 4768 127. STFMT Mitochondrion Asterina pectinifera 1 1 3849 128. SUSMT Mitochondrion Strongylocentrotus drobachiensis 2 2 965 129. SUSMT Mitochondrion Strongylocentrotus franciscanus 3 3 1276 130. SUSMT Mitochondrion Strongylocentrotus intermedius 2 2 960 131. SUSMT Mitochondrion Strongylocentrotus pallidus 2 2 961 132. SUSMT Mitochondrion Strongylocentrotus purpuratus 3 3 1279 133. TARMT Mitochondrion Tarsius syrichta 1 1 895 134. TETMT Mitochondrion Tetrahymena thermophila 1 1 53 135. TEYMT Mitochondrion Tetrahymena pyriformis 13 12 12293 136. TOBCP Chloroplast Nicotiana acuminata 1 1 2052 137. 
TOBCP Chloroplast Nicotiana debneyi 3 3 4016 138. TOBCP Chloroplast Nicotiana otophora 1 1 2052 139. TOBCP Chloroplast Nicotiana plumbaginifolia 6 4 4169 140. TOBCP Chloroplast Nicotiana tabacum 47 40 200231 141. TOBMT Mitochondrion Nicotiana plumbaginifolia 2 1 1740 142. TOBMT Mitochondrion Nicotiana tabacum 4 3 4074 143. TOMMT Mitochondrion Lycopersicon esculentum 2 1 558 144. TRBKP Kinetoplast Trypanosoma brucei 26 20 36400 145. TRBMT Mitochondrion Trypanosoma brucei 4 4 2285 146. TRCKP Kinetoplast Trypanosoma cruzi 27 27 11864 147. TREKP Kinetoplast Trypanosoma equiperdum 2 2 2017 148. TRFKP Kinetoplast Crithidia fasciculata 19 18 12549 149. TRLKP Kinetoplast Leptomonas tarentolae 1 1 2568 150. VFACP Chloroplast Vicia faba 5 5 6821 151. VFAMT Mitochondrion Vicia faba 4 4 9356 152. WARMT Mitochondrion Pomatostomus isidori 1 1 239 153. WARMT Mitochondrion Pomatostomus ruficeps 1 1 239 154. WARMT Mitochondrion Pomatostomus superciliosus 1 1 239 155. WARMT Mitochondrion Pomatostomus temporalis 1 1 239 156. WHTCP Chloroplast Triticum aestivum 25 24 24129 157. WHTMT Mitochondrion Triticum aestivum 21 17 16161 158. XELMT Mitochondrion Xenopus laevis 5 4 24251 159. YSCMT Mitochondrion Saccharomyces cerevisiae 179 161 135852 160. YSGMT Mitochondrion Saccharomyces carlsbergensis 1 1 149 161. YSKMT Mitochondrion Kluyveromyces lactis 76 38 2955 162. YSKMT Mitochondrion Kluyveromyces thermotolerans 5 3 1287 163. YSLMT Mitochondrion Torulopsis glabrata 10 9 6200 164. YSPMT Mitochondrion Schizosaccharomyces pombe 8 8 11911 165. YSSMT Mitochondrion Cephalosporium acremonium 2 2 3029 166. YSTMT Mitochondrion Yeast sp. 4 4 5196 167. YSUMT Mitochondrion Candida utilis 1 1 306 168. YSVMT Mitochondrion Saccharomyces uvarum 3 3 2296 Total 1368 1110 1594294 BACTERIAL Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. ACC Acinetobacter calcoaceticus 8 5 16060 2. ACC Acinetobacter sp. 3 2 3298 3. ACH Achromobacter sp. 
                                                                    1       1     2414
  4.  ACL     Acholeplasma laidlawii                                1       1     1508
  5.  ACN     Actinobacillus actinomycetemcomitans                  1       1     3842
  6.  ACN     Actinobacillus pleuropneumoniae                       1       1     3831
  7.  ACO     Acetogenium kivui                                     1       1     2477
  8.  ACY     Actinomyces naeslundii                                1       1     2160
  9.  ACY     Actinomyces viscosus                                  2       1     1850
 10.  AFA     Alcaligenes eutrophus                                 7       6    16706
 11.  AFA     Alcaligenes faecalis                                  3       3     7781
 12.  AFA     Plasmid pJP4                                          3       3     2361
 13.  AMC     Acidaminococcus fermentans                            2       1     3245
 14.  AMS     Ampullariella sp.                                     1       1     1892
 15.  ANA     Anabaena 7120                                         8       6    15235
 16.  ANA     Anabaena sp.                                         24      17    22096
 17.  ANI     Anacystis nidulans                                   31      25    35827
 18.  AQU     Agmenellum quadruplicatum                             3       3     5497
 19.  ARF     Archaeoglobus fulgidus                                3       2     1727
 20.  ARG     Arthrobacter sp.                                      1       1     2075
 21.  ATU     Agrobacterium rhizogenes                              4       4     4868
 22.  ATU     Agrobacterium sp.                                     1       1     1599
 23.  ATU     Agrobacterium tumefaciens                            31      27    54619
 24.  AVH     Azotobacter chroococcum                               1       1     1654
 25.  AVI     Azotobacter vinelandii                               21      17    73395
 26.  AZS     Azospirillum brasilense                               1       1     1910
 27.  BAD     Bacillus caldolyticus                                 1       1     1150
 28.  BAL     Bacillus caldotenax                                   4       4     4182
 29.  BAM     Bacillus amyloliquefaciens                           17      14    14120
 30.  BAN     Bacillus anthracis                                    4       4    14401
 31.  BBR     Bacillus brevis                                      10       9    21549
 32.  BCC     Bacillus coagulans                                    1       1     1332
 33.  BCE     Bacillus cereus                                      18      14    18719
 34.  BCI     Bacillus circulans                                    7       4     8107
 35.  BCQ     Bacillus Q                                            5       5      786
 36.  BLI     Bacillus licheniformis                               19      16    17553
 37.  BMA     Bacillus macerans                                     1       1     2744
 38.  BME     Bacillus megaterium                                  21      17    27133
 39.  BNG     Bacteroides gingivalis                                1       1     1420
 40.  BNO     Bacteroides nodosus                                   5       4     3790
 41.  BNR     Bacteroides fragilis                                  3       3     5828
 42.  BOR     Borrelia burgdorferei                                 5       5     2497
 43.  BPE     Bordetella bronchiseptica                             1       1     4936
 44.  BPE     Bordetella parapertussis                              3       3     4749
 45.  BPE     Bordetella pertussis                                 16      11    40804
 46.  BPO     Bacillus polymyxa                                     3       2     5629
 47.  BPU     Bacillus pumilus                                     11       8     6185
 48.  BRL     Brevibacterium epidermidis                            2       1     1721
 49.  BRL     Brevibacterium lactofermentum                         9       8    22049
 50.  BRU     Brucella abortus                                      2       2     5253
 51.  BSN     Bacillus natto                                        1       1      676
 52.  BSP     Bacillus sp.                                         17      17    34195
 53.  BSS     Bacillus sphaericus                                  10       8    16006
 54.  BST     Bacillus stearothermophilus                          31      31    44639
 55.  BSU     Bacillus subtilis                                   226     183   289975
 56.  BTH     Bacillus thuringiensis                               60      46   134829
 57.  BTT     Thermoactinomyces thalpophilus                        2       2     1036
 58.  BUT     Butyrivibrio fibrisolvens                             1       1     3124
 59.  C1B     Plasmid Colicin B4                                    3       3     1561
 60.  CAJ     Campylobacter jejuni                                  4       4     5410
 61.  CB2     Plasmid Colicin B2                                    1       1      360
 62.  CCR     Caulobacter crescentus                               19      19     9279
 63.  CD1     Plasmid Colicin D                                     1       1     1099
 64.  CDC     Caldocellum saccharolyticum                           3       2     5961
 65.  CE1     Plasmid Colicin E1                                   40      30    15683
 66.  CE2     Plasmid Colicin E2                                    4       4     4553
 67.  CE3     Plasmid Colicin E3                                    1       1      392
 68.  CE5     Plasmid Colicin E5-099                                1       1     1113
 69.  CE8     Plasmid Colicin E8                                    1       1     1268
 70.  CE9     Plasmid Colicin E9                                    1       1     1500
 71.  CEC     Plasmid Colicin E3-CA38                               8       3     4883
 72.  CEC     Plasmid Colicin E6-CT14                               1       1     3065
 73.  CFI     Cellulomonas fimi                                     6       6     5348
 74.  CFR     Citrobacter freundii                                  6       5     4522
 75.  CFX     Chloroflexus aurantiacus                              1       1     1223
 76.  CGF     Chlorogloeopsis fritschii                             1       1      210
 77.  CHT     Chlamydia psittaci                                    3       3     2942
 78.  CHT     Chlamydia trachomatis                                25      14    47472
 79.  CIA     Plasmid Colicin Ia                                    1       1     3727
 80.  CIB     Plasmid Colicin Ib                                    4       4     8945
 81.  CIB     Plasmid Colicin Ib-P9                                 1       1      528
 82.  CLA     Plasmid Colicin A                                     2       2     3155
 83.  CLD     Plasmid CloDF13                                      13       1     9957
 84.  CLK     Plasmid Colicin K                                     2       2      815
 85.  CLN     Plasmid Colicin V                                     1       1      412
 86.  CLN     Plasmid Colicin V-K30                                 1       1     1465
 87.  CLN2    Plasmid Colicin V2-K94                                1       1      550
 88.  CLO     Clostridium acetobutylicum                            6       5     8561
 89.  CLO     Clostridium acidiurici                                1       1     2266
 90.  CLO     Clostridium botulinum                                 1       1     4835
 91.  CLO     Clostridium cellulolyticum                            1       1     2405
 92.  CLO     Clostridium difficile                                 4       3    10451
 93.  CLO     Clostridium innocuum                                  1       1     1544
 94.  CLO     Clostridium pasteurianum                             24      16    16972
 95.  CLO     Clostridium perfringens                               7       6     4288
 96.  CLO     Clostridium sordellii                                 1       1     1504
 97.  CLO     Clostridium tetani                                    3       3    10529
 98.  CLO     Clostridium thermocellum                              8       7    13762
 99.  CLO     Clostridium thermohydrosulfuricum                     1       1     4839
100.  CLO     Clostridium thermosulfurogenes                        1       1     2824
101.  CLV     Plasmid ColVBtrp                                      1       1      441
102.  CN2     Plasmid pCN2                                          1       1      117
103.  CN3     Plasmid pCN3                                          1       1      114
104.  COR     Corynebacterium glutamicum                            7       5    12942
105.  COR     Corynebacterium nephridii                             1       1      615
106.  COR     Corynebacterium sp.                                   2       2     2512
107.  COX     Coxiella burnetii                                     2       2     4051
108.  CPC     Cryptococcus albidus                                  2       1     2984
109.  CYA     Cyanobacterium nostoc                                 2       2     4220
110.  CYT     Cytophaga lytica                                      1       1     1509
111.  DEI     Deinococcus radiodurans                               2       2     4970
112.  DMO     Desulfurococcus mobilis                               6       6     8080
113.  DVU     Desulfovibrio baculatus                               2       1     2589
114.  DVU     Desulfovibrio gigas                                   1       1     2756
115.  DVU     Desulfovibrio vulgaris                                8       8     9797
116.  EAM     Erwinia amylovora                                     1       1      772
117.  ECA     Erwinia carotovora                                    8       8    11566
118.  ECB     Erwinia herbicola                                     1       1     4902
119.  ECH     Erwinia chrysanthemi                                 15      10    21981
120.  ECO     Escherichia coli                                   1498    1032  1635830
121.  ECO     F sex factor plasmid                                  1       1     3635
122.  ECO     Plasmid Colicin BM-Cl139                              3       3     3707
123.  ECO     Plasmid pCU1                                          1       1     2056
124.  ECO     Plasmid pF166                                         1       1     2133
125.  EHP     Ectothiorhodospira halophila                          1       1      121
126.  EHR     Ehrlichia risticii                                    2       1     1498
127.  EHV     Ectothiorhodospira vacuolata                          1       1      120
128.  ENC     Enterococcus faecium                                  1       1     1900
129.  ENR     Plasmid ENTR                                          2       2     1273
130.  ENS     Plasmid ENT                                           1       1      866
131.  ENT     Enterobacter aerogenes                                6       6     5330
132.  ENT     Enterobacter agglomerans                              1       1      383
133.  ENT     Enterobacter cloacae                                  6       6     7434
134.  ETA     Edwardsiella tarda                                    2       1      306
135.  EUB     Eubacterium sp.                                       3       3     1778
136.  FA3     Plasmid pFA3                                          1       1     1597
137.  FDI     Fremyella diplosiphon                                25      19    27562
138.  FIB     Fibrobacter succinogenes                              2       2     4620
139.  FPL     Plasmid F                                            29      22    30781
140.  FRA     Frankia sp.                                           3       2     3758
141.  FVB     Flavobacterium heparinum                              1       1     1528
142.  FVB     Flavobacterium okeanokoites                           8       8     9873
143.  FVB     Flavobacterium sp.                                    2       2     3418
144.  HAF     Hafnia alvei                                          1       1     2961
145.  HAL     Halobacterium cutirubrum                              4       4     3566
146.  HAL     Halobacterium halobium                               32      23    37861
147.  HAL     Halobacterium salinarium                              1       1      606
148.  HAL     Halobacterium sp.                                     7       6    14310
149.  HAL     Halobacterium volcanii                                6       6     6860
150.  HCL     Heliobacterium chlorum                                1       1     1512
151.  HCU     Halobacterium cutirubrum                              1       1     3116
152.  HEH     Haemophilus haemolyticus                              2       2     3186
153.  HEI     Haemophilus influenzae                                8       8    11342
154.  HEP     Haemophilus parainfluenza                             5       5      853
155.  HMO     Halococcus morrhuae                                   2       2     4402
156.  HV2     Plasmid pHV2                                          1       1     6354
157.  IM13    Plasmid pIM13                                         1       1     2246
158.  INC     Plasmid incB                                          1       1      352
159.  INC     Plasmid incI-1                                        1       1      418
160.  INC     Plasmid incI-gamma                                    1       1      417
161.  INS     Insertion sequence                                   10      10     4266
162.  INS     Insertion sequence IS1                                3       2     1707
163.  INS     Insertion sequence IS150                              2       1     1443
164.  INS     Insertion sequence IS186                              2       2     2677
165.  INS     Insertion sequence IS2                                3       3      387
166.  INS     Insertion sequence IS26                               1       1      859
167.  INS     Insertion sequence IS30                               1       1     1221
168.  INS     Insertion sequence IS4                                1       1     1426
169.  INS     Insertion sequence IS476                              1       1     1225
170.  INS     Insertion sequence IS493                              1       1     1641
171.  INS     Insertion sequence IS5                                3       2     1570
172.  INS     Insertion sequence IS891                              1       1     1351
173.  JD1     Plasmid pJD1                                          2       1     4207
174.  KAE     Klebsiella aerogenes                                 14      12    13177
175.  KCI     Kluyvera citrophila                                   1       1     2734
176.  KPN     Klebsiella pneumoniae                                69      54   107538
177.  KPN     Plasmid pJHC-MW1                                      1       1     1352
178.  KPO     Klebsiella oxytoca                                    2       2     4901
179.  LAC     Lactococcus lactis                                    5       4    12831
180.  LAE     Listonella ordalii                                    2       1      120
181.  LAE     Listonella tubiashii                                  2       1      120
182.  LB1     Plasmid p1                                            1       1      533
183.  LB3     Lactobacillus 30a                                     4       2     2189
184.  LBB     Lactobacillus bulgaricus                              1       1      536
185.  LBD     Lactobacillus delbrueckii                             7       4     5405
186.  LBH     Lactobacillus helveticus                              1       1     3292
187.  LBP     Lactobacillus plantarum                               2       1      117
188.  LBP     Plasmid pC30il                                        1       1     2140
189.  LBP     Plasmid pLP1                                          1       1     2093
190.  LCA     Lactobacillus casei                                   6       6     9787
191.  LCO     Lactobacillus confusus                                2       2     2640
192.  LEP     Leptospira biflexa                                    2       2     4788
193.  LEP     Leptospira interrogans                                2       1     3244
194.  LIS     Listeria monocytogenes                                3       2     3940
195.  LPN     Legionella pneumophila                                1       1      239
196.  LS1     Plasmid pLS11                                         1       1      253
197.  MBA     Methanobacterium ivanovii                             2       1     1353
198.  MBF     Methanobacterium formicicum                           1       1     3597
199.  MBH     Methanobrevibacter smithii                            2       2     5713
200.  MBI     Methanobacterium thermoautotrophicum                  7       5    20713
201.  MBI     Plasmid pME2001                                       1       1     1440
202.  MBO     Moraxella bovis                                       1       1      939
203.  MEC     Micromonospora echinospora                            1       1      398
204.  MEF     Methanothermus fervidus                               3       3    10822
205.  MEH     Methanospirillum hungatei                             1       1      295
206.  MES     Methanosarcina barkeri                                5       3    13117
207.
      MLC     Methylococcus capsulatus                              1       1     2463
208.  MLU     Micrococcus luteus                                   10       9    11753
209.  MLY     Micrococcus lysodeikticus                             1       1      166
210.  MPL     Mycoplasma-like organism                              1       1     1535
211.  MSG     Mycobacterium bovis                                   5       4     6104
212.  MSG     Mycobacterium leprae                                  5       4    10378
213.  MSG     Mycobacterium tuberculosis                           14       8    17030
214.  MSG     Plasmid pAL5000                                       1       1     4837
215.  MTB     Methylobacterium extorquens                           1       1     4500
216.  MTB     Methylobacterium specialis                            2       1     2211
217.  MV1     Plasmid pMV158                                        1       1     2436
218.  MVA     Methanococcus vannielii                              15      13    22272
219.  MVO     Methanococcus voltae                                 10       9    12367
220.  MVT     Methanococcus thermolithotrophicus                    3       2     2700
221.  MXA     Myxococcus xanthus                                   13      12    20095
222.  MXB     Lysobacter enzymogenes                                3       2     3218
223.  MYC     Mycoplasma capricolum                                14      13    21175
224.  MYC     Mycoplasma hyopneumoniae                              2       2     1737
225.  MYC     Mycoplasma mycoides                                   4       4     2716
226.  MYC     Mycoplasma sp.                                       32      28    40860
227.  MYC     Plasmid pADB201                                       1       1     1717
228.  NAH     Plasmid NAH7 (from P. putida)                         6       5     3771
229.  NAT     Natronobacterium pharaonis                            1       1     1015
230.  NGO     Neisseria flavescens                                  1       1     1228
231.  NGO     Neisseria gonorrhoeae                                61      53    48085
232.  NGO     Neisseria meningitidis                                8       6     6438
233.  NOC     Nocardia mediterranei                                 3       3      450
234.  NOS     Nostoc commune                                        1       1     4241
235.  NR1     Plasmid NR1                                           4       3     6463
236.  NT1     Plasmid NTP1                                          2       2     1440
237.  NT1     Plasmid NTP16                                         1       1     2730
238.  P15     Plasmid P15A                                          2       2     1226
239.  P18X    Plasmid pACYC184                                      2       2      171
240.  P23     Plasmid pMM2-3                                        2       2      182
241.  P307    Plasmid P307                                          2       2     3852
242.  P53     Plasmid pMM5-3                                        4       4      429
243.  P55     Plasmid pMM5-5                                        4       4      420
244.  PAC     Plasmid P177                                          1       1      345
245.  PAM     Plasmid PAM177                                        1       1     1443
246.  PAS     Pasteurella haemolytica                               3       3    15958
247.  PAZ     Plasmid pAZ1                                          1       1      808
248.  PB0     Plasmid pUB110                                        8       7     8061
249.  PB2     Plasmid pUB112                                        1       1      901
250.  PBF4    Plasmid pBF4                                          1       1     1041
251.  PC1     Plasmid pC194                                         2       2     3946
252.  PC2     Plasmid pC221                                         2       1     4555
253.  PDE     Paracoccus denitrificans                              9       7    17422
254.  PDGO    Plasmid pDGO100                                       1       1     2237
255.  PDU     Plasmid pDU1358                                       2       2     5076
256.  PE1     Plasmid pE194                                         7       3     5039
257.  PE2     Plasmid pED208                                        2       2     5640
258.  PHL     Plasmid pHly152                                       1       1     8215
259.  PI25    Plasmid pI258                                         5       4    12140
260.  PIJ     Plasmid pIJ101                                        2       2     9188
261.  PIP     Plasmid pIP401                                        2       2      383
262.  PIP11   Plasmid pIP1100                                       1       1     1386
263.  PIP404  Plasmid pIP404                                        4       3    15188
264.  PJH     Plasmid pJH1                                          1       1     1489
265.  PJM1    Plasmid pJM1                                          1       1     3581
266.  PJR     Plasmid PJR225                                        1       1     1527
267.  PKM     Plasmid pKM101                                        1       1     1797
268.  PLB     Plasmid pLB1                                          1       1     2190
269.  PLM     Plasmid pAA3.7X                                       3       1     9583
270.  PME     Plasmid pMEA100                                       1       1      150
271.  PMM     Plasmid pMM110                                        1       1      240
272.  PMO     Plasmid pMON234                                       1       1      997
273.  PNE     Plasmid pNE131                                        2       1     2355
274.  PNS     Plasmid pNS1                                          1       1     3879
275.  PNS     Plasmid pNS1981                                       4       3     1819
276.  PO2     Plasmid pOAD2                                         2       2     2914
277.  PR1     Plasmid R1                                           13      10     7500
278.  PR2     Plasmid R1126                                         1       1      428
279.  PR6     Plasmid R6-5                                          2       1      858
280.  PRC     Plasmid R                                             1       1     1487
281.  PRI     Plasmid PRI13                                         2       1     2234
282.  PRM     Morganella morganii                                   2       2     1831
283.  PRM     Proteus mirabilis                                     7       6    16319
284.  PRM     Proteus vulgaris                                     11       7    13986
285.  PRO     Providencia sp.                                       1       1     1135
286.  PRO     Providencia stuartii                                  1       1     3889
287.  PRS     Propionibacterium shermanii                           1       1      439
288.  PS1     Streptomyces lividans plasmid pS1                     1       1       75
289.  PSA     Plasmid pSA2100                                       1       1       98
290.  PSC     Plasmid pSC101                                       14       8    15551
291.  PSE     Plasmid pCMS1                                         1       1     1322
292.  PSE     Pseudomonas aeruginosa                               83      63    83148
293.  PSE     Pseudomonas amyloderamosa                             5       2     4488
294.  PSE     Pseudomonas cepacia                                   2       2     5867
295.  PSE     Pseudomonas fluorescens                               4       4    10441
296.  PSE     Pseudomonas fragi                                     2       2     1682
297.  PSE     Pseudomonas paucimobilis                              1       1     1080
298.  PSE     Pseudomonas pseudoalcaligenes                         1       1     2040
299.  PSE     Pseudomonas putida                                   18      16    39155
300.  PSE     Pseudomonas sp.                                      18      16    31393
301.  PSE     Pseudomonas syringae                                  9       9    24734
302.  PSE     Pseudomonas testosteroni                              2       2     2435
303.  PSE     TOL Plasmid (from Pseudomonas putida)                10       6     7435
304.  PSE     Zoogloea ramigera                                     1       1     1524
305.  PSM     SYM megaplasmid(from R. meliloti)                     9       8     5150
306.  PSN     Plasmid pSN2                                          1       1     1288
307.  PT1     Plasmid pT181                                         6       3     5149
308.  PTB     Plasmid pTB913                                        1       1     1200
309.  PWM     Plasmid pWM5                                          1       1      569
310.  PWP     Plasmid pWP7b                                         1       1     1370
311.  PWR     Plasmid PWR60                                         1       1     4832
312.  PYR     Pyrodictium occultum                                  4       4     2077
313.  R10     Plasmid R100                                         23      15    16779
314.  R11     Plasmid R1162                                         4       4     1800
315.  R12     Plasmid R124                                          1       1      272
316.  R14     Plasmid R144                                          1       1      801
317.  R27     Plasmid R27                                           1       1     1507
318.  R36     Plasmid R386                                          1       1      441
319.  R37     Plasmid R387                                          1       1     1160
320.  R38     Plasmid R388                                          2       2     3204
321.  R41     Plasmid R401                                          1       1     1857
322.  R45     Plasmid R485                                          1       1      591
323.  R46     Plasmid R46                                           3       3     2859
324.  R48     Plasmid R483                                          1       1     1618
325.  R53     Plasmid R538                                          3       2     1712
326.  R65     Plasmid R65                                           2       2     1380
327.  R67     Plasmid R67                                           1       1      293
328.  R6K     Plasmid R6K                                           7       6     1894
329.  R75     Plasmid R751                                          2       2     1020
330.  R77     Plasmid R773                                          1       1     4347
331.  RA1     Plasmid RA1                                           1       1      758
332.  RBH     Plasmid pRBH1                                         2       2     1521
333.  RBL     Rhodopseudomonas blastica                             1       1    12368
334.  RCA     Rhodobacter capsulatus                               27      21    42777
335.  REI     Plasmid pRE-I                                         1       1      439
336.  RGN     Plasmid RGN238                                        2       1     2427
337.  RHA     Azorhizobium caulinodans                              2       1     2809
338.  RHB     Bradyrhizobium japonicum                             19      16    27704
339.  RHF     Rhizobium fredii                                      1       1     2862
340.  RHH     Rhizobium phaseoli                                    4       4     3681
341.  RHI     Bradyrhizobium sp.                                    2       2     6665
342.  RHI     Rhizobium sp.                                         6       5     7684
343.  RHJ     Rhizobium japonicum                                  10       8    10225
344.  RHL     Plasmid pRL1JI                                        5       1    12055
345.  RHL     Rhizobium leguminosarum                              12      12    18644
346.  RHM     Rhizobium meliloti                                   43      38    68539
347.  RHP     Parasponia rhizobium                                  2       2     5530
348.  RHR     Rhizobium IRc78                                       2       2     2199
349.  RHT     Rhizobium trifolii                                    5       5     6886
350.  RIA     Plasmid Ri                                            1       1    21126
351.  RIR     Rickettsia conorii                                    1       1      539
352.  RIR     Rickettsia prowazekii                                 3       2     2971
353.  RIR     Rickettsia rickettsii                                 4       4     8555
354.  RIR     Rickettsia tsutsugamushi                              1       1     2906
355.  RIR     Rickettsia typhi                                      2       2     1067
356.  RIR     Rochalimaea quintana                                  1       1     1493
357.  RK2     Plasmid RK2                                           5       5     4723
358.  ROS     Roseburia cecicola                                    1       1     1031
359.  RP1     Plasmid RP1                                           1       1     2709
360.  RP4     Plasmid RP4                                           3       3     2327
361.  RSF     Plasmid RSF1010                                       4       4    10236
362.  RSP     Rhodospirillum rubrum                                 8       8    13464
363.  RSS     Rhodobacter sphaeroides                              12      12    10427
364.  RTS     Plasmid Rts1                                          2       1     1855
365.  RUA     Ruminobacter amylophilus                              2       1     2867
366.  RUM     Ruminococcus albus                                    1       1     2180
367.  RVI     Rhodopseudomonas viridis                              5       4     3885
368.  SA2     Plasmid pSAM2                                         3       3      866
369.  SAC     Sulfolobus acidocaldarius                             4       4     5699
370.  SAU     Stigmatella aurantiaca                                2       1     1300
371.  SB2     Plasmid pSB24.2                                       1       1     3706
372.  SCP     Plasmid SCP1                                          1       1     2513
373.  SER     Saccharopolyspora erythraea                           5       3     6464
374.  SHD     Shigella dysenteriae                                  7       7    11010
375.  SHF     Plasmid pMYSH6000                                     1       1     4472
376.  SHF     Shigella flexneri                                     7       7    14876
377.  SHS     Shigella sonnei                                      10      10     5138
378.  SLP1    Plasmid SLP1                                          3       3      630
379.  SMA     Serratia marcescens                                  21      18    20645
380.  SMA     Serratia sp.                                          1       1     2570
381.  SME     Spiroplasma melliferum                                1       1     1510
382.  SMY     Plasmid pSL2                                          1       1      345
383.  SMY1    Plasmid pSL1                                          2       2      633
384.  SPA     Spirochaeta aurantia                                  1       1     1257
385.  SPO     Sporolactobacillus laevis                             1       1      118
386.  SPO     Sporosarcina ureae                                    2       1      116
387.  SSO     Sulfolobus shibatae                                   1       1     1495
388.  SSO     Sulfolobus solfataricus                               4       4     3319
389.  SSP     Sulfolobus sp.                                        7       4    19617
390.  STA     Plasmid pT48                                          1       1     2475
391.  STA     Staphylococcus aureus                                67      50    79187
392.  STA     Staphylococcus carnosus                               1       1      720
393.  STA     Staphylococcus epidermidis                            1       1      423
394.  STA     Staphylococcus haemolyticus                           1       1     1087
395.  STA     Staphylococcus hyicus                                 1       1     2212
396.  STA     Staphylococcus mutans                                 1       1     2288
397.  STA     Staphylococcus simulans                               1       1     1486
398.  STA     Staphylococcus staphylolyticus                        1       1     1825
399.  STM     Streptomyces antibioticus                             1       1     1567
400.  STM     Streptomyces avidinii                                 1       1      638
401.  STM     Streptomyces azureus                                  1       1     1521
402.  STM     Streptomyces clavuligerus                             2       2     2411
403.  STM     Streptomyces coelicolor                              12      10    16265
404.  STM     Streptomyces fradiae                                  5       5    10504
405.  STM     Streptomyces glaucescens                              4       3     3466
406.  STM     Streptomyces griseus                                 11       8    18202
407.  STM     Streptomyces hygroscopicus                            6       6     5166
408.  STM     Streptomyces lavendulae                               1       1     1130
409.  STM     Streptomyces limosus                                  1       1     2291
410.  STM     Streptomyces lividans                                25      20    14900
411.  STM     Streptomyces plicatus                                 2       2     1245
412.  STM     Streptomyces rochei                                   2       2     1195
413.  STM     Streptomyces sp.                                     29      25    27226
414.  STM     Streptomyces thermotolerans                           1       1     1260
415.  STM     Streptomyces vinaceus                                 1       1     1119
416.  STR     Plasmid pAM-beta-1                                    2       2      901
417.  STR     Plasmid pMK157                                        1       1     1920
418.  STR     Streptococcus equisimilis                             1       1     2568
419.  STR     Streptococcus faecalis                                6       6     8343
420.  STR     Streptococcus lactis                                  8       7     8222
421.  STR     Streptococcus mutans                                 10       9    35982
422.  STR     Streptococcus pneumoniae                             46      32    50182
423.  STR     Streptococcus pyogenes                               20      15    31675
424.  STR     Streptococcus sanguis                                 2       2     6639
425.  STR     Streptococcus sobrinus                                1       1     4995
426.  STR     Streptococcus sp.                                     9       7    13788
427.  STY     Plasmid R1767                                         1       1     1519
428.  STY     Plasmid R64                                           1       1      482
429.  STY     Salmonella infantis                                   1       1     3430
430.  STY     Salmonella potsdam                                    1       1     1727
431.  STY     Salmonella rubislaw                                   1       1     1479
432.  STY     Salmonella sp.                                        7       6     6050
433.  STY     Salmonella typhimurium                              143     119   172174
434.  SYC     Synechocystis sp.                                    10       8    13968
435.  SYN     Synechococcus sp.                                    13      13    35842
436.  TBA     Thermophilic bacterium                                6       3    11617
437.  TDT     Trichodesmium thiebautii                              1       1      357
438.  TFE     Plasmid pTF-FC2                                       1       1      329
439.  TFE     Thiobacillus acidophilus                              9       4      599
440.  TFE     Thiobacillus ferrooxidans                             7       5    10091
441.  THA     Thermoplasma acidophilum                              2       2     4789
442.  THC     Thermococcus celer                                    2       2     1312
443.  THF     Thermomonospora fusca                                 1       1      264
444.  THP     Thermofilum pendens                                   1       1      240
445.  TIP     Plasmid pTiA6                                         1       1      345
446.  TIP     Plasmid pTiB6S3                                       1       1     4203
447.  TIP     Plasmid pTiC58                                        2       2     3120
448.  TIP     Plasmid Ti (from A. tumefaciens)                     59      50    89960
449.  TMO     Thermotoga maritima                                   2       2     2763
450.  TRN     Transposon gamma-delta                                6       3     1092
451.  TRN     Transposon Tn3411                                     2       1     2925
452.  TRN     Transposon Tn5                                        1       1     2040
453.  TRN     Transposon Tn501                                      1       1       86
454.  TRN     Transposon Tn602                                      4       4      639
455.  TRN10   Transposon Tn10                                      11       4     6024
456.  TRN15   Transposon Tn1525                                     1       1     1721
457.  TRN16   Transposon Tn1681                                     1       1      658
458.  TRN17   Transposon Tn1721                                     8       8     1797
459.  TRN1771 Transposon Tn1771                                     3       3      348
460.  TRN21   Transposon Tn21                                       3       3     5639
461.  TRN25   Transposon Tn2501                                     1       1     1539
462.  TRN26   Transposon Tn2680                                     1       1      194
463.  TRN3    Transposon Tn3                                       10       8     6351
464.  TRN34   Transposon Tn3411                                     1       1     1321
465.  TRN43   Transposon Tn4351                                     2       1     1982
466.  TRN431  Transposon Tn431                                      3       3     2405
467.  TRN4551 Transposon Tn4551                                     1       1     2080
468.  TRN4556 Transposon Tn4556                                     2       2       86
469.  TRN5    Transposon Tn5                                        9       6     4978
470.  TRN501  Transposon Tn501                                      4       4     7310
471.  TRN554  Transposon Tn554                                      5       1     6691
472.  TRN7    Transposon Tn7                                        7       7     7535
473.
      TRN9    Transposon Tn9                                        2       2     1362
474.  TRN903  Transposon Tn903                                      6       6     6118
475.  TRN917  Transposon Tn917                                      3       3     6353
476.  TRNCAM  Transposon Tn-Cam204                                  1       1      921
477.  TRP     Treponema pallidum                                    6       6     8045
478.  TTE     Thermoproteus tenax                                   8       8     4399
479.  TTH     Thermus aquaticus                                     5       4     4831
480.  TTH     Thermus caldophilus                                   1       1     1229
481.  TTH     Thermus flavus                                        1       1     1771
482.  TTH     Thermus thermophilus                                 18      11    16501
483.  URE     Ureaplasma urealyticum                                1       1     3424
484.  VCH     Vibrio cholerae                                      11      10     9588
485.  VIB     Aeromonas hydrophila                                  7       6     7237
486.  VIB     Aeromonas sobria                                      2       1     2510
487.  VIB     Photobacterium leiognathi                             3       2     4041
488.  VIB     Photobacterium sp.                                    5       5     8715
489.  VIB     Vibrio alginolyticus                                  2       2     3975
490.  VIB     Vibrio fischeri                                       4       4     5791
491.  VIB     Vibrio harveyi                                       13      12    15766
492.  VIB     Vibrio parahaemolyticus                               1       1     1275
493.  VIB     Vibrio sp.                                            3       2     1390
494.  VIT     Vitreoscilla sp.                                      2       1      689
495.  VIT     Vitreoscilla stercoraria                              1       1      745
496.  WP1     Plasmid pWP113a                                       1       1     1316
497.  WP1     Plasmid pWP116a                                       1       1     1336
498.  WP1     Plasmid pWP14a                                        1       1     1336
499.  XAA     Xanthobacter autotrophicus                            1       1     3041
500.  XAN     Xanthomonas campestris                                1       1      250
501.  YEP     Plasmid pYV03                                         1       1     3316
502.  YEP     Yersinia enterocolitica                               4       4     8728
503.  YEP     Yersinia pestis                                       1       1     1397
504.  YEP     Yersinia pseudotuberculosis                           5       4    10186
505.  ZMO     Zymomonas mobilis                                     8       7    12200

Total                                                            4502    3508  5669333


STRUCTURAL RNA

      Key     Name                                            Reports Entries    Bases
-------------------------------------------------------------------------------
  1.  AAU     Auricularia auricula-judae                            1       1      118
  2.  ACA     Acanthamoeba castellanii                              2       2      281
  3.  ACC     Acinetobacter calcoaceticus                           1       1      116
  4.  ACH     Achromobacter cycloclastes                            1       1      120
  5.  ACH     Achromobacter xylosoxidans                            1       1      114
  6.  ACL     Acholeplasma entomophilum                             1       1     1476
  7.  ACL     Acholeplasma modicum                                  1       1     1473
  8.  ACN     Actinobacillus actinomycetemcomitans                  3       3      494
  9.  ACN     Actinobacillus equuli                                 2       2      445
 10.  ACN     Actinobacillus hominis                                3       3      494
 11.  ACN     Actinobacillus lignieresii                            2       2      445
 12.  ACS     Avian sarcoma virus                                   1       1       75
 13.  ACY     Actinomyces pyogenes                                  2       1     1361
 14.  AED     Agaricus edulis                                       1       1      118
 15.  AEQ     Actinia equina                                        2       1      120
 16.  AFA     Alcaligenes eutrophus                                 1       1     1511
 17.  AFA     Alcaligenes faecalis                                  6       6     3410
 18.  AKK     Akkesiphycus lubricum                                 2       1      118
 19.  ALF     Medicago sativa                                       1       1      119
 20.  ALL     Asteroleplasma anaerobium                             1       1     1471
 21.  AMFMT   Mitochondrion Apis mellifera                          1       1     1266
 22.  AMG     Acyrthosiphon magnoliae                               2       2      281
 23.  AMP     Amoeba proteus                                        1       1      419
 24.  ANI     Anacystis nidulans                                    4       4      371
 25.  ANM     Anisodoris nobilis                                    6       3      994
 26.  ANP     Anaeroplasma abactoclasticum                          1       1     1453
 27.  ANP     Anaeroplasma bactoclasticum                           1       1     1436
 28.  ANP     Anaeroplasma varium                                   1       1     1436
 29.  APE     Acremonium persicinum                                 1       1      119
 30.  APL     Aplysia kurodai                                       1       1      119
 31.  APN     Anthoceros punctatus                                  1       1      118
 32.  APR     Antheraea pernyi                                      1       1      120
 33.  APU     Aeromonas punctata                                    1       1      109
 34.  AQU     Agmenellum quadruplicatum                             1       1       76
 35.  ARB     Arbacia punctulata                                    9       3     1049
 36.  ARG     Arthrobacter globiformis                              4       3     1774
 37.  ARG     Arthrobacter luteus                                   1       1      122
 38.  ARG     Arthrobacter oxidans                                  2       1      121
 39.  ARG     Arthrobacter sp.                                      2       1      121
 40.  ARN     Argulus nobilis                                       1       1     1843
 41.  ARO     Arhodomonas oleiferhydrans                            1       1     1487
 42.  ASC     Acinetospora crinita                                  2       1      118
 43.  ASE     Aquaspirillum serpens                                 1       1      116
 44.  ASF     Aspergillus flavus                                    1       1      119
 45.  ASG     Aspergillus niger                                     1       1      119
 46.  ASN     Aspergillus nidulans                                  4       4      476
 47.  ATU     Agrobacterium tumefaciens                             1       1      120
 48.  AUT     Aureobacterium testaceum                              1       1      120
 49.  AVI     Azotobacter vinelandii                                1       1      120
 50.  AXY     Amphibacillus xylanus                                 2       1      116
 51.  BAC     Bacillus acidocaldarius                               1       1      117
 52.  BAE     Batrachospermum ectocarpum                            1       1      121
 53.  BBR     Bacillus brevis                                       2       2     1674
 54.  BEACP   Chloroplast Phaseolus vulgaris                        5       5      409
 55.  BFI     Bacillus firmus                                       1       1      116
 56.  BGA     Blue Green Algae                                      1       1       76
 57.  BGL     Bacillus globigii                                     1       1      116
 58.  BHA     Beneckea harveyi                                      1       1      122
 59.  BJA     Blepharisma japonicum                                 2       2      476
 60.  BLI     Bacillus licheniformis                                1       1      116
 61.  BLY     Hordeum vulgare                                       7       5      342
 62.  BLYCP   Chloroplast Hordeum vulgare                           1       1       76
 63.  BME     Bacillus megaterium                                   1       1      116
 64.  BMO     Bombyx mori                                          13      12     1243
 65.  BNA     Brassica napus                                        2       2      196
 66.  BNC     Bacteroides asaccharolyticus                          1       1       48
 67.  BNG     Bacteroides gingivalis                                1       1       53
 68.  BNI     Bacteroides intermedius                               1       1       52
 69.  BOV     Bos taurus                                           18      15     1170
 70.  BOVMT   Mitochondrion Bos taurus                             14      12      858
 71.  BPA     Bacillus pasteurii                                    1       1      117
 72.  BPL     Brachionus plicatilis                                 1       1      121
 73.  BRA     Branchiostoma belcheri                                1       1      120
 74.  BRA     Branchiostoma californiense                           6       3      974
 75.  BRL     Brevibacterium helvolum                               2       1      120
 76.  BRL     Brevibacterium linens                                 1       1      123
 77.  BRP     Brugia pahangi                                        1       1      363
 78.  BRU     Brucella abortus                                      2       1     1429
 79.  BSI     Blastocladiella simplex                               1       1      118
 80.  BST     Bacillus stearothermophilus                           7       7      568
 81.  BSU     Bacillus subtilis                                    16      14     1153
 82.  BVO     Bresslaua vorax                                       1       1      120
 83.  BVU     Beta vulgaris                                         1       1      120
 84.  CAO     Carpopeltis crispata                                  1       1      121
 85.  CAU     Caulobacter spinosum                                  1       1      117
 86.  CBC     Caseobacter polymorphus                               1       1      121
 87.  CCI     Coprinus cinereus                                     1       1      118
 88.  CCO     Crypthecodinium cohnii                                4       4      492
 89.  CEL     Caenorhabditis elegans                                9       7      713
 90.  CFI     Cellulomonas biazotea                                 1       1      120
 91.  CHA     Chaetopterus sp.                                      6       3      975
 92.  CHB     Chlorobium limicola                                   1       1     1504
 93.  CHF     Chordaria flagelliformis                              2       1      118
 94.  CHK     Gallus gallus                                        25      23     2774
 95.  CHL     Chlorella pyrenoidosa                                 2       1      119
 96.  CHL     Chlorella sp.                                         5       3     2082
 97.  CHO     Chilomonas paramecium                                 1       1      124
 98.  CHR     Chromobacterium fluviatile                            1       1     1473
 99.  CHR     Chromobacterium violaceum                             1       1     1475
100.  CLM     Spisula solidissima                                   6       3      937
101.  CLO     Clostridium aminovalericum                            2       1     1554
102.  CLO     Clostridium barkeri                                   2       1     1527
103.  CLO     Clostridium carnis                                    2       1      117
104.  CLO     Clostridium pasteurianum                              3       2     1628
105.  CLO     Clostridium ramosum                                   1       1     1530
106.  CLO     Clostridium sticklandii                               2       2     1501
107.  CODCP   Chloroplast Codium fragile                            1       1       94
108.  COR     Corynebacterium aquaticum                             1       1      120
109.  COR     Corynebacterium glutamicum                            1       1      121
110.  COR     Corynebacterium sp.                                   2       1     1366
111.  COR     Corynebacterium xerosis                               2       2      243
112.  COT     Gossypium hirsutum                                    1       1      118
113.  COX     Coxiella burnetii                                     2       1     1484
114.  CPA     Cyanophora paradoxa                                   2       2      237
115.  CRA     Coprinus radiatus                                     1       1      118
116.  CRB     Limulus polyphemus                                    6       3      977
117.  CRE     Chlamydomonas reinhardtii                             3       3      399
118.  CRE     Chlamydomonas sp.                                     1       1      118
119.  CRS     Cryptochiton stelleri                                 6       3      923
120.  CTU     Coleosporium tussilaginis                             1       1      118
121.
      CUR     Curtobacterium citreum                                1       1      122
122.  CVN     Chromatium vinosum                                    1       1     1526
123.  CYR     Cycas revoluta                                        1       1      120
124.  CYRCP   Chloroplast Cycas revoluta                            1       1      122
125.  DAC     Dryopteris acuminata                                  1       1      121
126.  DACCP   Chloroplast Dryopteris acuminata                      2       2      225
127.  DDE     Dacrymyces deliquescens                               1       1      118
128.  DDI     Dictyostelium discoideum                              6       5     1051
129.  DIT     Diatoma tenue                                         1       1      118
130.  DJA     Dugesia japonica                                      1       1      120
131.  DJA     Dugesia tigrina                                       6       3      962
132.  DOG     Canis lupus                                           1       1      149
133.  DOG     Canis sp.                                             2       2      191
134.  DRO     Drosophila melanogaster                              40      34     5122
135.  DSA     Desulfuromonas acetoxidans                            1       1     1522
136.  DSM     Desulfomonile tiedjei                                 1       1     1505
137.  DSP     Desulfobacter postgatei                               1       1     1519
138.  DSV     Desulfosarcina variabilis                             1       1     1527
139.  DUK     Cairina moschata                                      1       1       78
140.  DVU     Desulfovibrio vulgaris                                1       1      120
141.  EAL     Enchytraeus albidus                                   1       1      120
142.  EAR     Equisetum arvense                                     2       1      120
143.  EBI     Eisenia bicyclis                                      1       1      118
144.  ECO     Escherichia coli                                    126      99    10602
145.  EGR     Euglena gracilis                                      4       4      391
146.  EGRCP   Chloroplast Euglena gracilis                          1       1       76
147.  EHP     Ectothiorhodospira halophila                          1       1     1494
148.  EIK     Eikenella corrodens                                   4       4     5933
149.  EJA     Entosphenus japonicus                                 2       2      241
150.  EMP     Emplectonema gracile                                  2       2      239
151.  ERL     Erythrobacter longus                                  1       1      119
152.  ERP     Protomonas extorquens                                 1       1      116
153.  ERY     Erysipelothrix rhusiopathiae                          1       1     1487
154.  ESE     Endophyllum sempervivi                                1       1      118
155.  ESP     Euphausia sperba                                      1       1       75
156.  EUT     Eucidaris tribuloides                                 6       3      923
157.  EVA     Exobasidium vaccinii                                  1       1      118
158.  EWO     Euplotes woodruffi                                    1       1      120
159.  FAE     Faenia rectivirgula                                   1       1     1246
160.  FSB     Misgurnus fossilis                                    3       3      399
161.  FSB     Oncorhynchus keta                                     1       1       75
162.  FSB     Salmo gairdneri                                       2       2      282
163.  GBI     Ginkgo biloba                                         1       1      120
164.  GCL     Gymnosporangium clavariaeforme                        1       1      118
165.  GCO     Gracilaria compressa                                  3       2      242
166.  GEA     Gelidium amansii                                      2       2      241
167.  GLA     Giardia lamblia                                       1       1      127
168.  GLC     Gloiopeltis complanata                                1       1      120
169.  GOL     Golfingia gouldii                                     6       3      973
170.  HAL     Halobacterium volcanii                               48      38     4377
171.  HAM     Mesocricetus sp.                                      2       1       94
172.  HAMMT   Mitochondrion Mesocricetus sp.                        4       4      413
173.  HAP     Halichondria panicea                                  1       1      120
174.  HAZ     Haemophilus aphrophilus                               3       3      494
175.  HCU     Halobacterium cutirubrum                             15      13     1050
176.  HDI     Hymenolepis diminuta                                  2       2      215
177.  HEA     Haemophilus aegypticus                                1       1      116
178.  HEI     Haemophilus influenzae                                2       2      445
179.  HJA     Halichondria japonica                                 1       1      120
180.  HLF     Haloferax mediterranei                                2       1      123
181.  HMO     Halococcus morrhuae                                   2       2      309
182.  HOC     Haliclona oculata                                     1       1      120
183.  HRO     Halocynthia roretzi                                   2       1      119
184.  HSA     Hymeniacidon sanguinea                                2       2      276
185.  HUM     Homo sapiens                                         46      42     6065
186.  HUMMT   Mitochondrion Homo sapiens                            1       1       62
187.  HYD     Hydra sp.                                             6       3      966
188.  HYF     Hydrurus foetidus                                     1       1      118
189.  IGU     Iguana iguana                                         1       1      120
190.  JLA     Aurelia aurita                                        2       2      240
191.  JLC     Chrysaora quinquecirrha                               1       1      120
192.  JLN     Nemopsis dofleini                                     1       1      120
193.  JLS     Spirocodon saltatrix                                  1       1      121
194.  JSUCP   Chloroplast Jungermannia subulata                     2       2      219
195.  KIN     Kingella denitrificans                                1       1     1475
196.  KIN     Kingella kingae                                       1       1     1476
197.  LAE     Listonella aestuarianus                               1       1      119
198.  LAN     Lingula anatina                                       1       1      119
199.  LAN     Lingula reevi                                         6       3      919
200.  LAP     Lamprometra palmata                                   9       3     1044
201.  LBR     Lactobacillus brevis                                  1       1      117
202.  LCA     Lactobacillus casei                                   2       1     1574
203.  LCA     Lactobacillus catenaforme                             1       1     1549
204.  LEI     Leishmania enriettii                                  1       1       68
205.  LGE     Lineus geniculatus                                    1       1      120
206.  LGICP   Chloroplast Lemna sp.                                 1       1      121
207.  LHE     Lophocolea heterophylla                               1       1      119
208.  LPN     Legionella pneumophila                               11      10     1252
209.  LSY     Leptosynapta inhaerens                                6       3     1051
210.  LUM     Lumbricus sp.                                         6       3      976
211.  LUP     Lupinus luteus                                        5       5      380
212.  LVI     Lactobacillus viridescens                             2       2      234
213.  LVI     Lactobacillus vitulinus                               1       1     1477
214.  LYC     Lycopodium clavatum                                   1       1      121
215.  MAG     Methylomonas methanica                                1       1     1283
216.  MBI     Methanobacterium thermoautotrophicum                  2       2      156
217.  MET     Metridium senile                                      6       3      963
218.  MGL     Metasequoia glyptostroboides                          1       1      120
219.  MJU     Microstroma juglandis                                 1       1      121
220.  MLC     Methylococcus capsulatus                              1       1     1234
221.  MLM     Moloney murine leukemia virus                         1       1       74
222.  MLU     Micrococcus luteus                                    2       2      238
223.  MLY     Micrococcus lysodeikticus                             1       1      120
224.  MNI     Mnium rugicum                                         2       1      157
225.  MPO     Marchantia polymorpha                                 1       1      119
226.  MPOCP   Chloroplast Marchantia polymorpha                    34      34     2630
227.  MSE     Megasphaera elsdenii                                  1       1     1567
228.  MSG     Mycobacterium asiaticum                               2       1     1368
229.  MSG     Mycobacterium aurum                                   2       1     1349
230.  MSG     Mycobacterium avium                                   4       2     2735
231.  MSG     Mycobacterium chelonei                                2       1     1355
232.  MSG     Mycobacterium chitae                                  2       1     1359
233.  MSG     Mycobacterium fallax                                  2       1     1348
234.  MSG     Mycobacterium flavescens                              2       1     1357
235.  MSG     Mycobacterium gordonae                                2       1     1373
236.  MSG     Mycobacterium kansasii                                2       1     1369
237.  MSG     Mycobacterium leprae                                  1       1      313
238.  MSG     Mycobacterium neoaurum                                2       1     1354
239.  MSG     Mycobacterium nonchromogenicum                        2       1     1376
240.  MSG     Mycobacterium paratuberculosis                        2       1     1367
241.  MSG     Mycobacterium phlei                                   2       1     1357
242.  MSG     Mycobacterium senegalense                             2       1     1356
243.  MSG     Mycobacterium smegmatis                               1       1       77
244.  MSG     Mycobacterium sp.                                     4       2     2715
245.  MSG     Mycobacterium terrae                                  2       1     1363
246.  MSG     Mycobacterium thermoresistible                        2       1     1359
247.  MSG     Mycobacterium triviale                                2       1     1351
248.  MSG     Mycobacterium tuberculosis                            1       1      116
249.  MSL     Mytilus edulis                                        1       1      119
250.  MSQMT   Mitochondrion Aedes albopictus                        4       4      277
251.  MTB     Methylobacterium extorquens                           1       1     1353
252.  MTB     Methylobacterium organophilum                         1       1     1316
253.  MTB     Methylobacterium sp.                                  1       1     1052
254.  MTE     Methylosporovibrio methanica                          1       1     1306
255.  MUS     Mus musculus                                         37      37     4353
256.  MYA     Mya arenaria                                          6       3      927
257.  MYC     Mycoplasma capricolum                                 3       3      259
258.  MYC     Mycoplasma hyopneumoniae                              3       3     1799
259.  MYC     Mycoplasma mycoides                                   7       6     1885
260.  MYC     Mycoplasma sp.                                       24      24    34006
261.  MYL     Methylosinus trichosporium                            1       1     1456
262.  MYM     Methylophilus methylotrophus                          1       1     1504
263.  MYP     Methylocystis parvus                                  1       1     1314
264.  MZECP   Chloroplast Zea mays                                  1       1       75
265.  NDU     Nematospiroides dubius                                1       1      360
266.  NEM     Ascaris suum                                         22      22     1251
267.  NEU     Neurospora crassa                                     5       5      486
268.  NEUMT   Mitochondrion Neurospora crassa                       8       8      635
269.  NIF     Nitella flexilis                                      1       1      121
270.  NIT     Nitrobacter winogradskyi                              1       1      117
271.  OCE     Oceanospirillum linum                                 1       1     1542
272.  ONG     Onchocerca gibsoni                                    1       1      363
273.  OPW     Ophiocoma wendtii                                     9       3     1036
274.  PAR     Paramecium caudatum                                   1       1      366
275.  PAR     Paramecium primaurelia                                1       1      366
276.  PAR     Paramecium tetraurelia                                1       1      120
277.  PAS     Pasteurella multocida                                 2       2      445
278.  PBL     Phycomyces blakesleeanus                              1       1      120
279.  PBR     Perinereis brevicirris                                1       1      120
280.  PCL     Prochloron sp.                                        1       1      122
281.  PCR     Philosamia cynthia ricini                             2       2      289
282.  PDE     Paracoccus denitrificans                              1       1      117
283.  PEA     Pisum sativum                                         8       8      824
284.  PEC     Penicillium chrysogenum                               1       1      119
285.  PEP     Penicillium patulum                                   1       1      119
286.  PFA     Plasmodium falciparum                                 1       1       78
287.  PGO     Phascolopsis gouldii                                  1       1      120
288.  PHS     Phasianus colchicus                                   1       1       95
289.  PHV     Phaseolus vulgaris                                    3       3      228
290.  PHY     Pythium hydnosporum                                   1       1      118
291.  PIL     Pilayella littoralis                                  1       1      118
292.  PIR     Phlyctochytrium irregulare                            1       1      118
293.  PIS     Pimelobacter simplex                                  1       1      120
294.  PIV     Pivellula marina                                      2       1     2885
295.  PLC     Planococcus citreus                                   2       1      116
296.  PLC     Planococcus kocurii                                   2       1      116
297.  PMC     Pneumocystis carinii                                  1       1      120
298.  PMI     Prorocentrum micans                                   1       1      364
299.  PNC     Pseudonocardia thermophila                            1       1     1246
300.  PNU     Psilotum nudum                                        1       1      121
301.  POR     Porocephalus crotali                                  1       1     1830
302.  POS     Pleurotus ostreatus                                   1       1      118
303.  PPO     Puccinia poarum                                       1       1      118
304.  PRE     Planocera reticulata                                  1       1      120
305.  PRM     Proteus vulgaris                                      4       4     1925
306.  PSE     Pseudomonas aeruginosa                                1       1      120
307.  PSE     Pseudomonas cepacia                                   2       2     1589
308.  PSE     Pseudomonas fluorescens                               2       1      120
309.  PT4     Bacteriophage T4                                     17      12      979
310.  PT5     Bacteriophage T5                                      6       6      472
311.  PTE     Porphyra tenera                                       1       1      121
312.  PTR     Plagiomnium trichomanes                               1       1      119
313.  PYE     Porphyra yezoensis                                    1       1      121
314.  QUL     Coturnix coturnix                                     1       1      136
315.  RAB     Oryctolagus cuniculus                                15      11     4955
316.  RAT     Rattus norvegicus                                    58      43     5867
317.  RAT     Rattus rattus                                         4       4      230
318.  RATMT   Mitochondrion Rattus norvegicus                       4       4      283
319.  RCA     Rhodobacter capsulatus                                2       2      235
320.  RCY     Russula cyanoxantha                                   1       1      119
321.  RER     Rhodococcus equi                                      2       1     1360
322.  RER     Rhodococcus erythropolis                              1       1      121
323.  RHC     Rhizoctonia crocorum                                  1       1      119
324.  RHZ     Rhizoctonia hiemalis                                  1       1      119
325.
      RIC     Oryza sativa                                          1       1      367
326.  RIF     Riftia pachyptila                                     6       3      929
327.  RIR     Rickettsia rickettsii                                 2       1     1443
328.  RIR     Rickettsia typhi                                      2       1     1444
329.  RMA     Rhodopseudomonas marina                               1       1     1417
330.  RPA     Rhodopseudomonas palustris                            1       1      119
331.  RRU     Rhodospirillum rubrum                                 2       2      161
332.  RSP     Rhodospirillum rubrum                                 1       1     1446
333.  RSS     Rhodobacter sphaeroides                               1       1      115
334.  RTO     Rhabditis tokai                                       1       1      119
335.  RYE     Secale cereale                                        2       2      238
336.  SAC     Sulfolobus acidocaldarius                             2       2      204
337.  SAG     Schizochytrium aggregatum                             1       1      119
338.  SAU     Stigmatella aurantiaca                                2       2      239
339.  SCL     Styela clava                                          6       3      967
340.  SCM     Schistosoma mansoni                                   2       2      215
341.  SCS     Saccharopolyspora hirsuta                             1       1     1284
342.  SCU     Thyone briareus                                       6       3     1059
343.  SFE     Saprolegnia ferax                                     1       1      118
344.  SFU     Sargassum fulvellum                                   1       1      118
345.  SHE     Shewanella hanedai                                    2       1      120
346.  SHP     Ovis sp.                                              1       1       76
347.  SHR     Artemia salina                                        2       2      282
348.  SJA     Sabellastarte japonica                                1       1      120
349.  SLI     Synechococcus lividus                                 1       1      120
350.  SLM     Physarum polycephalum                                 6       5      845
351.  SME     Spiroplasma sp.                                      11      11    14826
352.  SNL     Arion rufus                                           3       2      276
353.  SNL     Helix pomatia                                         1       1      119
354.  SOB     Scenedesmus obliquus                                  5       5      407
355.  SOBCP   Chloroplast Scenedesmus obliquus                      2       2      162
356.  SOF     Sepia officinalis                                     2       1      120
357.  SOS     Stichopus oshimae                                     1       1      120
358.  SOYCP   Chloroplast Glycine max                               3       3      255
359.  SPI     Spinacia oleracea                                     1       1      120
360.  SPICP   Chloroplast Spinacia oleracea                        12      12      998
361.  SPM     Spirobolus marginatus                                 6       3      977
362.  SPO     Sporolactobacillus inulinus                           1       1      117
363.  SPS     Spirogyra sp.                                         1       1      120
364.  SQD     Illex illecebrosus                                    1       1      120
365.  SSO     Sulfolobus solfataricus                               1       1      126
366.  SSP     Sulfolobus sp.                                        1       1      131
367.  STA     Staphylococcus aureus                                 1       1      115
368.  STA     Staphylococcus epidermidis                            5       3      264
369.  STC     Stentor coeruleus                                     1       1      353
370.  STE     Stella humosa                                         1       1      117
371.  STF     Asteria amurensis                                     2       2      195
372.  STF     Asterias forbesi                                      9       3     1045
373.  STF     Asterina pectinifera                                  1       1      120
374.  STM     Streptomyces griseus                                  1       1      120
375.  STR     Streptococcus cremoris                                1       1      117
376.  STR     Streptococcus faecalis                                1       1      117
377.  STR     Streptococcus sp.                                     1       1     1577
378.  STY     Salmonella typhimurium                                6       5      382
379.  SUD     Pseudocentrotus depressus                             1       1      120
380.  SUE     Heliocidaris erythrogramma                            6       3     1043
381.  SUE     Heliocidaris tuberculata                              6       3      910
382.  SUH     Hemicentrotus pulcherrimus                            1       1      120
383.  SUL     Lytechinus pictus                                     6       3     1046
384.  SUS     Strongylocentrotus purpuratus                         6       3      988
385.  SYB     Syntrophospora bryantii                               1       1     1532
386.  SYC     Synechocystis sp.                                     1       1       76
387.  SYN     Synechococcus lividus                                 1       1      119
388.  SYW     Syntrophomonas wolfei                                 1       1     1532
389.  TAM     Tatlockia micdadei                                    9       9     1286
390.  TAN     Tilletiaria anomala                                   1       1      118
391.  TCO     Tilletiaria controversa                               1       1      118
392.  TET     Tetrahymena thermophila                               4       4      390
393.  TEY     Tetrahymena pyriformis                                3       3      623
394.  TEYMT   Mitochondrion Tetrahymena pyriformis                  2       2      148
395.  TFE     Acidiphilium cryptum                                  1       1      122
396.  TFE     Thiobacillus acidophilus                              1       1      120
397.  TFE     Thiobacillus ferrooxidans                             2       2      240
398.  TFE     Thiobacillus intermedius                              1       1      117
399.  TFE     Thiobacillus neapolitanus                             1       1      119
400.  TFE     Thiobacillus novellus                                 1       1      120
401.  TFE     Thiobacillus perometabolis                            1       1      116
402.  TFE     Thiobacillus sp.                                      1       1      117
403.  TFE     Thiobacillus thiooxidans                              1       1      121
404.  TFE     Thiobacillus thioparus                                1       1      118
405.  TFE     Thiobacillus versutus                                 1       1      116
406.  TFE     Thiomicrospira pelophila                              1       1      118
407.  TFE     Thiomicrospira sp.                                    1       1      117
408.  THA     Thermoplasma acidophilum                              3       3      276
409.  THC     Thermococcus celer                                    2       2     1611
410.  TLA     Thermomyces lanuginosus                               2       2      276
411.  TLP     Torulopsis utilis                                     1       1      121
412.  TOB     Nicotiana tabacum                                     2       2      152
413.  TOBCP   Chloroplast Nicotiana tabacum                         3       3      343
414.  TOR     Trichosporon oryzae                                   1       1      118
415.  TRB     Trypanosoma brucei                                    2       2      106
416.  TRF     Crithidia fasciculata                                 7       7     1305
417.  TRI     Trichomonas vaginalis                                 1       1      341
418.  TTH     Thermus aquaticus                                     2       2      243
419.  TTH     Thermus sp.                                           2       2      243
420.  TTH     Thermus thermophilus                                  5       4      354
421.  TVI     Thraustochytrium visurgense                           1       1      119
422.  UPE     Ulva pertusa                                          1       1      120
423.  URE     Ureaplasma urealyticum                                1       1     1464
424.  UUN     Urechis unicinctus                                    1       1      120
425.  VCH     Vibrio cholerae                                       1       1      119
426.  VER     Verrucomicrobium spinosum                             1       1      116
427.  VFA     Vicia faba                                            2       2      327
428.  VIB     Aeromonas hydrophila                                  1       1      118
429.  VIB     Aeromonas media                                       1       1      119
430.  VIB     Aeromonas salmonicida                                 1       1      119
431.  VIB     Alteromonas colwelliana                               2       1      120
432.  VIB     Alteromonas putrifaciens                              1       1      120
433.  VIB     Photobacterium angustum                               1       1      120
434.  VIB     Photobacterium leiognathi                             1       1      120
435.  VIB     Photobacterium sp.                                    1       1      120
436.  VIB     Plesiomonas shigelloides                              1       1      120
437.  VIB     Vibrio alginolyticus                                  1       1      121
438.  VIB     Vibrio anguillarum                                    1       1      120
439.  VIB     Vibrio carchariae                                     1       1      120
440.  VIB     Vibrio cincinnatii                                    1       1      120
441.  VIB     Vibrio damsela                                        1       1      120
442.  VIB     Vibrio fischeri                                       1       1      120
443.  VIB     Vibrio fluvialis                                      1       1      120
444.  VIB     Vibrio gazogenes                                      1       1      120
445.  VIB     Vibrio logei                                          1       1      120
446.  VIB     Vibrio marinus                                        1       1      118
447.  VIB     Vibrio metschnitovii                                  1       1      120
448.  VIB     Vibrio mimicus                                        1       1      120
449.  VIB     Vibrio natriegens                                     1       1      121
450.  VIB     Vibrio nereis                                         2       1      121
451.  VIB     Vibrio parahaemolyticus                               1       1      120
452.  VIB     Vibrio pelagius                                       1       1      120
453.  VIB     Vibrio proteolyticus                                  1       1      120
454.  VIB     Vibrio psychroerythus                                 1       1      119
455.  VIB     Vibrio sp.                                            5       4      478
456.  VIT     Vitreoscilla stercoraria                              1       1     1490
457.  VVU     Vibrio vulnificus                                     1       1      120
458.  WHT     Triticum aestivum                                    16      13     1679
459.  WHT     Triticum sp.                                          2       2      152
460.  WHT     Triticum vulgare                                      2       2      282
461.  WHTCP   Chloroplast Triticum aestivum                         1       1       96
462.  WHTMT   Mitochondrion Triticum aestivum                       1       1       97
463.  WHTMT   Mitochondrion Triticum vulgare                        1       1      122
464.  WLB     Wolbachia persica                                     2       1     1475
465.  WOL     Wolinella succinogenes                                1       1     1503
466.  XEB     Xenopus borealis                                      3       3      477
467.  XEL     Xenopus laevis                                       19      16     4037
468.  XET     Xenopus tropicalis                                    2       2      242
469.  XYL     Xylella fastidiosa                                    1       1     1493
470.  YSA     Candida albicans                                      1       1      121
471.  YSC     Saccharomyces cerevisiae                             55      45     4730
472.  YSCMT   Mitochondrion Saccharomyces cerevisiae               35      32     5657
473.  YSG     Saccharomyces carlsbergensis                          4       2      242
474.  YSK     Kluyveromyces lactis                                  1       1      121
475.  YSP     Schizosaccharomyces pombe                             8       8      630
476.  YSR     Pichia membranaefaciens                               1       1      120
477.  YST     Yeast sp.                                             7       6      492
478.  YSTMT   Mitochondrion Yeast sp.                               8       8      613
479.
YSU Candida utilis 11 10 815 Total 1646 1374 357521 VIRAL Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. AA2 Adeno associated virus 9 6 7879 2. AAF Avian musculoaponeurotic fibrosarcoma virus 2 1 3171 3. AC2 Avian carcinoma virus 14 11 18641 4. ACB Avian erythroblastosis virus 19 15 18700 5. ACE Avian endogenous virus 5 5 2772 6. ACF Fujinami sarcoma virus 3 2 7503 7. ACM Avian myelocytomatosis retrovirus 7 5 11273 8. ACR Avian reticuloendotheliosis virus 12 8 8401 9. ACS Avian sarcoma virus 15 13 16357 10. AD4 Mastadenovirus h40 4 4 10795 11. AD4 Mastadenovirus h41 3 3 8920 12. ADA Mastadenovirus s30 2 2 406 13. ADB Mastadenovirus 2 81 4 36298 14. ADB Mastadenovirus c2 1 1 196 15. ADC Mastadenovirus h3 12 11 9026 16. ADD Mastadenovirus h4 7 5 5078 17. ADE Mastadenovirus h5 34 10 30096 18. ADG Mastadenovirus h7 15 6 13245 19. ADG Mastadenovirus s7 5 4 4781 20. ADI Mastadenovirus 9 2 2 332 21. ADJ Mastadenovirus 10 1 1 135 22. ADL Mastadenovirus 2 2 1 430 23. ADL Mastadenovirus h12 40 22 17527 24. ADR Mastadenovirus 18 2 2 364 25. ADT Tupaia adenovirus 4 4 3784 26. ADU Mastadenovirus 19 1 1 154 27. ADV Mastadenovirus 9 8 11370 28. ADX Mastadenovirus bos1 1 1 159 29. ADX Mastadenovirus mus 6 4 8379 30. ADY Eggdrop syndrome-1976 virus 1 1 52 31. ADZ Mastadenovirus 31 2 2 300 32. ADZ Mastadenovirus bos3 1 1 2849 33. ADZ Mastadenovirus c2 2 2 3689 34. AEA Avian adenovirus 3 3 576 35. AEC Canine adenovirus 4 4 805 36. AEE Equine adenovirus 5 5 905 37. AIN Aino virus 1 1 850 38. ALE Rous associated virus 7 6 4402 39. ALM Avian myeloblastosis virus 5 5 4537 40. ALR Rous sarcoma virus 157 132 60856 41. ALV Avian leukosis virus 12 12 5400 42. APH Foot and mouth disease virus 113 108 74852 43. ARE Avian retrovirus 2 2 698 44. ASB Avocado sunblotch viroid 20 19 4715 45. BBM Broad bean mottle virus 3 3 680 46. BBV Black beetle virus 3 2 4504 47. BCT Beet curly top virus 1 1 2993 48. 
BEV Bovine enterovirus 1 1 7414 49. BLC Bunyamwera virus 3 3 12294 50. BLC Bunyavirus La Crosse 18 17 8616 51. BLC Germiston bunyavirus 2 2 5514 52. BLV Bovine leukemia virus 14 14 23104 53. BNY Beet necrotic yellow vein mosaic virus 2 2 11358 54. BTV Bluetongue virus 29 21 41033 55. BVD Bovine viral diarrhea virus 1 1 12573 56. BWY Beet western yellow virus 5 3 7958 57. BYD Barley yellow dwarf virus 3 2 6280 58. CAD Canine distemper virus 3 3 5226 59. CAN Carnation etched ring virus 1 1 7932 60. CAP Capripoxvirus 4 4 10417 61. CAS Cassava latent virus 2 2 5503 62. CASNS Cas NS1 retrovirus 1 1 2711 63. CCC Cadang-cadang coconut viroid 3 3 779 64. CCP Cricket paralysis virus 1 1 1594 65. CEA Caprine arthritis encephalitis virus 4 3 1618 66. CEV Citrus exocortis viroid 3 3 1113 67. CHM Chloris striate mosaic virus 1 1 2750 68. CHV Chlorella virus 2 2 3727 69. CMV Carnation mottle virus 1 1 4003 70. CNV Cucumber necrosis virus 1 1 4701 71. CO4 Coliphage N4 2 2 1759 72. COB Bovine coronavirus 7 5 15527 73. CPE Euxoa scandens cytoplasmic polyhedrosis virus 1 1 882 74. CPF Cucumber pale fruit viroid 3 2 604 75. CPR Chandipura virus 1 1 1751 76. CPV Cowpox virus 19 19 13138 77. CSO Campoletis sonorensis virus 8 6 9418 78. CSV Chrysanthemum stunt viroid 4 3 1040 79. CTN Coconut tinangaja viroid 1 1 254 80. CXB Coxsackievirus B1 2 2 8844 81. CXB Coxsackievirus B3 4 4 13082 82. CXB Coxsackievirus B4 1 1 7395 83. CYM Clover yellow mosaic potexvirus 1 1 1051 84. CYS Lymphocystis disease virus of fish 3 3 5310 85. DEN Dengue virus 24 22 70563 86. DHV Dhori virus 1 1 1479 87. DMB Thymotropic retrovirus type B 1 1 285 88. DNV Densonucleosis virus 1 1 4277 89. DPF Dapple peach fruit disease viroid 1 1 297 90. DPP Dapple plum and peach fruit disease viroid 1 1 297 91. DUG Dugbe nairovirus 1 1 1712 92. EBO Ebola virus 1 1 3021 93. ECV Echo 11 virus 1 1 98 94. ECV Echo 6 virus 1 1 99 95. ECV Echo 9 virus 2 2 615 96. EEE Eastern equine encephalomyelitis virus 5 5 5163 97. 
EEV Venezuelan equine encephalitis virus 7 6 8300 98. EEW Western equine encephalitis virus 2 2 4521 99. EIA Equine infectious anemia virus 15 8 20826 100. EMC Encephalomyocarditis virus 8 7 23697 101. FCG Gardner-Arnstein Feline Leukemia oncovirus B 2 2 3863 102. FCL Feline calicivirus 1 1 3865 103. FCR RD114 retrovirus 1 1 126 104. FCS Feline sarcoma virus 7 7 14248 105. FCV Feline leukemia virus 17 15 38510 106. FIP Feline infectious peritonitis virus 1 1 4500 107. FIV Feline immunodeficiency virus 2 2 9848 108. FLA Influenza virus type A 448 380 388880 109. FLB Influenza virus type B 81 68 98549 110. FLC Influenza virus type C 28 28 45497 111. FMV Figwort mosaic virus 1 1 7743 112. FPV Fowlpox virus 5 5 22700 113. FV3 Frog virus 3 3 2 2273 114. GPB Granulosis virus 1 1 999 115. GPR Gottfried porcine rotavirus 1 1 3302 116. GSB GS virus 1 1 307 117. GSH Ground squirrel hepatitis virus 1 1 3311 118. GVI Grapevine viroid 2 1 369 119. GVT Trichoplusia ni granulosis virus 1 1 998 120. GYS Grapevine yellow speckle viroid 3 2 730 121. HAN Hantaan virus 4 3 8928 122. HBD Duck hepatitis B virus 6 5 12249 123. HBH Heron hepatitis B virus 1 1 3027 124. HCV Hog cholera virus 2 1 12284 125. HIV Human immunodeficiency virus type 1 86 38 163371 126. HIV Human immunodeficiency virus type 2 5 4 29006 127. HIV Human lymphotropic virus type III 1 1 261 128. HIV Human T-cell lymphotropic virus type II 7 3 10520 129. HJV Highlands J virus 2 2 505 130. HL1 Human lymphotropic virus type I 14 12 23190 131. HL2 Human lymphotropic virus type II 4 4 5400 132. HLV Hop latent viroid 2 1 256 133. HOB Human coronavirus 2 2 2560 134. HOJ HoJo virus 1 1 3613 135. HOM Mus hortulanus virus 3 3 4668 136. HOP Hop Stunt Viroid 8 6 1795 137. HPA Hepatitis A virus 13 11 42881 138. HPB Hepatitis B virus 65 61 56997 139. HPD Hepatitis delta virus 4 3 3523 140. HPE Hepatitis E virus 2 1 2570 141. HPU Duck hepatitis virus 2 1 3021 142. HPV Hepatitis virus 1 1 3182 143. 
HRD Human retrovirus type D 1 1 8785 144. HRV Human rhinovirus 11 9 31251 145. HS1 Herpes simplex virus type 1 144 105 311258 146. HS2 Herpes simplex virus type 2 34 28 51383 147. HS4 Epstein-Barr virus 83 63 287384 148. HS5 Human cytomegalovirus 43 39 134697 149. HS5 Murine cytomegalovirus 3 2 4331 150. HS5 Simian cytomegalovirus 1 1 880 151. HS6 Human herpesvirus type 6 1 1 21858 152. HSB Bovine herpesvirus type 1 9 9 8227 153. HSC Simian cytomegalovirus 2 2 2294 154. HSE Equine herpesvirus type 1 10 8 25897 155. HSG Gallid herpesvirus type 1 5 5 10607 156. HSK Gallid herpesvirus type 2 4 4 11325 157. HSL Feline herpesvirus 1 1 1619 158. HSL Herpesvirus ateles 1 1 2577 159. HSM Gallid herpesvirus type 1 3 3 3648 160. HSO Herpesvirus papio 1 1 695 161. HSP Human spumaretrovirus 3 3 12095 162. HSS Herpesvirus saimiri 29 28 20488 163. HSS Pseudorabies virus 14 11 20384 164. HST Herpesvirus tamarinus 2 1 2556 165. HSU Herpesvirus tupaia 1 1 863 166. HSV Herpes simplex virus 1 1 501 167. HSY Herpesvirus sylvilagus 1 1 559 168. HTV Human adult T-cell leukemia virus 1 1 2266 169. IBA Avian infectious bronchitis virus 24 24 31163 170. IBB Infectious bronchitis virus 1 1 3645 171. IBD Infectious bursal disease virus of chickens 2 2 5924 172. IHN Infectious hematopoietic necrosis virus 2 2 2961 173. IPN Infectious pancreatic necrosis virus 2 1 3097 174. IRI Iridescent virus type 6 3 3 10327 175. JEV Japanese encephalitis virus 4 4 18496 176. KUN Kunjin virus 1 1 10664 177. KVS Killer virus of S.cerevisiae 7 5 1260 178. LCV Lymphocytic choriomeningitis virus 11 11 19716 179. LDV Lactate dehydrogenase-elevating virus 5 5 620 180. LEE Lee virus 1 1 3616 181. LSV Lassa virus 3 3 6890 182. MAA Alfalfa mosaic virus 22 13 14703 183. MAV Myeloblastosis-associated virus 1 1 1173 184. MBG Bean golden mosaic virus 4 4 10465 185. MBG Bean yellow mosaic virus 1 1 1015 186. MBR Brome mosaic virus 16 12 9903 187. MBS Barley stripe mosaic virus 11 8 13655 188. 
MBV Middleburg virus 4 3 3394 189. MCA Cauliflower mosaic virus 14 13 31503 190. MCC Cowpea chlorotic mottle virus 7 6 6379 191. MCF Mink cell focus-forming virus 8 8 8202 192. MCG Cucumber green mottle mosaic virus 2 2 2421 193. MCP Cowpea mosaic virus 6 6 10010 194. MCV Cucumber mosaic virus 53 52 34142 195. MEA Measles virus 30 25 83843 196. MEV Maus-Elberfeld virus 1 1 54 197. MGR Maguari bunyavirus 1 1 945 198. MHV MHV mouse hepatitis virus 14 9 10970 199. MHV Murine hepatitis virus 14 13 12714 200. MHV Murine hepatitis virus A59 1 1 751 201. MLA Abelson murine leukemia virus 8 6 10848 202. MLE Mouse RFV endogenous retrovirus 2 2 684 203. MLF Friend mink cell focus-inducing virus 5 4 7000 204. MLF Friend murine leukemia virus 2 2 4170 205. MLF Friend spleen focus-forming virus 9 9 13488 206. MLG Gross passage A murine leukemia virus 2 2 1220 207. MLK Kirsten murine leukemia virus 1 1 1335 208. MLM Moloney murine leukemia virus 54 40 32500 209. MLN Murine non-leukeminogenic retrovirus 1 1 529 210. MLO AKV murine leukemia virus 7 2 9000 211. MLR Rauscher spleen focus-forming virus 2 2 2244 212. MLS Soule murine leukemia virus 2 2 1310 213. MLT Tikaut murine leukemia virus 1 1 641 214. MLV Murine leukemia virus 44 36 41137 215. MLX Xenotropic murine leukemia virus 1 1 3060 216. MMT Mouse mammary tumor virus 30 30 36715 217. MNC Narcissus mosaic potexvirus 1 1 6955 218. MOK Mokola lyssavirus 2 2 152 219. MPV Monkeypox virus 1 1 1276 220. MSB Southern bean mosaic virus 2 2 793 221. MSC Sugarcane mosaic virus 1 1 1782 222. MSH Harvey murine sarcoma virus 3 3 3036 223. MSJ FBJ murine osteosarcoma virus 1 1 4226 224. MSK Kirsten murine sarcoma virus 2 2 1933 225. MSN Solanum nodiflorum mottle virus 1 1 377 226. MSR FBR murine osteosarcoma virus 1 1 3811 227. MSV Murine sarcoma virus 5 5 5020 228. MSY Myeloproliferative sarcoma virus 3 3 5305 229. MTG Tomato golden mosaic virus 3 3 6342 230. MTR Tobacco rattle virus 7 7 20386 231. 
MTS Lucerne transient streak virus 3 3 970 232. MTV Tobacco mosaic virus 21 7 16116 233. MTV Velvet tobacco mottle virus 1 1 366 234. MTY Andean potato latent virus 1 1 96 235. MTY Clitoria yellow vein virus 1 1 120 236. MTY Eggplant mosaic virus 2 2 138 237. MTY Kennedya yellow mosaic virus 1 1 83 238. MTY Ononis yellow mosaic virus 1 1 131 239. MTY Turnip yellow mosaic virus 18 14 9480 240. MUM Mumps virus 9 7 11681 241. MVE Murray Valley encephalitis virus 1 1 5436 242. MVM Minute virus of mice 9 7 16222 243. MYX Myxoma virus 1 1 1421 244. MZS Maize streak virus 5 4 8139 245. NDV Newcastle disease virus 48 46 96524 246. NEV Nephropathia epidemica 2 2 5466 247. NPA Autographa californica nuclear polyhedrosis virus 40 38 59258 248. NPB Bombyx mori nuclear polyhedrosis virus 3 3 3931 249. NPG Galleria mellonella nuclear polyhedrosis virus 5 5 2556 250. NPM Mamestra brassicae nuclear polyhedrosis virus 1 1 2598 251. NPO Orgyia pseudotsugata polyhedrosis virus 7 7 13383 252. NPS Spodoptera frugiperda nuclear polyhedrosis virus 1 1 1557 253. OLV Ovine lentivirus 1 1 9256 254. ONN O'Nyong-nyong virus 1 1 3543 255. ORF Orf virus 2 2 1320 256. PCB Baboon endogenous virus 8 7 20105 257. PCC Colobus type C cpc-1 endogenous retrovirus 2 2 373 258. PCE Chimpanzee type C endogenous retrovirus 2 2 430 259. PCG Gibbon leukemia virus 5 4 9202 260. PCM Macaca endogenous retrovirus 1 1 126 261. PCM Macaca mulatta type C retrovirus 4 4 938 262. PCS Simian sarcoma virus 12 9 10868 263. PEB Pea early browning virus 2 1 7073 264. PEV Subacute sclerosing panencephalitis virus 3 3 3444 265. PIB Bovine parainfluenza virus type 3 3 1 8700 266. PIC Pichinde Arenavirus 8 8 12637 267. PIF Human parainfluenza virus type 3 29 27 46139 268. PLV Potato leaf roll virus 5 3 6650 269. PLY Budgerigar fledgling disease virus 1 1 4980 270. PLY Polyomavirus 130 38 35316 271. PMP Papaya mosaic potexvirus 1 1 900 272. PMS Simian paramyxovirus (SV5) 1 1 1382 273. PMV Pepper mottle virus 1 1 1480 274. 
POL Poliovirus 116 103 77346 275. POV Porcine parvovirus 1 1 3670 276. PPA Avian papillomavirus 2 2 786 277. PPB Bovine papillomavirus 12 12 32403 278. PPC Hamster papovavirus 1 1 5366 279. PPD Deer papillomavirus 1 1 8374 280. PPE European Elk papillomavirus 4 3 8842 281. PPH Human papillomavirus 36 35 92944 282. PPI Micromys minutus papillomavirus 3 3 487 283. PPL Lymphotropic papovavirus 1 1 5270 284. PPM Monkey B-lymphotropic papovavirus 4 4 10920 285. PPR Reindeer papillomavirus 2 2 930 286. PPV Plum pox potyvirus 3 3 13827 287. PRV Porcine rotavirus 8 7 9167 288. PSV Peanut stunt virus 1 1 393 289. PTP Punta toro phlebovirus 6 6 7130 290. PTV Potato spindle tuber viroid 3 3 1077 291. PV1 Parvovirus H1 3 2 5302 292. PV3 Parvovirus H3 1 1 125 293. PVA Raccoon parvovirus 2 1 2410 294. PVB Papovavirus BKV 34 21 17269 295. PVB Parvovirus B19 4 4 11325 296. PVC Canine parvovirus 4 4 10016 297. PVD Bovine parvovirus 2 1 5517 298. PVF Feline panleukopenia virus 10 6 16703 299. PVM Mink enteritis virus 4 2 4888 300. PVR Kilham rat virus 1 1 125 301. PVR Parvovirus R1 2 2 548 302. PVS Potato virus S 1 1 3552 303. PVX Potato virus X 4 3 8841 304. PVY Potato virus Y 4 3 12008 305. RAV Rabies virus 20 19 30687 306. RBF Malignant rabbit fibroma virus 3 3 446 307. RBF Rabbit fibroma virus 16 15 28212 308. RCM Red clover mottle virus 1 1 3543 309. RDV Rice dwarf virus 3 2 3889 310. REO Reovirus type 1 20 18 12885 311. REO Reovirus type 2 13 11 5383 312. REO Reovirus type 3 46 32 20272 313. RML Rauscher murine leukemia virus 1 1 139 314. RNM Red clover necrotic mosaic virus 3 2 5338 315. RO1 Rotavirus sp. 3 3 4074 316. RO1 Rotavirus subgroup 1 3 2 2712 317. RO2 Rotavirus subgroup 2 6 6 5955 318. ROB Bovine rotavirus 20 15 23547 319. ROH Human rotavirus 3 3 4480 320. ROR Rhesus rotavirus 2 2 3424 321. ROT Simian (SA11) rotavirus 14 12 13186 322. RPF Rinderpest virus 4 4 8866 323. RRV Ross river virus 4 3 19686 324. RSH Human respiratory syncytial virus 33 21 19332 325. 
RSV Rat sarcoma virus 1 1 1380 326. RUB Rubella virus 11 7 20872 327. RVF Rift Valley fever virus 1 1 3884 328. SAM Satellite arabis mosaic virus 1 1 300 329. SAP Satellite panicum mosaic virus 1 1 826 330. SFS Sandfly fever Sicilian virus 1 1 1746 331. SFV Semliki forest virus 12 3 15380 332. SHV Simian hepatitis A virus 4 2 1404 333. SIG Sigma virus 1 1 1718 334. SIN Sindbis virus 15 5 15504 335. SIV Simian immunodeficiency virus 17 10 49776 336. SIV Simian immunodeficiency virus 1 1 7759 337. SIV Simian immunodeficiency virus 3 3 9130 338. SLO St. Louis encephalitis virus 8 8 5391 339. SND Parainfluenza virus 37 29 42275 340. SNV Spleen necrosis virus 9 8 3909 341. SPV Spiroplasma virus 1 1 4421 342. SSH Snowshoe hare bunyavirus 11 10 6726 343. STL Simian T-cell lymphotropic virus type I 1 1 4122 344. STT St. Thomas 3 rotavirus 1 1 1062 345. SV4 Rhesus macaque polyomavirus 177 41 14992 346. SV5 Simian virus 5 4 4 5586 347. SVC Spring viremia of carp virus 1 1 710 348. SVD Swine vesicular disease virus 2 2 7475 349. SYE Sonchus yellow net virus 2 2 1751 350. TAC Tacaribe virus 2 2 3505 351. TAS Tomato apical stunt viroid 3 2 723 352. TBE Tick-borne encephalitis virus 4 2 21023 353. TBR Tomato black ring virus 11 11 18222 354. TBS Tomato bushy stunt virus 3 2 5172 355. TCV Turnip crinkle virus 2 2 5500 356. TEV Tobacco etch virus 4 3 21315 357. TGE Transmissible gastroenteritis virus 12 8 19556 358. TME Theiler's murine encephalomyelitis virus 4 4 26220 359. TNS Satellite tobacco necrosis virus 4 2 1380 360. TOA Tomato aspermy virus 5 5 943 361. TOS Tomato ringspot virus 2 2 3096 362. TPM Tomato plant macho viroid 1 1 360 363. TRS Tobacco ringspot virus 3 3 790 364. TSV Tobacco streak virus 3 3 2525 365. TVM Tobacco vein mottling virus 3 2 9892 366. UST Ustilago maydis P6 virus 1 1 1234 367. UUK Uukuniemi virus 1 1 3231 368. VAC Vaccinia virus 75 69 137948 369. VAR Variola virus 1 1 1274 370. VAZ Varicella-zoster virus 10 6 130009 371. 
VLV Visna virus 3 2 9690 372. VSV Vesicular stomatitis virus 142 120 147854 373. VYS Saccharomyces cerevisiae virus ScV1 1 1 819 374. WCP White clover mosaic virus 2 2 7457 375. WDV Wheat dwarf virus 2 2 2829 376. WHV Woodchuck hepatitis virus 6 6 15916 377. WNF West Nile virus 6 3 11194 378. WTV Wound tumor virus 7 7 11555 379. YFV Flavivirus febricis 3 2 11427 Total 4175 3216 5363016 PHAGE Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. AL3 Bacteriophage alpha3 5 4 1209 2. BAZ Bacteriophage Z 1 1 370 3. BEO Corynebacteriophage omega 1 1 1880 4. BET Corynebacteriophage beta 3 2 4162 5. BEU Corynebacteriophage gamma 2 2 139 6. BNF Bacteriophage NF 6 5 3258 7. BO1 Bacteriophage Bo1 1 1 205 8. BP2 Bacteriophage P21 2 2 191 9. BPH Bacteriophage phi-11 1 1 300 10. BU3 Bacteriophage U3 1 1 201 11. BZ3 Bacteriophage Bz13 1 1 218 12. C31 Bacteriophage phi-c31 1 1 3413 13. CF1 Bacteriophage Cf16 1 1 500 14. CP1 Bacteriophage Cp-1 5 3 3364 15. CP5 Bacteriophage Cp-5 4 2 1850 16. CP7 Bacteriophage Cp-7 4 2 3322 17. CPT Bacteriophage Cp-T1 1 1 730 18. D18 Bacteriophage D108 9 7 3935 19. F1C Bacteriophage f1 14 11 16119 20. FR1 Bacteriophage FR 2 2 1993 21. G14 Bacteriophage G14 1 1 113 22. H19B Bacteriophage H19B 2 2 3301 23. H30 Bacteriophage H30 1 1 1905 24. H44 Bacteriophage H4489A 1 1 3222 25. HP1 Bacteriophage HP1 4 2 10673 26. IKE Bacteriophage Ike 3 2 7200 27. J93 Bacteriophage 933J 2 1 1499 28. JP3 Bacteriophage Jp34 2 2 1070 29. JP5 Bacteriophage Jp501 1 1 205 30. KU1 Bacteriophage Ku1 1 1 220 31. L17 Bacteriophage L17 2 2 240 32. L54 Bacteriophage L54a 1 1 1626 33. LAM Bacteriophage lambda 116 20 53621 34. LP7 Bacteriophage LP7 1 1 2110 35. M13 Bacteriophage M13 11 7 8039 36. M13MP7 Bacteriophage M13mp7 1 1 60 37. M13MP8 Bacteriophage M13mp8 3 3 240 38. M13MP9 Bacteriophage M13mp9 2 2 318 39. M2Y Bacteriophage M2Y 2 2 336 40. MS2 Bacteriophage MS2 13 5 4283 41. OX2 Bacteriophage Ox2 2 2 2641 42. 
P15 Bacteriophage phi-105 1 1 1306 43. P16 Bacteriophage 16-3 1 1 720 44. P18 Bacteriophage 186 1 1 3561 45. P21 Bacteriophage phi-21 2 2 949 46. P22 Bacteriophage P22 17 15 18461 47. P29 Bacteriophage phi-29 17 14 17628 48. P42 Bacteriophage 42D 1 1 993 49. P434 Bacteriophage 434 7 5 2933 50. P80 Bacteriophage phi-80 7 6 4714 51. P82 Bacteriophage 82 1 1 1200 52. P93 Bacteriophage 933W 2 1 1661 53. PA2 Bacteriophage PA-2 1 1 2816 54. PF1D Bacteriophage Pf1 1 1 435 55. PF3 Bacteriophage Pf3 4 4 12981 56. PFD Bacteriophage fd 11 7 7334 57. PFI Bacteriophage Fi 1 1 78 58. PG4 Bacteriophage G4 12 8 7247 59. PGA Bacteriophage Ga 4 4 4022 60. PH1 Bacteriophage H1 1 1 98 61. PH15 Bacteriophage phi-15 3 3 2352 62. PH2 Bacteriophage 21 1 1 1688 63. PH3 Bacteriophage phi-3T 2 2 3422 64. PH5 Bacteriophage phi-105 6 5 3851 65. PH6 Bacteriophage phi-6 7 7 13619 66. PHI Bacteriophage phi-H 1 1 2465 67. PHK Bacteriophage phi-K 1 1 336 68. PK3 Bacteriophage K3 7 6 6630 69. PM1 Bacteriophage M1 1 1 1714 70. PMU Bacteriophage Mu 43 32 17325 71. PP1 Bacteriophage P1 37 36 16583 72. PP2 Bacteriophage P2 10 9 7348 73. PP4 Bacteriophage P4 7 7 13339 74. PP7 Bacteriophage P7 3 3 1641 75. PQB Bacteriophage Q-beta 11 10 1710 76. PR4 Bacteriophage PR4 2 2 240 77. PR5 Bacteriophage PR5 2 2 238 78. PR722 Bacteriophage PR722 2 2 240 79. PRD1 Bacteriophage PRD1 8 8 7360 80. PS2 Bacteriophage PBS2 1 1 720 81. PSP Bacteriophage Sp 3 2 4542 82. PST Bacteriophage ST 1 1 246 83. PT2 Bacteriophage T2 9 6 8743 84. PT3 Bacteriophage T3 19 16 14888 85. PT4 Bacteriophage T4 103 57 109845 86. PT5 Bacteriophage T5 28 25 26730 87. PT6 Bacteriophage T6 2 1 946 88. PT7 Bacteriophage T7 31 8 41756 89. PVK Bacteriophage VK 1 1 246 90. PX1 Bacteriophage phi-X174 38 14 7187 91. PZA Bacteriophage PZA 3 1 19366 92. R17 Bacteriophage R17 8 6 402 93. RHO Bacteriophage Rho11s 2 1 2187 94. S13 Bacteriophage S13 1 1 5386 95. SP1 Bacteriophage SPO1 19 19 5707 96. SP2 Bacteriophage SPO2 1 1 3040 97. 
SP6 Bacteriophage SP6 5 4 2948 98. SP8 Bacteriophage SP82 7 7 1791 99. SPB Bacteriophage SP-beta 4 3 2224 100. SPC Bacteriophage S-phi-C 1 1 1377 101. SPP Bacteriophage SPP1 2 2 1558 102. SPR Bacteriophage SPR 3 1 2129 103. ST1 Bacteriophage ST-1 2 2 844 104. T12 Bacteriophage T12 1 1 1837 105. TH1 Bacteriophage TH1 1 1 220 106. TW1 Bacteriophage TW19 1 1 76 107. TW2 Bacteriophage TW28 1 1 260 Total 781 509 608720 SYNTHETIC Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. AD2 Artificial gene 1 1 128 2. ADB Artificial gene 8 8 573 3. ADH Artificial gene 1 1 106 4. ADL Artificial gene 3 3 273 5. ADV Artificial gene 1 1 106 6. ALM Avian myeloblastosis virus 1 1 337 7. ALR Rous sarcoma virus 4 4 413 8. AMH Artificial gene 1 1 234 9. ARB Artificial gene 3 3 1180 10. ARC Cloning vector 2 2 760 11. ARE Artificial gene 1 1 255 12. ARG Artificial gene 1 1 249 13. ARH Artificial gene 5 5 440 14. ARI Artificial gene 1 1 465 15. ARL Artificial gene 2 2 424 16. ARM Artificial gene 1 1 457 17. ARN Cloning vector 1 1 333 18. ARP Cloning vector 6 6 1079 19. ARS Artificial gene 1 1 529 20. ART Artificial gene 1 1 60 21. ARY Artificial gene 1 1 264 22. ATH Artificial gene 1 1 417 23. BKV BK Virus 6 3 1560 24. BOV Bos taurus 17 17 6114 25. BSF Cloning vector 2 2 54 26. BSM Cloning vector 1 1 54 27. BSU Bacillus subtilis 9 8 3609 28. BTH Artificial gene 2 2 104 29. CAR Artificial gene 1 1 3616 30. CEL Caenorhabditis elegans 1 1 186 31. CHK Gallus sp. 4 4 701 32. CHS Artificial gene 1 1 478 33. CMVMUS Artificial gene 1 1 1376 34. COT Artificial gene 1 1 7876 35. CRO Artificial gene 2 2 198 36. CVC Cloning vector 1 1 46 37. CVE Cloning vector 1 1 60 38. CVJ Cloning vector 5 5 390 39. CVK Cloning vector 1 1 120 40. DRO Drosophila sp. 2 2 322 41. ECO Escherichia coli 111 99 17619 42. EGF Artificial gene 1 1 299 43. ERY Artificial gene 1 1 217 44. EXP Cloning vector 2 2 123 45. EZZ Cloning vector 1 1 60 46. 
FCS Cloning vector 2 2 136 47. FLA Artificial gene 1 1 69 48. FLU Influenza virus 6 6 861 49. FSB Artificial gene 2 2 747 50. GFA Artificial gene 1 1 176 51. HAL Artificial gene 4 3 1633 52. HBV Hepatitis B virus 3 3 315 53. HCY Artificial gene 1 1 313 54. HET Hetropolymeric DNA 2 2 594 55. HIR Artificial gene 1 1 220 56. HIV Artificial gene 5 5 297 57. HL1 Artificial gene 4 4 238 58. HNR Artificial gene 1 1 90 59. HPB Artificial gene 1 1 556 60. HS1 Artificial gene 1 1 780 61. HS2 Artificial gene 2 2 129 62. HS5 Human cytomegalovirus 1 1 210 63. HSV Herpes Simplex Virus 6 6 323 64. HUM Artificial human gene 73 64 19468 65. HY3 Plasmid pHY300PLK 1 1 4870 66. IFH Cloning vector 1 1 63 67. IL1 Artificial gene 1 1 88 68. INS Artificial gene 1 1 232 69. ISN Insertion element 6 6 378 70. JRD Cloning vector 3 3 6852 71. KAN Cloning vector 3 3 210 72. KPN Klebsiella pneumoniae 2 2 354 73. KY1 Artificial gene 1 1 171 74. LAC Cloning vector 2 2 1173 75. LAM Bacteriophgage lambda 4 4 336 76. LET Artificial gene 1 1 212 77. LGT Cloning vector lambda gt11 1 1 210 78. LHM Artificial gene 1 1 232 79. LOR Cloning vector 1 1 5614 80. M13 Cloning vector M13 8 8 643 81. M13MP7 Cloning vector M13mp7 2 2 120 82. M13MP8 Cloning vector M13mp8 1 1 382 83. M13MP9 Cloning vector M13mp9 1 1 60 84. M13TG103 Cloning vector M13tg103 1 1 66 85. M13TG114 Cloning vector M13tg114 1 1 60 86. M13TG115 Cloning vector M13tg115 1 1 66 87. M13TG117 Cloning vector M13tg117 1 1 63 88. M13TG120 Cloning vector M13tg120 1 1 54 89. M13TG130 Cloning vector M13tg130 1 1 93 90. M13TG131 Cloning vector M13tg131 1 1 93 91. MBO Artificial gene 1 1 91 92. MBR Artificial gene 3 3 157 93. MCA Cauliflower mosaic virus 2 2 139 94. MCV Cucumber mosaic virus 5 5 284 95. MHI Mouse-human hybrid 4 4 1574 96. MLE Artificial gene 1 1 936 97. MLF Artificial gene 1 1 213 98. MLM Artificial gene 2 2 178 99. MML Cloning vector 12 4 24042 100. MNV Artificial gene 1 1 87 101. MP7 Artificial gene 2 1 69 102. 
MP8 Artificial gene 2 1 60 103. MP9 Artificial gene 2 1 60 104. MS2 Artificial gene 1 1 100 105. MSM Artificial gene 2 2 331 106. MUS Mus musculus 37 37 4018 107. NEU Artificial gene 2 2 171 108. NNL Plasmid pNNL 1 1 815 109. NPA Autographa californica nuclear polyhedrosis virus 3 3 922 110. P17X Plasmid pACYC177 7 6 4190 111. P18X Plasmid pACYC184 5 4 4593 112. P23 Artificial gene 1 1 119 113. PAC Cloning vector 1 1 83 114. PAH Artificial gene 2 2 107 115. PBD Cloning vector 1 1 79 116. PBG Cloning vector 6 3 12379 117. PBR Plasmid pBR322 42 23 7125 118. PBR313 Plasmid pBR313 1 1 200 119. PBR322SV Plasmid pBR322/SV40 hybrid 5 5 209 120. PBR325 Plasmid pBR325 3 3 319 121. PBR327 Plasmid pBR327 3 3 3334 122. PBR329 Plasmid pBR329 1 1 4150 123. PBR345 Plasmid pBR345 2 2 1024 124. PBRH4 Plasmid pBRH4 1 1 71 125. PCE Cloning vector 1 1 510 126. PCG86 Plasmid pCG86 2 2 654 127. PCZ Plasmid pCZ 2 2 208 128. PDPL Plasmid PDPL13 1 1 79 129. PEM Cloning vector pEMBL8p 4 2 7878 130. PES Cloning vector 1 1 99 131. PF1 Bacteriophage f1 1 1 254 132. PFE Plasmid pFE 2 2 180 133. PFH Plasmid pFH 1 1 120 134. PFL Cloning vector 2 1 4588 135. PFR Plasmid pFR 4 4 341 136. PHP Plasmid pHP45 1 1 155 137. PHS Plamsid pHS 3 3 2877 138. PHV100 Cloning vector 1 1 396 139. PHV33 Artificial gene 13 13 650 140. PIC Plasmid pIC 5 5 477 141. PIG Artificial pig gene 3 3 440 142. PIP1088 Plasmid pIP1088 2 2 142 143. PIVX Cloning vector pi-VX 1 1 902 144. PJSC73 Plasmid pJSC73 1 1 3564 145. PK18 Cloning vector 1 1 2661 146. PKN Plasmodium knowlesi 2 1 360 147. PKT Artificial gene 3 3 264 148. PKU Cloning vector 3 2 7825 149. PL2 Artificial gene 2 2 240 150. PL5 Artificial gene 2 2 310 151. PLB Cloning vector 1 1 852 152. PLF Cloning vector 2 1 3641 153. PLY Artificial gene 1 1 66 154. PMB9 Cloning vector 1 1 138 155. PMC1843 Plasmid pMC1843 1 1 62 156. PMK20 Artificial gene 2 1 4028 157. PMT Cloning vector 1 1 2854 158. PMU Artificial gene 4 4 576 159. POG Cloning vector 1 1 352 160. 
POL Artificial gene 2 2 129 161. POLY Cloning vector 6 3 6226 162. PORI17 Plasmid pOri17 2 2 490 163. PPI Cloning vector 1 1 4734 164. PPUC Cloning vector 1 1 75 165. PQB Artificial gene 1 1 64 166. PRK Cloning vector 2 2 839 167. PRT Artificial gene 1 1 711 168. PSE Artificial gene 2 2 139 169. PSI Cloning vector 1 1 81 170. PSKS104 Plasmid pSKS104 1 1 69 171. PSKS105 Plasmid pSKS105 1 1 60 172. PSKS106 Plasmid pSKS106 1 1 60 173. PSKS107 Plasmid pSKS107 1 1 46 174. PSMF Cloning vector 2 2 259 175. PSP Cloning vector 2 2 119 176. PSR Artificial gene 1 1 138 177. PSS Cloning vector 2 2 475 178. PT4 Bacteriophage T4 4 4 725 179. PT7 Bacteriophage T7 2 2 282 180. PTK Plasmid pTK 1 1 68 181. PTL Cloning vector 1 1 51 182. PTN Plasmid pTN 1 1 355 183. PTR Plasmid pTr 1 1 137 184. PTZ Plasmid pTZ12 1 1 2517 185. PUC Cloning vector 2 1 3914 186. PUEX Cloning vector 2 1 6728 187. PVH51 Plasmid pVH51 1 1 3847 188. PX1 Bacteriophage phi-X174 1 1 59 189. PYM Artificial gene 1 1 252 190. PYR Artificial gene 1 1 158 191. PZ189 Cloning vector 1 1 153 192. R38 Plasmid R388 1 1 1167 193. R67 Plasmid R67 1 1 353 194. R6K Cloning vector 2 2 176 195. RAT Rattus sp. 14 13 1864 196. RET Cloning vector 2 2 780 197. RMT Artificial gene 6 6 638 198. RNA Artificial gene 1 1 328 199. ROT Artificial gene 2 2 141 200. RRNA Artificial gene 1 1 136 201. RSC1 Plasmid Rsc13 3 1 7894 202. RSF1050 Plasmid RSF1050 1 1 104 203. RSP Artificial gene 1 1 100 204. RSV Rous Sarcoma Virus 3 3 450 205. RTS Artificial gene 1 1 280 206. S100 Artificial gene 1 1 283 207. SAA Bacteriophage sigma-11-AA248 1 1 83 208. SAU Staphylcoccus aureus 1 1 60 209. SFV Semliki forest virus 3 3 171 210. SHI Cloning vector 3 3 428 211. SHU Artificial gene 3 3 272 212. SIN Cloning vector 2 2 596 213. SLM Artificial gene 10 10 817 214. SOMINS Artificial gene 1 1 226 215. SP02 Bacteriophage SP02 1 1 487 216. SP6 Artificial gene 1 1 78 217. SPI Artificial gene 2 2 295 218. SRU Artificial gene 1 1 252 219. 
STA Artificial gene 4 4 619 220. STM Artificial gene 1 1 71 221. STY Salmonella sp. 1 1 135 222. SV4 Simian Virus 40 13 13 2415 223. SVA Artificial gene 1 1 213 224. SYN Synthetic sequence 150 144 65005 225. T13 Artificial gene 1 1 223 226. T4L Artificial gene 1 1 518 227. TAC Artificial gene 1 1 842 228. THA Artificial gene 1 1 641 229. THY Plasmid pUC8 2 1 503 230. TI Plasmid Ti 2 2 127 231. TN3 Artificial gene 1 1 110 232. TNP Artificial gene 1 1 192 233. TNS Cloning vector 2 2 144 234. TOB Artificial gene 1 1 788 235. TRN28 Cloning vector 2 2 284 236. TRN3 Transposon Tn3 10 10 1119 237. TRN5 Artificial gene 1 1 80 238. TRNB Artificial gene 1 1 84 239. TU4 Cloning vector 2 2 350 240. VAC Cloning vector 4 4 683 241. VCH Artificial gene 1 1 444 242. VTR Cloning vector 1 1 148 243. WHL Artificial gene 1 1 507 244. XEL Xenopus laevis 13 13 1227 245. YSC Saccharomyces cerevisiae 39 39 9984 246. YSE Artificial gene 2 1 1795 247. ZMO Artificial gene 3 3 740 Total 1011 927 367496 UNANNOTATED Key Name Reports Entries Bases ------------------------------------------------------------------------------- 1. Unidentified 4438 3745 4028885 Total 4438 3745 4028885