MaizeSequence.org FTP Site
==========================

This site provides access to the latest sequenced maize data. The site is
part of the NSF-funded Maize Genome Sequencing Project.

The directories are named using the 'YYYYMMDD' date format, and their
content includes sequence data as of the indicated date. The 'current'
directory points to the most current data directory.

+----------------------------------------------+
|                Release 3a.50                 |
+--------------------------+-------------------+
| Freeze Date              | October 31, 2008  |
| Maize BAC clones         | 16,587            |
| Maize BAC contigs        | 185,231           |
| Contigs per BAC          | 11.2              |
| Sequence Length          | 2,778,853,373 bp  |
+--------------------------+-------------------+
| Evidence-based genes     | 90,829            |
| Fgenesh models           | 514,467           |
|   Protein-coding genes   | 59,052            |
|   Hypothetical genes     | 86,201            |
|   Transposon-like genes  | 369,214           |
+--------------------------+-------------------+
| Working Gene Set*        | 113,672           |
|   Evidence-based genes   | 90,829            |
|   Fgenesh models         | 22,843            |
+--------------------------+-------------------+

* - The Working Gene Set is defined as the entire set of evidence-based
    genes (predicted by Gramene GeneBuilder), complemented by those
    Fgenesh models, predicted on masked DNA sequence, that do not
    overlap the loci of the evidence-based genes.

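The Working Gene Set definition above can be illustrated with a minimal
Python sketch. It is not the project's actual pipeline: the names Locus,
overlaps, and working_gene_set are hypothetical, the coordinates are toy
values rather than release data, and the sketch assumes that overlap is
tested per BAC on 1-based inclusive coordinates, ignoring strand.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    """A gene locus on one BAC (hypothetical record; 1-based, inclusive)."""
    seq_id: str
    start: int
    end: int
    name: str


def overlaps(a: Locus, b: Locus) -> bool:
    """True when two loci share at least one base on the same sequence.

    Assumption: strand is ignored; the README does not say either way.
    """
    return a.seq_id == b.seq_id and a.start <= b.end and b.start <= a.end


def working_gene_set(evidence: List[Locus], fgenesh: List[Locus]) -> List[Locus]:
    """All evidence-based genes plus the Fgenesh models overlapping none of them."""
    extra = [m for m in fgenesh if not any(overlaps(m, g) for g in evidence)]
    return list(evidence) + extra


# Toy data for illustration only, not taken from the release.
evidence = [Locus("BAC_A", 100, 500, "ev1")]
fgenesh = [
    Locus("BAC_A", 450, 900, "fg1"),    # overlaps ev1, so it is dropped
    Locus("BAC_A", 1000, 1500, "fg2"),  # no overlap, kept
    Locus("BAC_B", 100, 500, "fg3"),    # different BAC, kept
]
names = [g.name for g in working_gene_set(evidence, fgenesh)]
print(names)  # ['ev1', 'fg2', 'fg3']
```

The pairwise check is quadratic and only meant to make the definition
concrete. The counts in the release table agree with this reading:
90,829 evidence-based genes plus 22,843 non-overlapping Fgenesh models
give 113,672 genes in the Working Gene Set.
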
Files
-----

Each directory contains the following files:

SEQUENCES
---------
bacs.fasta - Raw maize genome sequences for accessioned BACs, as stored in GenBank
bacs_rm.fasta - RepeatMasked maize genome sequences for accessioned BACs
bac_contigs.fasta - Sequences for all individual contigs that make up the accessioned BACs
bac_contigs_rm.fasta - RepeatMasked sequences for individual BAC-contigs

REPORTS
-------
fpc_report.txt - A table describing which clones on the agarose FPC map have been accessioned (sequence present)
evidence_genes_fpc_mappings.txt - A table showing the FPC location/annotation of BACs on which evidence-based genes were called
protein_coding_fpc_mappings.txt - A table showing the FPC location/annotation of BACs on which protein-coding genes were called

GENE MODELS
-----------
protein_coding_genes.fasta - Nucleotide sequences of genes that are classified as having similarity to known proteins
protein_coding_transcripts.fasta - Transcripts of genes that are classified as having similarity to known proteins
protein_coding_translations.fasta - Protein translations of genes that are classified as having similarity to known proteins

hypothetical_genes.fasta - Nucleotide sequences of genes that do not have similarity to any known protein
hypothetical_transcripts.fasta - Transcripts of genes that do not have similarity to any known protein
hypothetical_translations.fasta - Protein translations of genes that do not have similarity to any known protein

te_like_genes.fasta - Nucleotide sequences of genes that are classified as transposon-like
te_like_transcripts.fasta - Transcripts of genes that are classified as transposon-like
te_like_translations.fasta - Protein translations of genes that are classified as transposon-like

evidence_genes.fasta - Genes predicted through the Gramene GeneBuilder, based on supporting evidence
evidence_transcripts.fasta - Transcripts of evidence-based genes
evidence_translations.fasta - Protein translations of evidence-based genes

working_set_genes.fasta - Working Gene Set (* see above)
working_set_transcripts.fasta - Transcripts of the Working Gene Set
working_set_translations.fasta - Protein translations of the Working Gene Set

OTHER FILES
-----------
compressed/ - Contains all the FASTA dumps in BZ-archive format

gff/ - GFF3 dumps of BAC features, as Bzip2 archives
    gene_models.gff.bz2 - Combined ab initio and evidence-based genes, with underlying gene structure
    working_gene_set.gff.bz2 - The Working Gene Set (* see above)
    repeat_features.gff.bz2 - MIPS/REcat features, annotated with RepeatMasker
    cereal_alignments.gff.bz2 - BLAT alignments of same- and cross-species libraries

mysql/ - SQL dumps of the Ensembl maize databases used by the browser, as Bzip2 archives
    zea_mays_core_50_bac_3a.sql.bz2 - BAC sequences and underlying annotations
    zea_mays_core_50_fpc_3a.sql.bz2 - The maize agarose FPC map

software/ - Software related to the maize project
    maize-ensembl.tar.gz - The source code of the Maize Ensembl plugin

NOTE ABOUT GENE TRANSLATIONS:
In the case of Fgenesh predictions (te-like, protein-coding, and
hypothetical), a small number of genes were predicted with stop-codons.
This is an artifact of Fgenesh predicting short genic fragments, often
with singleton exons. We do not include such translations in the file
dumps.

Our website is located at: http://maizesequence.org

For more information, please contact us at: info@maizesequence.org

----
Last updated 2008-12-08