MaizeSequence.org FTP Site
==========================

This site provides access to the latest sequenced maize data. The site is
part of the NSF-funded Maize Genome Sequencing Project.

+----------------------------------------------+
|                Release 4a.53                 |
+--------------------------+-------------------+
| Assembly Date            | March 20, 2009    |
| Assembly Version         | AGPv1             |
| BAC clones               | 16,007            |
| BAC contigs              | 175,108           |
| Genome Length            | 2,061,021,377 bp  |
| Total DNA                | 3,050,815,844 bp  |
+--------------------------+-------------------+
| Working Gene Set*        | 108,754           |
|   Evidence-based genes   | 93,746            |
|   Fgenesh models         | 14,008            |
|   Transcripts            | 134,099           |
| Filtered Gene Set        | 32,540            |
|   Evidence-based genes   | 30,399            |
|   Fgenesh models         | 2,141             |
|   Transcripts            | 53,764            |
+--------------------------+-------------------+

* The Working Gene Set is the entire set of evidence-based genes
  (predicted by Gramene GeneBuilder), complemented by Fgenesh models
  that were predicted on masked DNA sequence and do not overlap the loci
  of the evidence-based genes. The Filtered Gene Set was generated by
  screening the Working Gene Set to remove pseudogenes, TE-encoded
  genes, and low-confidence hypothetical models.

==== CONTENTS ====

All sequence dumps and other large files are compressed with bzip2
(.bz2) to save space and speed up downloads.

Genome Sequence
---------------

The maize genome can be found in:

assembly/
    ZmB73_AGPv1_genome.fasta.bz2
        The entire raw genome assembly as one file.
    ZmB73_AGPv1_genome_masked.fasta.bz2
        The genome assembly masked by MIPS repeats.
    ZmB73_AGPv1_chr*.fasta.bz2
        Individual chromosome files (unmasked). Note that chrUnknown is
        an artificial collection of unanchored BAC clones.
    ZmB73_AGPv1_genome.agp
        The accessioned golden path (AGP) describing how the genome is
        assembled from sequenced BACs.
    ZmB73_AGPv1_chr*.agp
        The AGP data for individual chromosomes.
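
The bzip2-compressed FASTA files listed above can be read without
unpacking them to disk first. The following is a minimal sketch in
Python using only the standard library; the function names read_fasta
and iter_bz2_fasta are illustrative and not part of any tool
distributed on this site, and the sketch assumes the file has already
been downloaded to a local path.

```python
import bz2
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (comment, sequence) pairs from an open FASTA text stream.

    The comment is the header line without the leading '>'.
    Only one record is held in memory at a time.
    """
    comment = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if comment is not None:
                yield comment, "".join(chunks)
            comment, chunks = line[1:], []
        elif line and comment is not None:
            # Sequence line; blank lines and text before the first
            # header are ignored.
            chunks.append(line)
    if comment is not None:
        yield comment, "".join(chunks)


def iter_bz2_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a bzip2-compressed FASTA file at a local path."""
    with bz2.open(path, "rt") as handle:
        yield from read_fasta(handle)
```

For example, iterating over iter_bz2_fasta("ZmB73_AGPv1_genome.fasta.bz2")
and summing len(sequence) gives the total number of bases in the file.
Each sequence is held in memory as a whole, so the per-chromosome files
may be more convenient than the single genome file on small machines.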

Gene Sequences
--------------

The Working Gene Set and the Filtered Gene Set are each provided as
several dumps, and each set has its own directory. Both gene sets
contain evidence-based genes as well as ab initio genes predicted by
Fgenesh. Evidence-based genes have IDs of the form GRMZM2GXXXXXXXX,
while Fgenesh genes have IDs of the form ACXXXXXX.Y_FGZZZZ, where
ACXXXXXX is the accession of the BAC on which the gene was predicted,
Y is the version number, and ZZZZ is the unique gene identifier.

The Working Gene Set:

working-set/
    ZmB73_4a.53_WGS_info.txt
        A tab-delimited table describing genes and transcripts,
        including their location and classification.
    ZmB73_4a.53_WGS.gff.bz2
        A GFF3 dump of the genes along with their underlying structure.
    ZmB73_4a.53_working_genes.fasta.bz2
        Genomic sequences of the Working Gene Set.
    ZmB73_4a.53_working_genes_500.fasta.bz2
        Genomic sequences of the Working Gene Set with 500 bp of
        flanking context on both sides.
    ZmB73_4a.53_working_cdna.fasta.bz2
        cDNA sequences of the Working Gene Set.
    ZmB73_4a.53_working_cds.fasta.bz2
        CDS sequences of the Working Gene Set.
    ZmB73_4a.53_working_pre_mrna.fasta.bz2
        Genomic sequences of the Working Gene Set with annotated
        exon-intron structure (introns are soft-masked, i.e.,
        lowercase) for each alternate transcript.
    ZmB73_4a.53_working_translations.fasta.bz2
        Peptide sequences of the Working Gene Set.

The Filtered Gene Set:

filtered-set/
    ZmB73_4a.53_FGS_info.txt
        A tab-delimited table describing genes and transcripts,
        including their location and classification.
    ZmB73_4a.53_FGS.gff.bz2
        A GFF3 dump of the genes along with their underlying structure.
    ZmB73_4a.53_filtered_genes.fasta.bz2
        Genomic sequences of the Filtered Gene Set.
    ZmB73_4a.53_filtered_genes_500.fasta.bz2
        Genomic sequences of the Filtered Gene Set with 500 bp of
        flanking context on both sides.
    ZmB73_4a.53_filtered_cdna.fasta.bz2
        cDNA sequences of the Filtered Gene Set.
    ZmB73_4a.53_filtered_cds.fasta.bz2
        CDS sequences of the Filtered Gene Set.
    ZmB73_4a.53_filtered_pre_mrna.fasta.bz2
        Genomic sequences of the Filtered Gene Set with annotated
        exon-intron structure (introns are soft-masked, i.e.,
        lowercase) for each alternate transcript.
    ZmB73_4a.53_filtered_translations.fasta.bz2
        Peptide sequences of the Filtered Gene Set.

The format of each sequence comment within the gene files is:

    >OBJECT_ID CLONE:START:END:STRAND:ANALYSIS:CLASSIFICATION

    OBJECT_ID
        The unique ID of the gene, transcript, translation, or exon
    CLONE, START, END, STRAND
        Locus information about the specific object
    ANALYSIS
        The prediction method used to annotate this object
    CLASSIFICATION
        The assigned class based on homology to peptides in the NR
        database:
            protein_coding
                Significant homology to a known non-TE protein
            transposon_pseudogene
                Significant homology to a known TE protein
            protein_coding_unsupported
                No significant homology, i.e., hypothetical

Functional Annotations
----------------------

InterPro and Gene Ontology (GO) annotations for genes are provided in
the following file dumps.

functional_annotations/
    go_2_go_slim_plant.txt
        Reference of GO term hierarchies used for functional analysis
    ZmB73_4a.53_interpro.txt
        InterPro ID assignments based on Working Gene Set protein
        domains
    ZmB73_4a.53_protein_go.txt
        GO/full annotations
    ZmB73_4a.53_protein_goslim_plant.txt
        GO/slim annotations based on plant function

Repeats
-------

Repeat annotations were generated by RepeatMasker using the MIPS/REcat
library.

repeats/
    ZmB73_4a.53_MIPS_repeats.gff.bz2
        GFF3 dump of the MIPS/REcat repeat annotations

2009-10-16: Software and database dumps are forthcoming.

Our website is located at:
    http://maizesequence.org

For more information, or to make requests for specific file dumps,
please contact us at:
    info@maizesequence.org

----
Last updated 2009-10-16
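
The sequence comment format and the two gene ID conventions described
under Gene Sequences can be parsed as in the following Python sketch.
It is an illustration only: the names SequenceComment, parse_comment,
and gene_source are hypothetical, and the patterns assume that the X
and Y placeholders in the ID formats are digits and that the CLONE
field contains no colon. STRAND is kept as text because its encoding
is not specified here.

```python
import re
from typing import NamedTuple


class SequenceComment(NamedTuple):
    object_id: str
    clone: str
    start: int
    end: int
    strand: str          # kept as text; encoding not specified in the README
    analysis: str
    classification: str


def parse_comment(line: str) -> SequenceComment:
    """Parse '>OBJECT_ID CLONE:START:END:STRAND:ANALYSIS:CLASSIFICATION'.

    Raises ValueError if the line does not have exactly six
    colon-separated fields after the object ID.
    """
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    object_id, locus = text.split(None, 1)
    clone, start, end, strand, analysis, classification = locus.split(":")
    return SequenceComment(
        object_id, clone, int(start), int(end), strand, analysis, classification
    )


# Assumption: X and Y in the documented ID formats stand for digits.
EVIDENCE_ID = re.compile(r"^GRMZM2G\d+")
FGENESH_ID = re.compile(r"^AC\d+\.\d+_FG\S+")


def gene_source(object_id: str) -> str:
    """Return 'evidence', 'fgenesh', or 'unknown' for an object ID."""
    if EVIDENCE_ID.match(object_id):
        return "evidence"
    if FGENESH_ID.match(object_id):
        return "fgenesh"
    return "unknown"
```

With a made-up comment line such as
">GRMZM2G000001 AC000001.1:100:900:1:example_analysis:protein_coding",
parse_comment returns a record whose classification field can be used
to select, for example, only protein_coding entries.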