A standard set of annotation and assembly files released as part of Phytozome v10. All FASTA and GFF3 files are compressed with gzip to reduce file size and speed up downloads.

Note: the number 305 in all file names is a Phytozome internal identifier for the current release of this genome annotation and can be safely ignored.

Files in the annotation subdirectory:

1) Mesculenta_305_v6.1.annotation_info.txt
A summary of annotation details available in Phytozome. This is a tab-delimited file with the following columns (columns are blank if no corresponding data is available):
 1: Phytozome internal transcript ID (potentially useful for connecting to BioMart datasets)
 2: Phytozome gene locus name
 3: Phytozome transcript name
 4: Phytozome protein name (often the same as the transcript name, but this can vary)
 5: PFAM
 6: Panther
 7: KOG
 8: KEGG ec
 9: KEGG Orthology
 10: Gene Ontology terms (NOTE: these are automated results from interpro2go in most genomes, *not* empirically derived)
 11: best Athaliana TAIR10 hit name
 12: best Athaliana TAIR10 hit symbol
 13: best Athaliana TAIR10 hit defline

"Best hits" are defined as the top result returned from a BLASTP alignment of this species' proteome against the target proteome (here A. thaliana TAIR10, as listed above). This was run with blast+ 2.2.26 with parameters:
 blastall -p blastp -F "mS" -b 1500 -v 1500 -e 0.001 -M BLOSUM45
and the results were further filtered with a 1E-3 e-value cutoff.
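Because annotation_info.txt is a fixed 13-column, tab-delimited table with blank cells for missing data, it can be loaded with a few lines of standard-library Python. The sketch below is illustrative only: the short column keys (e.g. `locus_name`, `at_hit_symbol`) and the function name `parse_annotation_info` are hypothetical labels chosen here, and skipping lines that start with `#` is an assumption in case the file carries a header row.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# Hypothetical short keys for the 13 columns described in this README.
COLUMNS: List[str] = [
    "transcript_id",    # 1: Phytozome internal transcript ID
    "locus_name",       # 2: gene locus name
    "transcript_name",  # 3: transcript name
    "protein_name",     # 4: protein name
    "pfam",             # 5: PFAM
    "panther",          # 6: Panther
    "kog",              # 7: KOG
    "kegg_ec",          # 8: KEGG ec
    "kegg_orthology",   # 9: KEGG Orthology
    "go",               # 10: Gene Ontology terms (automated)
    "at_hit_name",      # 11: best Athaliana TAIR10 hit name
    "at_hit_symbol",    # 12: best Athaliana TAIR10 hit symbol
    "at_hit_defline",   # 13: best Athaliana TAIR10 hit defline
]


def parse_annotation_info(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per transcript row; blank cells become None."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: skip empty lines and any '#'-prefixed header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Pad rows whose trailing blank columns were dropped.
        fields += [""] * (len(COLUMNS) - len(fields))
        yield {name: (value or None) for name, value in zip(COLUMNS, fields)}
```

For example, `open("Mesculenta_305_v6.1.annotation_info.txt")` can be passed straight to `parse_annotation_info`, and rows with `at_hit_name is None` are the transcripts that had no Arabidopsis best hit after the 1E-3 cutoff.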
2) Mesculenta_305_v6.1.cds.fa.gz and Mesculenta_305_v6.1.cds_primaryTranscriptOnly.fa.gz
Nucleotide FASTA format file of all gene coding sequences, with or without alternative splice variants (the primaryTranscriptOnly file contains only the primary transcript of each gene).

3) Mesculenta_305_v6.1.protein.fa.gz and Mesculenta_305_v6.1.protein_primaryTranscriptOnly.fa.gz
Amino acid FASTA format file of the protein sequences translated from all gene coding sequences, with or without alternative splice variants.

4) Mesculenta_305_v6.1.transcript.fa.gz and Mesculenta_305_v6.1.transcript_primaryTranscriptOnly.fa.gz
Nucleotide FASTA format file of spliced mRNA transcripts (UTR, exons), with or without alternative splice variants.

5) Mesculenta_305_v6.1.gene.gff3.gz
GFF3 format representation of all mRNA sequences (UTR, CDS). Genomic coordinates are relative to the reference sequence in column 1.

6) Mesculenta_305_v6.1.gene_exons.gff3.gz
GFF3 format representation of all mRNA sequences as above, but with exon subfeatures. Genomic coordinates are relative to the reference sequence in column 1.

7) Mesculenta_305_v6.1.synonym.txt
Tab-delimited list of all gene symbols/synonyms for the Phytozome transcript in the first column.

8) Mesculenta_305_v6.1.locus_transcript_name_map.txt
Tab-delimited list of locus name and transcript name pairs for each Phytozome transcript.

9) Mesculenta_305_v6.1.repeatmasked_assembly_v6.gff3.gz
Repeat annotations in GFF3 format, mostly from RepeatMasker, some from MerMasking, and the remainder derived from the masked genome FASTA.

-----

Files in the assembly subdirectory:

1) Mesculenta_305_v6.fa.gz
Nucleotide FASTA format of the current genomic assembly.

2) Mesculenta_305_v6.softmasked.fa.gz and Mesculenta_305_v6.hardmasked.fa.gz
Nucleotide FASTA format of the current genomic assembly, masked for repetitive sequence by RepeatMasker (softmasked sequence is in lower case; hardmasked replaces masked sequence with Ns).

-----

Files in the additional subdirectories (expression, diversity, etc.) are releases of data related to the current annotation and are not available for every organism.
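The masking convention described above (lower case in the softmasked file, N in the hardmasked file) means the two forms are related by a simple character substitution. The sketch below reads a gzip-compressed FASTA file with Python's standard `gzip` module and converts a softmasked sequence to hardmasked form. It is a minimal illustration, not Phytozome's own tooling: the helper names (`open_fasta`, `read_fasta`, `hardmask`) are hypothetical, keeping only the first word of each header as the sequence ID is a choice made here, and it assumes both released files mask exactly the same intervals.

```python
import gzip
from typing import Iterable, Iterator, TextIO, Tuple


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text, transparently handling .gz compression."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of the '>' line."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def hardmask(seq: str) -> str:
    """Replace soft-masked (lower-case) bases with N."""
    return "".join("N" if base.islower() else base for base in seq)
```

For example, `with open_fasta("Mesculenta_305_v6.softmasked.fa.gz") as fh:` followed by a loop over `read_fasta(fh)` yields one record per sequence in the assembly, and `hardmask(seq)` gives the sequence in the form the hardmasked file is described as using. Streaming line by line avoids decompressing the whole file to disk first.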