Examples of sequence formats in the input file:
[Source of the sequence: GenBank]
>gnl|REF_E.coli|c3122:1-379 3-deoxy-7-phosphoheptulonate synthase [Escherichia coli CFT073]
MCYRYVILAEDQLSQTSINRIAIMQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKTISDI
IAGRDPRLLVVCGPCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGS
FDVEAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGLSMPVGFK
NGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAPNYSPADVAQCEKEMEQA
GLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGLMIESNIHEGNQSSEQPRSEMKYGVSVT
DACISWEMTDALLREIHQDLNGQLTARVA
[Source of the sequence: the SEED]
>fig|83333.1.peg.2568 [Escherichia coli K12] [2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase (EC 2.5.1.54)]
MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISDIIAGRDPRLLVVCG
PCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGSFDV
EAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGL
SMPVGFKNGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAP
NYSPADVAQCEKEMEQAGLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGL
MIESNIHEGNQSSEQPRSEMKYGVSVTDACISWEMTDALLREIHQDLNGQLTARVA
OUTPUT AFTER CONVERSION:
>Ecol_F_f_AroA_a gnl|REF_E.coli|c3122:1-379 3-deoxy-7-phosphoheptulonate synthase [Escherichia coli CFT073]
MCYRYVILAEDQLSQTSINRIAIMQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKTISDI
IAGRDPRLLVVCGPCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGS
FDVEAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGLSMPVGFK
NGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAPNYSPADVAQCEKEMEQA
GLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGLMIESNIHEGNQSSEQPRSEMKYGVSVT
DACISWEMTDALLREIHQDLNGQLTARVA
>Ecol_A_f_AroA_b_2568 fig|83333.1.peg.2568 [Escherichia coli K12] [2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase (EC 2.5.1.54)]
MQKDALNNVHITDEQVLMTPEQLKAAFPLSLQQEAQIADSRKSISDIIAGRDPRLLVVCG
PCSIHDPETALEYARRFKALAAEVSDSLYLVMRVYFEKPRTTVGWKGLINDPHMDGSFDV
EAGLQIARKLLLELVNMGLPLATEALDPNSPQYLGDLFSWSAIGARTTESQTHREMASGL
SMPVGFKNGTDGSLATAINAMRAAAQPHRFVGINQAGQVALLQTQGNPDGHVILRGGKAP
NYSPADVAQCEKEMEQAGLRPSLMVDCSHGNSNKDYRRQPAVAESVVAQIKDGNRSIIGL
MIESNIHEGNQSSEQPRSEMKYGVSVTDACISWEMTDALLREIHQDLNGQLTARVA
EXPLANATION:
Xxxx or Xxxx-num -- Acronym that is unique to a species and does not change.
A,B,C,.......... -- UPPERCASE letters designate a strain; combined with the species acronym, they form an identifier that is unique at the strain level
(e.g., 'Ecol_A' always stands for E. coli K12 and 'Ecol_F' for E. coli CFT073).
f,u,............ -- Genomic sequencing status: 'f' - finished (complete) genome; 'u' - unfinished genome or no genomic sequencing effort.
The list of finished genomes used here is taken from the 'Published Complete Genomes' list in the GOLD database.
a,b,c,.......... -- lowercase letters designate different copies of the protein (paralogs) within the same strain.
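
The naming scheme explained above can be read back programmatically. The following is a minimal Python sketch, not part of the conversion tool itself, that splits a converted header into its fields. The function name parse_header, the field names, and the regular expression are assumptions made for illustration. In particular, the protein field (e.g., 'AroA') and the optional trailing field (e.g., '2568') appear in the example output but are not described in the explanation, so the labels 'protein' and 'extra' are guesses.

```python
import re

# Assumed layout: <species>_<STRAIN>_<status>_<protein>_<copy>[_<extra>]
# 'protein' and 'extra' are hypothetical labels for fields that appear in the
# example output but are not covered by the explanation.
ID_PATTERN = re.compile(
    r"^(?P<species>[A-Z][a-z]{3}(?:-\d+)?)"   # Xxxx or Xxxx-num
    r"_(?P<strain>[A-Z]+)"                    # A, B, C, ...
    r"_(?P<status>[fu])"                      # f = finished, u = unfinished
    r"_(?P<protein>[A-Za-z0-9]+)"             # e.g. AroA (assumed meaning)
    r"_(?P<copy>[a-z]+)"                      # a, b, c, ... (paralog copy)
    r"(?:_(?P<extra>\S+))?$"                  # optional trailing field
)

STATUS = {"f": "finished", "u": "unfinished"}


def parse_header(line):
    """Split a converted FASTA header into ID fields and the original description."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line")
    # The new identifier is the first whitespace-delimited token; the rest is
    # the header as it appeared in the input file.
    seq_id, _, original = line[1:].rstrip("\n").partition(" ")
    match = ID_PATTERN.match(seq_id)
    if match is None:
        raise ValueError(f"unrecognized identifier: {seq_id!r}")
    fields = match.groupdict()
    fields["status"] = STATUS[fields["status"]]
    fields["original"] = original
    return fields


if __name__ == "__main__":
    header = ">Ecol_A_f_AroA_b_2568 fig|83333.1.peg.2568 [Escherichia coli K12]"
    print(parse_header(header))
```

Called on the two headers shown under OUTPUT AFTER CONVERSION, the sketch reports species 'Ecol' for both, strain 'F' for the first and strain 'A' for the second, and status 'finished' for both.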