Proposed Nomenclature Guidelines
(i) Gene products will have names that parallel the genes encoding them, e.g., aroA encodes AroA. (This does not prevent other descriptions of the gene product, e.g., AroA is the universal acronym proper for the enzyme known as DAHP synthase.) Standard four-letter gene and gene-product acronyms are referred to as the 'acronym proper'. Additional symbols and conventions that convey important information, as described herein, are appended to the acronym proper.
(ii) Genes in a pathway or pathway segment are named in the order of the reactions catalyzed by the gene products, e.g., AroA, AroB and AroC, which catalyze the first three reactions of aromatic biosynthesis, are encoded by aroA, aroB, and aroC.
(iii) The smallest unit of naming is at the level of discrete catalytic or allosteric domains. Multi-domain fusions are designated with intervening bullets, e.g., tyrA•aroF encodes fused catalytic domains that abound elsewhere as stand-alone tyrA and aroF domains. Note that a catalytic domain such as TyrA is actually a "supradomain" consisting of an N-terminal cofactor domain and a C-terminal catalytic domain, and potentially these could be named separately. In fact, the cofactor domain (Chothia, et al., 2003) is widely distributed in combination with domains other than the TyrA catalytic domain. However, the two separate domains of TyrA are thus far not known to possess any functional roles as independent entities. TyrA catalytic domains always coexist with the cofactor domain to form an intimate functional unit, and the functional site may very well be created between the two domains (Sun, et al., 2006). In this case, the TyrA name is currently applied to the supradomain, which is the smallest functional unit known at present. Hence, in this context of function, we sometimes apply "supradomain" as an equivalent of "domain" in those cases where the smallest functional unit appears to be the supradomain.
Identical functional roles will be associated with identical acronym-proper labels, regardless of whether the gene products are homologs or analogs. On the other hand, note that it is occasionally possible for enzymes catalyzing different reactions to carry the same acronym proper if they are embedded in the same overall metabolic conversion (see rule x).
(iv) If an enzyme consists of subunits, the corresponding gene and gene-product names are designated with additional lower-case letters, e.g., the anthranilate synthase complex consists of the large aminase subunit TrpAa and the small amidotransferase subunit TrpAb, these being encoded by trpAa and trpAb, respectively.
(v) Distinct allosteric domains are designated with three capital letters. One example is pheA•ACT encoding PheA•ACT.
(vi) Different homology classes (analogs) that have independently acquired the same function are designated with Roman-numeral subscripts, e.g., aroAI and aroAII encode analogs that catalyze the same reaction. The latter exemplifies a case where, on structural grounds, the apparent analogs could possibly be distant homologs that diverged sufficiently to mask definitive recognition of the homology (given the limitation of current resources). However, we do not infer homology if it cannot be proven.
If a homology class consists of distinct, well-separated subgroups, additional lower-case Greek subscripts can be appended to designate them, as illustrated by AroAIα and AroAIβ (Fig. 2). If there were no known analogs, then any well-separated sub-homolog groups would be designated without Roman-numeral subscripts, as is exemplified by TyrAα and TyrAβ.
(vii) If different enzyme reactions converge upon a common intermediate, as is the case in early aromatic biosynthesis, genes within one of the convergent branches are designated with a 'prime'. Usually this would apply to the least widely distributed branch. Thus, AroA and AroB, on the one hand, and AroA' and AroB', on the other hand, describe different initial routes that converge to provide exactly the same product (dehydroquinate) to AroC for use as its substrate (White, 2004). AroA and AroA' each catalyze the first committed step of aromatic biosynthesis in different organisms, but the particular reactions catalyzed are not the same. The same is true of AroB and AroB'.
Our subsystem coverage does not yet include the large number of connecting links that will be added. Of these, the metabolic linkage to NAD biosynthesis via quinolinate comes to mind because the alternative tryptophan-to-quinolinate and aspartate-to-quinolinate pathways (Kurnasov, et al., 2003) will exemplify another instance of pathway convergence to a common intermediate.
(viii) Paralogs that originated from recent gene duplications and that have no obvious differential functional specializations (a one-function paralog family) are distinguished from one another with underscore numbers. Recent gene duplicates (e.g., trpD_1, trpD_2, and trpD_3) might have selective value via manifestation of a gene-dose effect, or they might include pseudogenes destined for elimination (apparently a common phenomenon). If one of the multiple paralogs seems to be uniquely suited to carry out the function corresponding to that of a well-characterized single-gene ortholog in organismal relatives, the preference would be to label it trpD_1. For example, such a scenario would apply in the situation (see Xie, et al., 2003) where trpD_1 occupies a perfect and complete tryptophan operon in some cyanobacteria, whereas the extra-operonic trpD_2 and trpD_3 paralogs exhibit especially long branches on a protein tree and lack one or more amino-acid residues known to be important for catalysis (thus being likely pseudogenes). In comparisons of the same genes in a collection of organisms where some of the organisms support multiple paralogs and others do not, the single genes cannot properly be labeled the same as a particular paralog member present in a multi-paralog organism. Thus, for example, the gene in organisms with a single trpD would simply be denoted trpD, since it has equally orthologous relationships with each of the recent paralogs present in sister organisms (Fitch, 2000).
(ix) Same-function ancient paralogs that vary in some specialized feature carry appropriate underscore notations. Ancient paralogs arose from gene duplications that preceded speciation (Fitch, 2000). Ancient paralogs are usually differentially specialized, and those with different catalytic functions will carry names that reflect different pathway roles. (The ancient AroA and KdsA paralogs of Fig. 2 would be examples). However, occasional ancient paralogs have retained the same enzymatic function and hence share the same acronym proper, but they are differentially specialized in some other way. For example, the trio of paralogous DAHP synthases in enteric bacteria are AroAIα proteins that are subject to differential regulation by feedback inhibition: AroAIα_W by tryptophan, AroAIα_F by phenylalanine, and AroAIα_Y by tyrosine.
Occasionally some member species of two-paralog lineages possess a single remnant paralog, which, in addition to its usual function, has acquired the function of the lost sister paralog, thus being bifunctional. In such cases the name of the surviving paralog (identified by homology or by operon context) is given first and in bold fonts, and separated from the name of the missing paralog by a double 'slash'. Examples of such relatively rare bifunctional proteins covered in this article are PabAb // TrpAb and AroJIβ // HisG in a small clade of Bacillus species, as well as HisD // TrpCII in Actinomycete bacteria.
(x) Different substrate specificities of homologs typically support different functional roles in different pathways, e.g., the aforementioned DAHP synthase/KDOP synthase dichotomy (Fig. 2). If homologs having different substrate specificities are embedded within the flow route of the same pathway such that they perform equivalent functional roles at the overall pathway level, they will share the same acronym proper, with the differing specificities indicated with subscript identifiers. This is exemplified by the alternative flow routes between prephenate and L-tyrosine in Fig. 1. The tyrosine-pathway dehydrogenases catalyze different reactions, being specific for prephenate, specific for L-arogenate, or able to utilize both. These three variant specificities are indicated with lower-case, rightward subscripts: TyrAp, TyrAa, and TyrAc, respectively. However, at the broader pathway level, the functional role of each of the three is identical: in each case the cyclohexadienyl substrate is aromatized via an oxidative reaction that is driven by elimination of the ring-attached carbon dioxide.
If substrate ambiguity of such one-pathway homologs extends to a second substrate, specificity for a second substrate can be designated with leftward subscripts, e.g., NADTyrAp is a tyrosine-pathway dehydrogenase specific for the NAD+/prephenate couple, whereas NADPTyrAa refers to specificity for the NADP+/arogenate couple.
(xi) Genes that encode cleavable signal (or transit) peptides are denoted by leading-asterisk superscripts. Thus, aroHIα encodes cytoplasmic chorismate mutase, whereas *aroHIα encodes periplasmic (or secreted) chorismate mutase.
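Rules (i), (ii), and (viii) are mechanical enough to sketch in code. The following Python fragment is only an illustration under stated assumptions: the function names `gene_and_product` and `label_paralogs` are hypothetical, steps are counted from 1, and the choice of which paralog deserves the `_1` label (operon context, tree position) is left to the caller, since the text treats that as a judgment call.

```python
import string


def gene_and_product(pathway: str, step: int) -> tuple[str, str]:
    """Return (gene, product) names for a 1-based reaction step.

    Rule (i): the product name parallels the gene name (aroA -> AroA).
    Rule (ii): the final letter follows the order of reactions.
    """
    if not 1 <= step <= 26:
        raise ValueError("step must be between 1 and 26")
    letter = string.ascii_uppercase[step - 1]
    stem = pathway.lower()
    return stem + letter, stem.capitalize() + letter


def label_paralogs(gene: str, count: int) -> list[str]:
    """Rule (viii): number recent one-function paralogs with underscores.

    A single-copy gene keeps the bare name, because it is equally
    orthologous to every paralog found in sister organisms.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [gene]
    return [f"{gene}_{i}" for i in range(1, count + 1)]


first_three = [gene_and_product("aro", n) for n in (1, 2, 3)]
# [('aroA', 'AroA'), ('aroB', 'AroB'), ('aroC', 'AroC')]
trp_d = label_paralogs("trpD", 3)
# ['trpD_1', 'trpD_2', 'trpD_3']
```

The remaining rules (analog classes, subgroups, primes, signal peptides) depend on biological evidence rather than position, so they are not generated here.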
Overall rationale in support of the acronym scheme. The above nomenclature scheme strives to relate the acronym library to the evolutionary thread. This is not absolutely necessary to the extent that the single critical need is to implement a consistent, universal assemblage of acronyms. However, a significant advantage of the system proposed is that a given acronym is designed to convey a large amount of biochemical and evolutionary knowledge. Information is conveyed not only by what is present in the acronym, but also by what is absent. For example, consider the hypothetical Xyz pathway in which one encounters the gene product XyzCII•, encoded by xyzCII•. Even if unfamiliar with the Xyz pathway, one knows (because of the 'C') that reference is being made to the third enzyme in the pathway. The Roman-numeral subscript reveals that this enzyme is one of at least two analog classes, and the bullet indicates that there is a C-terminal fusion. XyzCII cannot be a subunit component; otherwise there would be a lowercase letter immediately after the acronym proper. There is no cleavable signal or transit peptide; otherwise there would be a leading asterisk. It is not a member of a one-function paralog family; otherwise we would see underscore notations. It is not a member of a homolog family that separates into distinct subgroups; otherwise an α, β, etc. would follow the Roman-numeral subscript. XyzCII has not expanded its functional repertoire by "borrowing" a second functional role that is exercised elsewhere in the lineage by a paralog relative; otherwise the acronym for the "borrowed" functional role of the lost paralog would be applied (with separation by a "double slash") after that of the surviving paralog.
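The decoding walk-through above can be mirrored by a small parser. This is a rough sketch, not part of the proposed scheme: it assumes names are written as flattened plain text (subscripts and superscripts on the baseline), that a trailing bullet marks the C-terminal fusion in the Xyz example, and that a lowercase letter after the acronym proper is a subunit label per rule (iv). In flattened text that letter cannot be told apart from the specificity subscripts of rule (x), and bifunctional "//" names are not handled. The names `DOMAIN` and `decode` are hypothetical.

```python
import re

# One domain name: optional leading asterisk, four-letter acronym proper,
# then the optional markers in the order the rules append them.
DOMAIN = re.compile(
    r"(?P<signal>\*)?"                    # rule (xi): signal/transit peptide
    r"(?P<proper>[A-Za-z][a-z]{2}[A-Z])"  # rules (i)-(ii): acronym proper
    r"(?P<subunit>[a-z])?"                # rule (iv): subunit letter
    r"(?P<analog>I{1,3}|IV|V)?"           # rule (vi): analog class
    r"(?P<subgroup>[α-ω])?"               # rule (vi): homolog subgroup
    r"(?P<prime>')?"                      # rule (vii): convergent branch
    r"(?:_(?P<variant>\w+))?"             # rules (viii)-(ix): paralog tag
)


def decode(name: str) -> dict:
    """Split a flattened name on bullets and report the markers present."""
    domains = []
    for part in (p for p in name.split("•") if p):
        if re.fullmatch(r"[A-Z]{3}", part):  # rule (v): allosteric domain
            domains.append({"allosteric_domain": part})
            continue
        match = DOMAIN.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized name: {part!r}")
        fields = {k: v for k, v in match.groupdict().items() if v is not None}
        # The last letter of the acronym proper gives the pathway step.
        fields["step"] = ord(fields["proper"][-1]) - ord("A") + 1
        domains.append(fields)
    return {"fused": "•" in name, "domains": domains}


xyz = decode("XyzCII•")
# {'fused': True, 'domains': [{'proper': 'XyzC', 'analog': 'II', 'step': 3}]}
```

Absent keys carry the same information as absent symbols in the acronym: no `subunit`, `signal`, `subgroup`, or `variant` entry appears for the Xyz example.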