These guidelines are based on Nowoshilow et al, Developmental Dynamics, 2021. Last updated on 25.04.2021 by Prayag Murawala. For suggestions on guidelines or to propose a new gene name, please write an email to info@axobase.org.
Human, Mice, Zebrafish, Xenopus, Chicken, Drosophila
A quick summary of nomenclature
A detailed description of nomenclature
1. Species abbreviation
2. Gene
3. Protein
4. Orthologs and Paralogs
5. Novel genes
6. Transcript variants
7. Non-coding transcripts
8. Mitochondrial genes
9. Deficiencies, duplication, inversion, insertion, translocation
Transgenic lines and constructs
10. Random insertions
11. Knockout lines
12. Knock-in lines
Members of the axolotl gene nomenclature committee
Species symbol: Amex
Gene/mRNA/cDNA: Prrx1
Protein: PRRX1
Paralogs: Prrx1 and Prrx2
Transcript variant: Prrx1.1, Prrx1.2
Transgenic line: tgSceI(Mmu.Prrx1:GFPnls-T2a-ERT2-Cre-ERT2)Labcode, tgSceI(Mmu.Prrx1:GFPnls-T2a-ERT2-Cre-ERT2;CAGGS:loxP-GFP-loxP-Cherry)Labcode
Gene knockout: tm(Prrx1153v6D8/153v6D8)Labcode, tm(Prrx1153v6I5/+)Labcode
Gene knock-in: tm(Prrx1t/+:Prrx1-T2a-Cherry)Labcode, tm(Prrx1r/+:Cherry)Labcode, tm(Prrx1r/+:miniPrrx1-T2a-Cherry)Labcode
Although the axolotl (*Ambystoma mexicanum*) is one of the most commonly used salamanders, a number of other species such as *Ambystoma tigrinum* (Tiger salamander), *Ambystoma maculatum* (Spotted salamander), *Notophthalmus viridescens* (Eastern newt), and *Pleurodeles waltl* (Iberian newt) are commonly used for tetrapod tissue regeneration research [39]. A comparative-omics study often requires identification of orthologous genes across salamander species. To create the species-specific acronym, we recommend using the first letter of the genus name in upper case followed by the first three letters of the species name in lower case, in regular font. For *Ambystoma mexicanum*, we recommend using Amex as the species symbol. Other ambystomatid species follow the same convention, such as Amac (*A. maculatum*), Aand (*A. andersoni*), and Acal (*A. californiense*). If there is a subspecies, the first letter of the subspecies name can be added as a fifth letter in lower-case italics, and a sixth letter can be added to differentiate between subspecies that share the same first letter. For example, there are several *A. tigrinum* subspecies, including *A. tigrinum mavortium* (Atigma), *A. tigrinum melanostictum* (Atigme), and *A. tigrinum nebulosum* (Atign).
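The abbreviation rule can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function name `species_symbol` and its parameters are hypothetical, and the choice of the second subspecies letter as the "sixth letter" is an assumption inferred from the Atigma/Atigme examples.

```python
def species_symbol(genus: str, species: str, subspecies: str = "",
                   subspecies_letters: int = 1) -> str:
    """Genus initial (upper case) + first three species letters (lower case),
    optionally followed by one or two subspecies letters (lower case)."""
    if not genus or len(species) < 3:
        raise ValueError("need a genus and a species name of at least 3 letters")
    if subspecies_letters not in (1, 2):
        raise ValueError("subspecies_letters must be 1 or 2")
    symbol = genus[0].upper() + species[:3].lower()
    if subspecies:
        # Assumption: the disambiguating sixth letter is the second letter
        # of the subspecies name, as in Atigma and Atigme.
        symbol += subspecies[:subspecies_letters].lower()
    return symbol


print(species_symbol("Ambystoma", "mexicanum"))
print(species_symbol("Ambystoma", "tigrinum", "mavortium", subspecies_letters=2))
```

Whether one or two subspecies letters are needed depends on which other subspecies are in use, so the sketch leaves that decision to the caller.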
A gene symbol is described by a series of alphanumeric characters, where the first character is upper-case italics and the rest are lower-case italics. This is in line with the mouse gene nomenclature and it follows current axolotl gene annotations [2]. For example, the gene Paired related homeobox 1 is referred to as *Prrx1*.
2.1 Species Ambiguity
When it is ambiguous which species a gene comes from, we recommend using a prefix to indicate the species. The prefix should not be considered part of the gene name; hence, a period should be placed between the species symbol and the gene symbol, and the entire phrase should be written in italics, e.g. *Amex.Prrx1*. When there is no ambiguity about the species used in the study, the species symbol must be omitted.
2.2 Gene, mRNA, cDNA
Often a gene, its mRNA and its cDNA need to be distinguished in a text. In such cases, we recommend putting “gene”, “mRNA” or “cDNA” in regular font in parentheses in front of the gene symbol. These qualifiers must be omitted when there is no ambiguity, or once they have been stated at the beginning of the communication.
Gene: Prrx1
In case of ambiguity between gene, mRNA, cDNA: (gene)*Prrx1*, (mRNA)*Prrx1*, (cDNA)*Prrx1*
In case of species ambiguity: (gene)*Amex.Prrx1*
The protein symbol is the same as the gene symbol, but in regular (non-italic) font, with all characters in upper-case.
Protein: PRRX1
| Species | Gene nomenclature | Protein nomenclature | Reference |
|---|---|---|---|
| Axolotl | Prrx1 | PRRX1 | Nowoshilow et al, Dev Dyn, 2021 |
| Zebrafish | prrx1 | Prrx1 | [28,29] |
| Xenopus | prrx1 | Prrx1 | [26,27] |
| Anole | prrx1 | PRRX1 | [32] |
| Mouse | Prrx1 | PRRX1 | [40] |
| Chicken | PRRX1 | PRRX1 | [31] |
| Human | PRRX1 | PRRX1 | [41] |
Note the differences in gene and protein naming convention among commonly used model organisms in Table 1.
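The casing conventions in Table 1 can be captured as a small lookup. The sketch below is illustrative only: `CASE_RULES` and `symbols_for` are hypothetical names, the styles are transcribed from the table rows, and italics are not represented.

```python
# (gene style, protein style) per species, transcribed from Table 1.
CASE_RULES = {
    "axolotl":   ("capitalized", "upper"),
    "zebrafish": ("lower", "capitalized"),
    "xenopus":   ("lower", "capitalized"),
    "anole":     ("lower", "upper"),
    "mouse":     ("capitalized", "upper"),
    "chicken":   ("upper", "upper"),
    "human":     ("upper", "upper"),
}


def _apply(style: str, symbol: str) -> str:
    if style == "upper":
        return symbol.upper()
    if style == "lower":
        return symbol.lower()
    # "capitalized": first character upper case, the rest lower case.
    return symbol[0].upper() + symbol[1:].lower()


def symbols_for(species: str, symbol: str) -> tuple[str, str]:
    """Return (gene symbol, protein symbol) in the casing used for a species."""
    gene_style, protein_style = CASE_RULES[species.lower()]
    return _apply(gene_style, symbol), _apply(protein_style, symbol)


for name in CASE_RULES:
    print(name, symbols_for(name, "Prrx1"))
```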
Fig. 1. Gene nomenclature for duplicated and novel genes. a. Two Cat loci arose by gene duplication. Comparison with the human genome reveals a single copy of CAT in the human genome. Thus, Cat1 and Cat2 are paralogs in the axolotl genome. b. Annotation of orthologs and paralogs. Geneb1 is annotated based on its homology to GENEB in another organism (indicated by the pattern of exons), while a paralogous gene is annotated as Geneb2 since it has a lower sequence similarity. Grey shaded area indicates the chromosome to highlight that Geneb1 is in a different locus. c. A putative novel gene family member in axolotl that has some (BLAST e-value 1e-16) similarity to a gene in another organism. d. A putative novel gene that does not have any homologs, but has a long (1941aa) open reading frame.
The chromosome-scale axolotl genome assemblies [2,3] made it possible to study not only the coding sequences, but also evolution and synteny at the whole-genome level. Accurate gene annotation is vital for those purposes. To ensure accurate and unambiguous communication between scientists, it is crucial to work out a set of rules for how genes and proteins should be named, how to distinguish paralogs and orthologs, and how to name pseudogenes.
Unlike zebrafish and Xenopus laevis, the axolotl genome was not structured by whole-genome duplication events. However, axolotl-specific (i.e. in-paralogs) as well as salamander-specific (i.e. out-paralogs) gene duplicates are known (Fig. 1a) and can be shared with other salamander taxa [42]. In order to define the evolutionary relationship between genes, it is crucial to establish whether genes in different species derive from a common ancestor (orthologs), or whether genes in the same species arose by duplication of the same ancestral gene (paralogs) [43]. While it is important to standardize the gene names for orthologous genes, it is also important to keep the gene naming compatible with that used across vertebrates, in order to make analyses of axolotl data comparable with the large body of available human and mouse datasets. Similar to the human and mouse orthologous gene nomenclature (HUMOT) project [44], we propose to rely on a mixed approach that combines orthology information with synteny and also integrates data from expert orthology resources.

The importance of synteny can be demonstrated by the following example. Imagine that in the axolotl, a gene homologous to a gene GENEB in another species was duplicated (Fig. 1b), and the copy that is more similar to the ancestral gene (indicated by the exon pattern in Fig. 1b) moved out of the locus. In this case, the copy that stayed in the locus should be annotated as Geneb2, while the one that moved out should be annotated as Geneb1. In contrast, if the copy that moved out is less similar to GENEB than the one that stayed, then the latter should be annotated as Geneb1 and the former as Geneb2. If synteny is not conserved, then phylogenetic analysis should be used to identify the relationships between the genes. As these relationships can be complex, it is better to deliberately exclude ambiguous orthology information than to propagate incorrect assumptions about the genes.
In the case of paralogous genes, one must be very cautious in order to avoid name collisions across species. Imagine that, in the above example, the other species also had a gene GENEB2, and that this GENEB2 is not orthologous to the newly annotated *Geneb2* in axolotl. In this case, we suggest using the next available number in the series, following consultation with other gene nomenclature groups. In the outlined example, it would be Geneb3. Hence, we always recommend contacting the axolotl gene nomenclature committee when naming a gene.
Paralogs:
Paired related homeobox 1 and Paired related homeobox 2: Prrx1 and Prrx2
Duplicated genes:
Catalases – Cat and Cat2
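The "next available number" rule for avoiding cross-species name collisions can be sketched as follows. This is a minimal illustration under stated assumptions: `next_paralog_symbol` is a hypothetical helper, symbols are compared case-insensitively because other species use different casing, and "next available" is read as the lowest unused number. It does not replace consultation with the nomenclature committee.

```python
import re


def next_paralog_symbol(base: str, taken: list[str]) -> str:
    """Return base + the lowest number not already used by any symbol in
    `taken` (axolotl symbols plus non-orthologous symbols in other species)."""
    pattern = re.compile(re.escape(base) + r"(\d+)", re.IGNORECASE)
    used = set()
    for symbol in taken:
        match = pattern.fullmatch(symbol)
        if match:
            used.add(int(match.group(1)))
    number = 1
    while number in used:
        number += 1
    # Axolotl casing: first letter upper case, the rest lower case.
    return f"{base[0].upper()}{base[1:].lower()}{number}"


# Geneb1 is already annotated in axolotl, and GENEB2 in the other species
# is not orthologous, so the new paralog takes the next number.
print(next_paralog_symbol("Geneb", ["Geneb1", "GENEB2"]))
```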
While paralogous genes should be treated as described above, potential novel genes should be annotated differently. Ideally, novel genes should be characterized functionally and named based on their function. However, for the vast majority of novel genes this is not possible. We therefore suggest examining the orthologous relationships to other functionally characterized or unambiguously annotated genes in axolotl or other organisms. If a novel gene can be assigned to a certain protein family but fails to fulfill the criteria outlined in Section 4 to be annotated unambiguously, we suggest adding the suffix ‘-like’ to the gene symbol of the closest ortholog based solely on sequence homology. This annotation may be changed later, when the existence of the gene is confirmed in the lab and functional data become available. Finally, if none of those approaches can be applied, the novel gene should be annotated as Locxxxx (Fig. 1d), where xxxx is the NCBI gene ID. In this case, the gene sequence should be submitted to NCBI first to obtain the NCBI gene ID, which ensures that this gene symbol is not arbitrary.
Example:
Prothymosin-alpha-like: Ptal (Fig. 1c), and
Loc12345 between Tmem79 and Smg5 (Fig. 1d).
5.1 Pseudogenes
Some genes may lose their open reading frame (ORF) over the course of time and, thus, also their function. However, since the gene sequences are still present in the genome, they must be accurately annotated, as they may have undesired effects on transgenics, transcript quantification and other analyses that rely on the gene sequence. To stay consistent with the nomenclature guidelines in other organisms and particularly in humans [24], we suggest appending the suffix ‘p’ (pseudogene) followed by a number to the gene symbol of the ancestral gene if the pseudogene is processed, while naming unprocessed pseudogenes as new members of the family with the suffix ‘p’.
We propose that transcript variant names follow the schema GeneName.TranscriptNumber. All predicted transcripts of a gene are numbered in the order in which they were annotated. In organisms with a well-established genome annotation, the nomenclature does not specifically deal with transcript annotations. In axolotl, however, the annotation still changes frequently, and it is therefore vital to keep track of which isoforms were proven to be wrong. For example, imagine that a gene has three annotated isoforms, Gene.1, Gene.2 and Gene.3, and that Gene.3 turns out to be an artifact. Later, another isoform, Gene.4, is shown to be highly tissue-specific. At this point, it is better to have the isoforms annotated as Gene.1, Gene.2 and Gene.4, to indicate that Gene.3 does not exist and to avoid confusion in case any work referred to Gene.3 before it was excluded.
Example: transcript variants of Prrx1
Prrx1.1, Prrx1.2
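The key property of this scheme is that a transcript number is never reused once it has been assigned. A minimal sketch of such bookkeeping is shown below; the class `TranscriptRegistry` and its methods are hypothetical and serve only to illustrate the Gene.1/Gene.2/Gene.4 example.

```python
class TranscriptRegistry:
    """Assigns GeneName.TranscriptNumber identifiers without ever reusing a number."""

    def __init__(self, gene: str) -> None:
        self.gene = gene
        self._next_number = 1
        self._active: set[int] = set()
        self._retired: set[int] = set()

    def add(self) -> str:
        """Register a newly annotated isoform and return its name."""
        number = self._next_number
        self._next_number += 1
        self._active.add(number)
        return f"{self.gene}.{number}"

    def retire(self, number: int) -> None:
        """Mark an isoform as an artifact; its number stays reserved."""
        if number not in self._active:
            raise KeyError(f"{self.gene}.{number} is not an active isoform")
        self._active.remove(number)
        self._retired.add(number)

    def active(self) -> list[str]:
        return [f"{self.gene}.{n}" for n in sorted(self._active)]


registry = TranscriptRegistry("Gene")
for _ in range(3):
    registry.add()       # Gene.1, Gene.2, Gene.3
registry.retire(3)       # Gene.3 turns out to be an artifact
registry.add()           # the new isoform becomes Gene.4, not Gene.3
print(registry.active())
```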
Similar to the guidelines for the human genome outlined in Seal et al, 2020 [45], we propose that non-coding transcripts are annotated according to their RNA type. MicroRNAs should be annotated as “mir-XXX”, where XXX is the submission ID in the miRBase database [46]. Transfer RNAs should receive gene names following the pattern tRNA-XXX-YYY-GtRNAdbID, where XXX is the three-letter amino acid code, YYY is the anticodon and GtRNAdbID is the gene ID in the GtRNAdb database [47]. Other classes of non-coding RNAs should be named after consulting the gene naming committee, as very little work has been done on non-coding RNAs in axolotl so far and the exact requirements will be defined later.
Similar to the guidelines in other vertebrates, we suggest using the gene symbol with the suffix ‘-as’ for non-coding transcripts that originate from the same promoter as an annotated protein-coding gene but on the opposite strand, e.g. Dio3-as for a non-coding RNA that originates from the Dio3 promoter.
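The three naming patterns for non-coding transcripts can be written down as simple formatters. This is a sketch only: the function names are hypothetical, and the database identifiers used in the example calls ("206", "1-1") are placeholders rather than real axolotl entries.

```python
def mirna_symbol(mirbase_id: str) -> str:
    """mir-XXX, where XXX is the miRBase submission ID."""
    return f"mir-{mirbase_id}"


def trna_symbol(amino_acid: str, anticodon: str, gtrnadb_id: str) -> str:
    """tRNA-XXX-YYY-GtRNAdbID with a three-letter amino acid code and an anticodon."""
    if len(amino_acid) != 3:
        raise ValueError("amino acid must be given as a three-letter code")
    if len(anticodon) != 3:
        raise ValueError("anticodon must be three nucleotides long")
    return f"tRNA-{amino_acid}-{anticodon}-{gtrnadb_id}"


def antisense_symbol(gene_symbol: str) -> str:
    """Gene symbol with the '-as' suffix for an opposite-strand non-coding transcript."""
    return f"{gene_symbol}-as"


print(mirna_symbol("206"))               # placeholder ID
print(trna_symbol("Ala", "AGC", "1-1"))  # placeholder GtRNAdb ID
print(antisense_symbol("Dio3"))
```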
In order to stay consistent with the well-annotated species, we propose to use the human annotation of the mitochondrial genes (NCBI Reference Sequence: NC_012920.1) in axolotl. However, in agreement with the remainder of the nomenclature, only the first letter should be capital, e.g. Mtnd2 for the mitochondrially encoded NADH dehydrogenase 2, while the human counterpart would be MTND2.
The axolotl genome is made up of ~32 Gb of DNA, which is distributed across 14 chromosomes [2,3,48]. Chromosomes are numbered in descending order of size, which was initially determined by meiotic mapping and recombination distances that define linkage groups [37,49]. Thus, as also seen in the human genome, the chromosome numbering does not exactly follow the ordering by chromosome length in base pairs. Further, axolotl chromosomes are divided into short and long arms by the centromere. Conventionally, the short arm is called the p arm, while the long arm is called the q arm. For the sake of consistency, we propose to retain this nomenclature for the axolotl chromosome arms.
Chromosomal aberrations are known in every species and the axolotl is no exception. They can be mainly classified into deficiencies, duplications, inversions, insertions and translocations. Although only a few chromosomal aberrations have been reported to date [42], we anticipate that more such annotations will arise in the near future from analyses of the genome and transcriptome assemblies. We propose to use the following prefixes for each of them, in line with the usage of these terms by the zebrafish community [29].
deficiencies, Df
duplication, Dp
inversion, In
insertion, Is
translocation, T
Chromosome rearrangements are indicated by the corresponding prefix in italics, followed by the details of the chromosome aberration in parentheses and in italics, which in turn are followed by the name of the line in regular font.
Df(Chr#:xxx)lineNN
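A small formatter makes the structure of these names explicit. It is a sketch only: `aberration_name` is a hypothetical helper, font styling is not represented, and the aberration details are passed through unchanged because the template above leaves their exact form open.

```python
ABERRATION_PREFIXES = {
    "deficiency": "Df",
    "duplication": "Dp",
    "inversion": "In",
    "insertion": "Is",
    "translocation": "T",
}


def aberration_name(kind: str, chromosome: int, details: str, line: str) -> str:
    """Build a name following the template Df(Chr#:xxx)lineNN."""
    prefix = ABERRATION_PREFIXES[kind]  # raises KeyError for unknown kinds
    return f"{prefix}(Chr{chromosome}:{details}){line}"


# "xxx" and "line01" are placeholders for the aberration details and line name.
print(aberration_name("deficiency", 3, "xxx", "line01"))
```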
Since the first successful axolotl transgenics, which were ubiquitous fluorescent reporter expression lines [6], transgenesis has made significant progress. In the last decade, a number of transgenic methodologies were successfully implemented in axolotl. These include I-SceI-mediated, Tol2-mediated, TALEN-mediated and CRISPR/Cas9-mediated transgenesis, which together allow researchers to perform random transgenesis and to generate knock-outs and knock-ins in axolotl [5,7-10]. With these advances, it is expected that more transgenic animals will be generated in the near future, and a standard transgenic nomenclature is thus needed to assign identifier information in a consistent and rigorous manner.
I-SceI- and Tol2-mediated transgenesis use I-SceI restriction sites or Tol2 transposable elements that flank the cassette of interest. Co-injection of such a construct with the I-SceI meganuclease or Tol2 mRNA/protein allows random integration of the cassette into the genome. We recommend marking such transgenic animals with the tg symbol followed by the method of transgenesis and the name of the cassette in parentheses. We also propose separating regulatory elements (enhancers and promoters) and the coding sequence by colons. The name of the full cassette should be written in italics. If a foreign regulatory element is used to make the transgenic animal, its origin should be indicated by a species symbol (one letter for the genus followed by the species abbreviation) and a period, such as Mmu for mouse (*M. musculus*). If an axolotl regulatory element is used, the species information can be omitted. Further, we recommend appending the developer’s lab code as a superscript after the parentheses to indicate the origin of the transgenic animal. To obtain a lab code, developers should register their lab or organization with the International Laboratory Code Registry at ILAR (https://www.nationalacademies.org/ilar/lab-code-database).
I-SceI mediated transgenesis
tgSceI(Mmu.Prrx1:GFPnls-T2a-ERT2-Cre-ERT2)Labcode
Tol2 mediated transgenesis
tgTol2(Mmu.Prrx1:GFPnls-T2a-Cre-ERT2)Labcode
Often, transgenic animals are made with more than one cassette. In such instances, we suggest using a semicolon (;) between two cassettes.
tgSceI(Mmu.Prrx1:GFPnls-T2a-ERT2-Cre-ERT2;CAGGS:loxP-GFP-loxP-Cherry)Labcode
Gene mutants are often generated with the aim of performing functional analysis. With the advent of CRISPR/Cas9, it has become relatively easy to generate such lines. Such germ-line transmitted targeted mutations (tm) should be characterized as insertions or deletions and annotated as I or D, respectively. In addition, the line name should contain the position of the indel with respect to the start of the gene, the version of the genome and the number of nucleotides that are inserted or deleted. If the generated animals are heterozygous, then the wildtype (+) should be given as the second allele.
Example:
tm(Prrx1153v6D8/153v6D8)Labcode, refers to a homozygous deletion of 8 nucleotides at the nucleotide position 153 with respect to the start of Prrx1 in the genome version 6.
tm(Prrx1153v6I5/+)Labcode, refers to a heterozygous insertion of 5 nucleotides at the nucleotide position 153 from the beginning of Prrx1 in the genome version 6.
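The allele description packs four pieces of information into one token: position, genome version, indel type and indel length. The sketch below composes it step by step; `indel_allele` and `knockout_line` are hypothetical helper names, and the output is plain text without any superscript or italic styling.

```python
def indel_allele(position: int, genome_version: int, kind: str, length: int) -> str:
    """Position + 'v' + genome version + I/D + number of nucleotides, e.g. 153v6D8."""
    if kind not in ("I", "D"):
        raise ValueError("kind must be 'I' (insertion) or 'D' (deletion)")
    return f"{position}v{genome_version}{kind}{length}"


def knockout_line(gene: str, allele: str, labcode: str, homozygous: bool) -> str:
    """tm(Gene + allele / second allele) + lab code; '+' marks the wildtype allele."""
    second = allele if homozygous else "+"
    return f"tm({gene}{allele}/{second}){labcode}"


deletion = indel_allele(153, 6, "D", 8)
insertion = indel_allele(153, 6, "I", 5)
print(knockout_line("Prrx1", deletion, "Labcode", homozygous=True))
print(knockout_line("Prrx1", insertion, "Labcode", homozygous=False))
```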
Similarly, the name of a knock-in animal should contain the name of the gene locus where the transgene is inserted. At the moment, only non-homologous end-joining (NHEJ) mediated knock-in is possible [8], and such transgene knock-ins are generated by targeted mutation (tm) at either the N-terminus or the C-terminus of the endogenous ORF. This, in turn, may either retain or disrupt the native ORF. Similar to the cassette in random transgenesis, regulatory elements and the coding DNA sequence should be separated by a colon and italicized. When a transgene is inserted at the C-terminus as a tag without disrupting the native coding sequence, it should be written as follows.
tm(Prrx1t/+:Prrx1-T2a-Cherry)Labcode, refers to a heterozygous knock-in at the Prrx1 locus, which retains the native Prrx1 gene structure and allows for tagging (t) at its C-terminus resulting in a fusion of Prrx1 with T2a-Cherry. Similarly, a homozygous knock-in should be labeled as tm(Prrx1t/t:Prrx1-T2a-Cherry)Labcode.
However, N-terminal insertion of a transgene via NHEJ disrupts the native gene sequence. Hence, N-terminus knock-ins are generated in one of the following two ways.
When the native coding sequence is disrupted leading to a heterozygous genotype,
tm(Prrx1r/+:Cherry)Labcode, refers to the heterozygous N-terminus knock-in at the Prrx1 locus, which replaces (r) the native gene with Cherry. In this situation, the native gene is not active and hence, these animals are heterozygous knock-outs for Prrx1.
When the native coding sequence is disrupted, but replaced with the cDNA of the native gene, which is also referred to as a mini-gene,
tm(Prrx1r/+:miniPrrx1-T2a-Cherry)Labcode, refers to a heterozygous N-terminus knock-in at the Prrx1 locus, which replaces (r) the native gene with a mini-gene version of Prrx1(cDNA), which is fused to T2a-Cherry. In this situation, since the native gene is replaced by the Prrx1 cDNA, the animals are not considered as knock-outs for Prrx1.
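The three knock-in examples differ only in the allele marker (t for tagging, r for replacement), the zygosity and the inserted cassette. A minimal sketch of a formatter is given below; `knockin_line` is a hypothetical helper and the output is plain text without superscript or italic styling.

```python
def knockin_line(gene: str, mode: str, insert: str, labcode: str,
                 homozygous: bool = False) -> str:
    """tm(Gene + mode / second allele : inserted cassette) + lab code.

    mode 't' tags the native gene at its C-terminus; mode 'r' replaces the
    native coding sequence. '+' marks a wildtype second allele.
    """
    if mode not in ("t", "r"):
        raise ValueError("mode must be 't' (tag) or 'r' (replace)")
    second = mode if homozygous else "+"
    return f"tm({gene}{mode}/{second}:{insert}){labcode}"


print(knockin_line("Prrx1", "t", "Prrx1-T2a-Cherry", "Labcode"))
print(knockin_line("Prrx1", "t", "Prrx1-T2a-Cherry", "Labcode", homozygous=True))
print(knockin_line("Prrx1", "r", "Cherry", "Labcode"))
print(knockin_line("Prrx1", "r", "miniPrrx1-T2a-Cherry", "Labcode"))
```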
Finally, we want to point out once again that the axolotl gene nomenclature committee should be contacted whenever a gene is named. We envisage that such a systematic nomenclature will remove confusion among researchers and serve the entire community.
If you would like to help us, please write an email to info@axobase.org.
Dr. Elspeth Bruford, EMBL – EBI, Cambridgeshire, UK