Bit encoding of nucleic acid types
Unambiguous nucleotides are represented in one-hot encoding as follows:
| NucleicAcid | Bits |
|---|---|
| A | 0001 |
| C | 0010 |
| G | 0100 |
| T/U | 1000 |
An ambiguous nucleotide is the bitwise OR of the codes of the unambiguous nucleotides it can stand for. For example, R (A or G) is represented as 0101 (A: 0001 | G: 0100). The gap symbol is always 0000. The four meaningful bits are stored in the least significant bits of a byte.
This encoding applies to both the DNA and RNA types.
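The encoding described above can be sketched in a few lines of Python. This is only an illustration of the bit layout, not the library's own implementation: the names `IUPAC`, `bits`, `is_ambiguous`, and `is_compatible` are hypothetical, and the ambiguity symbols other than R are filled in from the standard IUPAC nucleotide codes under the OR rule stated above.

```python
# One-hot codes for the unambiguous nucleotides, as listed in the table.
A, C, G, T = 0b0001, 0b0010, 0b0100, 0b1000
GAP = 0b0000  # the gap symbol has no bits set

# Hypothetical lookup table: each ambiguity symbol is the bitwise OR of the
# nucleotides it can stand for (standard IUPAC meanings assumed).
IUPAC = {
    "-": GAP,
    "A": A, "C": C, "G": G, "T": T, "U": T,  # T and U share one code
    "M": A | C, "R": A | G, "W": A | T,
    "S": C | G, "Y": C | T, "K": G | T,
    "V": A | C | G, "H": A | C | T, "D": A | G | T, "B": C | G | T,
    "N": A | C | G | T,
}

def bits(code):
    """Render the four meaningful (least significant) bits as a string."""
    return format(code & 0b1111, "04b")

def is_ambiguous(code):
    """A code is ambiguous when more than one bit is set."""
    return bin(code).count("1") > 1

def is_compatible(x, y):
    """Two codes are compatible when they share at least one nucleotide."""
    return (x & y) != 0

print(bits(IUPAC["R"]))  # 0101
```

One consequence of this layout is that set-like questions reduce to single bitwise operations: a compatibility check between two symbols is one AND, and a gap is never compatible with anything because it has no bits set.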