A biologist wants to encode a large genetic sequence consisting of the letters
In his data, C and G were the most common letters, so he chose the encoding
C -> 1, G -> 0, A -> 00, T -> 01. For example,
CCGT is encoded as
Later, the biologist noticed
11001 in his codebase. He concluded that it represented
CCGT. Is he correct?
The following code allows you encode a DNA string. Try running the code, and then change the string in line 12 to see if you can find other strings with the same encoding!
Bonus 1: Can you describe a better encoding scheme?
Bonus 2: Can you write code that returns all possible genetic sequences from a given encoding?