"All science is either physics or stamp collecting. That which is not measurable is not science." - Ernest Rutherford, known as the father of Nuclear physics.
Biology as a field of study often gets less respect than its more quantitative cousins under the umbrellas of mathematics and physics. This isn’t helped by the introduction to biology that many of us encounter early in our schooling: first comes the study of kingdoms, phyla, and species, and later the memorization of mitochondria, chloroplasts, and cellular nuclei. All in all, a pile of facts and details, some of them interesting, but disappointingly unconnected by unifying themes or quantitative principles.
But biology, the natural science that studies life and living organisms, does have a unifying principle that connects every organism that has ever existed on Earth, and even unknown organisms that may exist elsewhere in the universe.
Let's get to know the light that will guide the rest of this course.
The diversity of life is astounding at nearly every scale: humans are just one of nearly six thousand species of mammal. Other members of our furry and big-brained group span three orders of magnitude in body length, from the bumblebee bat in the forests of Myanmar to the blue whale in the Antarctic Ocean.
But mammals are evolutionary newcomers compared to others; over a million species of insects roam every continent on Earth. Other invertebrates have spread even to the hydrothermal vents on the ocean floor, where some have teamed up with sulfur-breathing bacteria to grow iron plate armor. While some bacteria breathe toxic chemicals to extract energy from deep sea vents, others get energy straight from the sun or by consuming materials from other organisms.
Every handful of soil contains many billions of bacteria from millions of different species, only a tiny fraction of which have ever been isolated and studied in a lab.
Lifetimes have been spent categorizing all these different organisms and enumerating their divergent diets, anatomies, metabolisms, and reproductive cycles. These studies and many others have been lumped together in the field of biology. Life's incredible diversity is obvious to any observer, and this is one of the reasons that much of the scientific community was slow to accept the theory of evolution. It's hard to deny that, at first impression, more about life seems to be different than the same.
Some of Charles Darwin’s first and most thorough investigations of evolution came from studying the beaks and other anatomical features of finches. He was in search of shared structures and features that could hint at the relatedness of different bird species. Many identified this as a great way to make a family tree of birds, but not a unifying principle that could be applied to all life.
Why might Darwin's theories not have been particularly convincing to his contemporaries, outside of ornithology (the study of birds)?
Not all organisms have shared features like beaks and wings. At the microscopic scale, there are thousands of known bacterial strains that look completely identical under a microscope. Using anatomical structure and shared features to construct a unifying principle connecting all forms of life was doomed to failure, even though Darwin was conceptually correct. So how can we connect the family tree of birds to that of bats, snakes, or bacteria?
The answer has come from looking very closely. In the last 50 years we've encountered the flip side of biological diversity by studying molecular biology: at a smaller scale, all life is the same. Every form of life, from bacteria to dinosaurs, has a DNA genome, and its genetic information programs all the other molecules that make up an organism: proteins, RNA, carbohydrates and fats.
Darwin inferred the relatedness of finches by studying common features. What can we infer from recent findings in molecular biology?
Life at the macroscopic scale has remarkable diversity, which can be studied through anatomy, genealogy, paleontology, and other fields of biology. But quantification and unifying principles are hard to come by at that scale.
All the diversity of different organisms must be present in their DNA genomes. Not in the form of cellular, skeletal or morphological structures that must be studied with an X-ray or a microscope, but in strings of encoded information. We'll learn throughout this course that this genetic code is nearly universal in all forms of life, which indicates that there must have been a common ancestor whose genome worked just like the genomes of life today. Genetic information can be quantified and processed at a large scale in a way that traditional biological results simply cannot, as we'll soon see.
If the unity of life at the molecular scale is explained by our shared features with an ancient common ancestor, what principle of life could explain all of life's differences today?
At human scales, evolution is slow and subtle: it took about two million years for Darwin's finches to develop such different beaks. Only careful anatomical comparison can reveal the anatomy a common ancestor to birds, bats and humans may have had. But the changes in genetic information are discrete and quantifiable, even though their signal-to-noise ratio may be low.
By comparing genetic information, it is possible to provide a quantitative measure of relatedness, from molecules to organisms, that Darwin could never have dreamed of. Genome analysis of Darwin's finches has tracked down single events over the last million years that led to each finch species' unique beak. Furthermore, studying how genetic information is translated into traits like beaks and other features can provide far more insight into biology than a thousand years of anatomical study.
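To make "a quantitative measure of relatedness" concrete, here is a minimal sketch in Python. It assumes the sequences have already been aligned to the same length and simply counts the positions where they differ (the Hamming distance). The sequences and the names `finch_a`, `finch_b`, and `bat` are made up for illustration and are not real genomic data. Real comparisons use far longer sequences and more careful models of how DNA changes.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two equal-length, pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions that match."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return 100.0 * (1 - hamming_distance(seq_a, seq_b) / len(seq_a))


# Hypothetical toy sequences, invented for this example only.
sequences = {
    "finch_a": "ATGGCGTACGTTAGC",
    "finch_b": "ATGGCGTACGTAAGC",
    "bat":     "ATGGCTTATGTAAGC",
}

names = list(sequences)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = hamming_distance(sequences[first], sequences[second])
        print(f"{first} vs {second}: {d} differences")
```

In this toy data the two finch sequences differ at one position, while each differs from the bat sequence at more positions. Fewer differences suggest closer relatedness, and that is the kind of signal a quantitative family tree is built from.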
For this reason much of modern biology has stepped away from the detailed study of a wide variety of different organisms and their traits and features, and into an information-driven study of the genetic information that makes up the DNA in every form of life.
This shift has been enabled by rapid development in the field of DNA sequencing, helped in large part by the Human Genome Project. Understanding genetic information and how it connects to the rest of biology is only possible because of the torrent of sequencing information made available in the 20 years since this project finished. The complete genomes of hundreds of animals and plants, each consisting of gigabytes of data, have been collected.
The rise of genetic sequencing has been a gold rush for information theorists, who are repurposing techniques once used in communications, signal processing and statistical physics for the new frontier of biological information. Now that the data is here, what problems should biology research focus on to make the most of it?
This course will explore the field of computational biology. We'll begin with an introduction to molecular biology, focusing on the information flow from DNA to more familiar biological features. Then we'll explore the connection between genetic information and the biological structures of proteins, RNA and cells, using folding algorithms implemented in Python with dynamic programming techniques.
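As a taste of what a dynamic programming folding algorithm can look like, here is a minimal sketch of a Nussinov-style recurrence, a classic textbook method that finds the maximum number of base pairs an RNA sequence can form. It is offered as an illustration only and is not necessarily the algorithm this course uses. The function name `nussinov_max_pairs`, the `min_loop` parameter, and the choice to allow G-U pairs are assumptions made for this example.

```python
def nussinov_max_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of non-crossing base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases required between
    two paired bases (an assumption of this sketch).
    """
    # Watson-Crick pairs plus the G-U wobble pair (assumed allowed here).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(rna)
    if n == 0:
        return 0

    # best[i][j] holds the answer for the subsequence rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]

    # Fill the table from short subsequences to long ones.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Option 1: base j is left unpaired.
            score = best[i][j - 1]
            # Option 2: base j pairs with some base k, which splits the
            # problem into the part left of k and the part enclosed by (k, j).
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAACCC"))
```

The key idea is that the best folding of a long sequence is assembled from the best foldings of its shorter pieces, each computed once and stored in the table.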
Once we've gained the biological context, we'll shift gears to analyzing genetic sequences to gain insight into forensics, human history, and disease. Finally, we'll retrace the steps of Darwin and reconstruct the tree of life, this time with firmly quantitative principles that let us compare birds, beans, and bats. Throughout, we will focus less on facts and details, and more on unifying principles that will come up again and again in our study of computational biology.
This isn't your grandpappy's biology course.