CRISPR
(An Art of Artificial Evolution)

CRISPR is a modern, fast, and comparatively cheap gene-editing tool. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. A palindrome is a word, number, or phrase that reads the same backward and forward, like the word ‘madam’ or the numbers 424 and 515. CRISPR, then, refers to clusters of short palindromic nucleotide sequences that repeat at regular intervals in a genome. The name originally described these repeats in bacterial DNA; today it also names the gene-modification technique built on them. To understand the true meaning of CRISPR, it is important to understand the following things:

What is the origin of the concept of CRISPR?
How is CRISPR used as a gene-editing tool?
How can human genes be modified by using this tool (CRISPR)?

Now the first question is, what is the origin of the concept of CRISPR?

The story begins with a war between bacteria and viruses. Viruses that attack bacterial cells (bacteriophages) use the cell to multiply, often killing it in the process, while bacteria defend themselves by destroying the invading virus, usually by breaking its DNA.

Bacterial cells have special enzymes called nucleases. A nuclease breaks down nucleic acids, the genetic material (DNA or RNA) of any organism. A nuclease that attacks RNA (for example, the RNA of a virus) is called a ribonuclease (RNase), while one that attacks DNA is called a deoxyribonuclease (DNase). It all depends on the nature of the molecule being cut.

Nucleases can also be classed as exonucleases, which remove nucleotides from the ends of a strand, and endonucleases, which cut within it. Nucleases do many jobs in the cell, but one important job in prokaryotes (i.e. bacteria) is protection from pathogens (i.e. viruses). The concept of CRISPR comes from this story, i.e. the ability of bacterial cells to defend themselves against viruses.

Bacteria that survive an attack by a pathogen such as a virus can develop a natural immunity to it. Researchers have tried to reproduce this kind of defense at the genetic level in other organisms (especially in humans). One of the most successful gene-editing techniques to come out of this work is CRISPR. Many scientists are using the CRISPR technique to work on diseases such as sickle-cell anemia and HIV.

The principle of this gene-editing technique is based on the relationship between bacteria and viruses. Its concept is taken from the way bacterial cells build natural immunity against viruses.

When a virus attacks a bacterial cell, it injects its genetic material (which may be RNA or DNA).
If the bacterium manages to overcome the virus (with the help of nuclease enzymes), it can store a short piece of the viral genetic material in its own DNA as a memory. These stored pieces, called spacers, sit between the CRISPR repeats.
This helps the bacterium identify the virus in the future and protect itself against it.
When another virus with the same genetic makeup tries to take over or kill the bacterial cell, the bacterium already holds information about its genetic makeup in this memory.
The bacterium then copies the stored piece into a short RNA molecule, the guide RNA.
This guide RNA does not make the nuclease itself. It pairs with a nuclease enzyme and leads it to the matching viral genetic material, which the enzyme then cuts, disabling the virus.
This nuclease enzyme is usually an endonuclease.
Cas9 is a good example of it.
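
The memory-and-recognition cycle above can be pictured with a toy Python model. This is only a sketch under loud assumptions: the function names acquire_spacer and recognizes, the 8-letter snippet length, and both "phage" sequences are made up for illustration, and real bacteria store and match viral DNA through a far more involved enzymatic process.

```python
def acquire_spacer(memory, viral_dna, start=0, length=8):
    """Store a short snippet of viral DNA in the bacterium's memory list.

    The snippet length and start position are arbitrary illustration values.
    """
    snippet = viral_dna[start:start + length]
    if snippet not in memory:
        memory.append(snippet)
    return memory


def recognizes(memory, invading_dna):
    """Return True if any stored snippet occurs in the invading DNA."""
    return any(snippet in invading_dna for snippet in memory)


memory = []                          # the bacterium starts with no memory
phage_a = "ATGCCGTTAGGCTAACGT"       # hypothetical viral sequences
phage_b = "TTTTGGGGCCCCAAAATT"

acquire_spacer(memory, phage_a)      # first infection: remember phage A
print(recognizes(memory, phage_a))   # the same virus returns
print(recognizes(memory, phage_b))   # a virus never seen before
```

In this model the bacterium recognizes only a virus it has already met, which is the sense in which the stored snippets act as an immune memory.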

Now we have another question: what is an endonuclease enzyme, and how does it break the DNA of a virus?

Many prokaryotes make nuclease enzymes for their protection. The endonuclease is one of them. It is called “endo” because it breaks DNA in the middle of the strand, where the recognition sites for the enzyme are present, rather than at the ends. It works like a pair of scissors that cuts the backbone of the DNA molecule at its recognition sites.
The recognition sites of many such enzymes are palindromes (reading the same from either end, like mom, dad, 646, etc.). DNA also has palindromic sequences, like AATT - TTAA, CGCG - GCGC, etc.
DNA is double-stranded, with one strand running 5’ to 3’ and the other running 3’ to 5’. The examples above (AATT - TTAA) show the two strands of the DNA. If one strand has the nucleotides AATT, then the other strand, written underneath it, must have TTAA. If we read both strands from their 5’ ends, the sequence is the same (AATT), and if we read both from their 3’ ends, it is again the same (TTAA).
This type of sequence is called a palindromic sequence of DNA, and this kind of palindromic sequence is the usual recognition site of many endonuclease enzymes.
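
The palindrome rule described above can be checked mechanically: a DNA sequence is palindromic when it equals its own reverse complement. The short Python sketch below illustrates this; the function names are hypothetical, and GAATTC and AATG are extra test sequences that do not come from the text.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the opposite strand, read from its own 5' end."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_dna_palindrome(seq):
    """True if both strands read the same from their 5' ends."""
    return seq.upper() == reverse_complement(seq)


for site in ["AATT", "CGCG", "GAATTC", "AATG"]:
    print(site, is_dna_palindrome(site))
```

Note that a DNA palindrome is not a palindrome in the everyday sense: AATT does not read the same backward as plain letters, but its two strands read the same when each is read from its 5’ end.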

Bacteria can store information about the genome of a virus and use it to recognize that virus's nucleotide sequence later. With this information, a bacterium can direct endonuclease enzymes against that virus whenever it needs to. In this way, bacteria boost their immunity against pathogens and adapt to their environment.

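Scientists found a very interesting thing about Cas9 (an endonuclease enzyme): it will cut any DNA in which it finds its target sequence, without discrimination. Unlike most endonucleases, Cas9 finds its target by matching the DNA against a guide RNA rather than by looking for a fixed palindrome.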

Now we have a very interesting question here. If an endonuclease can cut the backbone of DNA at its palindromic recognition sites, and the same sites are also present (often many times over) in the bacterium's own DNA, then how do bacteria protect themselves from their own nuclease enzymes?
The simple answer to this question is methylation of DNA. The bacterium attaches methyl groups to the recognition sites in its own DNA, and the endonuclease cannot cut a methylated site.

Viral DNA that enters the cell usually lacks these methyl groups, so it is cut, while the marked host (bacterial) DNA is left alone. Cas9 has a different safeguard: it cuts only where the target sequence sits next to a short signal sequence called a PAM, and the copies of viral DNA stored in the bacterial CRISPR region are not followed by a PAM. So these endonuclease enzymes are not able to cut or break the host (bacterial) DNA.

Because bacteria use these endonucleases to restrict the growth of pathogens (viruses), the enzymes are called restriction endonucleases (first described in the 1960s). Cas9 is not a restriction endonuclease in the strict sense, because it is guided to its target by RNA, but it serves the same defensive purpose. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) is also the name given to the technique in which Cas9 is used to edit and modify the genome. The technique rests on the same idea as the restriction-modification system: a bacterial enzyme that cuts foreign DNA at a chosen sequence.

The CRISPR technique was introduced in 2012 by the American scientist Jennifer Doudna, the French scientist Emmanuelle Charpentier, and their colleagues. They showed that Cas9 can be programmed with a guide RNA to cut a chosen DNA sequence, and the method was soon being used for gene editing in eukaryotic cells (including human cells). Genes can be repaired this way with very promising results. Doudna and Charpentier shared the Nobel Prize in Chemistry in 2020 for their work.

Now the second question is, how is CRISPR used as a gene editing tool?

Bacteria have natural mechanisms to boost their immunity against viruses by breaking the DNA of viruses. Bacteria are prokaryotic organisms. Scientists worked to understand this mechanism and to introduce it into eukaryotic cells, especially human cells, as a weapon against genetic disorders. The system has two parts: a guide RNA, modeled on the RNA that bacteria make from their CRISPR region, and a Cas protein (e.g. Cas9). The guide RNA identifies the location where the DNA should be cut, and the Cas9 protein, an endonuclease, makes the cut. Because Cas9 cuts only where the guide RNA matches, it is very specific and can cut one chosen part of the DNA.

There are a few general steps in using the CRISPR technique.

Identification: Identify the particular gene that is responsible for the problem.
Locate the problem-causing gene: A guide RNA is designed to match the faulty gene and is introduced into the cell together with Cas9. The guide RNA binds to the DNA and so locates the problem-causing sequence of nucleotides. It finds its target by base pairing with the matching stretch of DNA, which must lie next to a short signal sequence (the PAM).
Break and remove the specific sequence of nucleotides of DNA: Cas9 is a special protein that can cut DNA. Once the guide RNA has found its target, the Cas9 protein that it carries cuts both strands of the DNA at that point (the faulty or targeted region). Cutting is the first step toward removing the faulty sequence.
Insertion and replacement of the required sequence of nucleotides: After the DNA is cut, the cell tries to repair it. At this point, the chances of mutation are very high, because the cell's quick repair often adds or deletes a few nucleotides, which can disturb the whole gene. The desired sequence of nucleotides must be provided at this stage, as a repair template, so that it can become part of the DNA and perform the desired function.
Allow DNA to repair: Allow the cell to repair its DNA using the good or healthy sequence of nucleotides. Because human cells repair broken DNA naturally, the break can be mended with the sequence of nucleotides that we want.
Precaution: If the desired sequence is not provided in time, the DNA may be rejoined with the same faulty sequence of nucleotides or with a new mutation. To prevent this kind of result, the work must be done very carefully, with checkpoints that confirm that the edit went as intended.
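
The locate, cut, and replace steps above can be sketched as a toy Python model. Everything here is an illustration under stated assumptions: the function names find_target and edit and all sequences are hypothetical, the guide is written in DNA letters (T instead of U) for simplicity, and the rule that the target must be followed by a signal sequence of the form NGG (any letter, then GG) is the one used by the commonly studied Cas9 from Streptococcus pyogenes. Real repair is carried out by the cell, not by string replacement.

```python
def find_target(dna, guide):
    """Return the start index of the first site that matches the guide
    and is immediately followed by an NGG signal sequence, or -1."""
    start = dna.find(guide)
    while start != -1:
        pam = dna[start + len(guide): start + len(guide) + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return start
        start = dna.find(guide, start + 1)
    return -1


def edit(dna, guide, faulty, corrected):
    """Locate the target, then swap the faulty stretch inside it
    for the corrected one (a stand-in for template-based repair)."""
    start = find_target(dna, guide)
    if start == -1:
        return dna  # no valid target: the DNA is left untouched
    end = start + len(guide)
    repaired_site = dna[start:end].replace(faulty, corrected, 1)
    return dna[:start] + repaired_site + dna[end:]


guide = "GACCTTGAGGTCAGTACCGA"            # hypothetical 20-letter guide
dna = "TTACG" + guide + "TGG" + "CATTA"   # target followed by TGG
faulty, corrected = "GGTCAG", "GGACAG"    # hypothetical one-letter fix

print(find_target(dna, guide))
print(edit(dna, guide, faulty, corrected))
```

If the sequence after the target is not of the form NGG, find_target returns -1 and edit leaves the DNA unchanged, which mirrors the point that Cas9 cuts only at a properly flagged target.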

Now, how can human genes be modified by using the CRISPR technique?

As we discussed earlier, identifying the problematic gene is the first step in gene editing. It is also very important to know the actual sequence of nucleotides, or base pairs, of that gene. Thankfully, we know the complete genome sequence of humans through the Human Genome Project.

Now, what is the Human Genome Project?
The Human Genome Project (HGP) was a mega project launched in 1990. In this project, scientists worked out the whole genome sequence of humans. The project was planned to take 15 years, but it was declared complete in 2003, two years ahead of schedule.
It was a very big and expensive project. The human genome has about 3 billion base pairs. At a cost of roughly 1 dollar per base pair, the whole project cost about 3 billion dollars. (The figures presented here are rounded for ease of understanding; the exact data may vary from this.)

This mega project was initiated by American geneticists with the support of the US Department of Energy and the National Institutes of Health. Countries like the United Kingdom, France, Japan, China, and Germany took a very active part in this project.

The sequence of base pairs of the whole human DNA was determined by the Human Genome Project (HGP). This information is very helpful in dealing with genetic disorders. For example, if a person has the genetic disorder color blindness, then thanks to the HGP we know the location of the gene that causes it, so it becomes easier to handle problems like these.

Now, how did scientists find the sequence of base pairs of DNA?

In the Human Genome Project, scientists studied the whole DNA, meaning both the functional and the nonfunctional parts. The following are the general steps that scientists followed to study the DNA base pair sequence:

Break the DNA into segments: Break the DNA by using a restriction endonuclease enzyme. This enzyme cuts the DNA into segments.

Cloning of DNA segments in a host cell: The segments of DNA are then inserted into a host cell with the help of a vector. If the host cell is a bacterium (a prokaryote), the vector is a BAC (bacterial artificial chromosome); if the host cell is yeast (a eukaryote), the vector is a YAC (yeast artificial chromosome).

In the host cell, we get many copies of our segment of DNA. In general cloning work, a plasmid is usually taken as the vector.

A plasmid is a small, circular, double-stranded DNA molecule that is present in prokaryotic cells (like bacteria). It is not essential for the life of the bacterium, but it can carry genes that help it. A plasmid can also be transferred from one bacterial cell to another. That’s why it is taken as a vector.

Usually, a plasmid is removed from the cell, and a specific segment of its DNA is cut out with the help of a restriction endonuclease. The desired gene (the DNA we want copies of) is then added to this plasmid. The modified plasmid is then inserted into the host cell, where it makes copies of itself and, with them, copies of our DNA.
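
The cutting done by a restriction endonuclease in the steps above can be sketched in Python. As an assumption for illustration, the example uses the recognition site GAATTC, cut after its first letter, which is the behavior of the well-known enzyme EcoRI; the text does not say which enzyme is used. The function name digest and the sample sequence are hypothetical, and only one strand is modeled, whereas the real enzyme cuts both strands.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut dna at every occurrence of site, cut_offset letters into the site,
    and return the resulting fragments in order."""
    fragments = []
    previous_cut = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[previous_cut:cut])
        previous_cut = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[previous_cut:])  # whatever remains after the last cut
    return fragments


dna = "ATGGAATTCCGTAGAATTCTTA"   # hypothetical sequence with two sites
fragments = digest(dna)
print(fragments)
print("".join(fragments) == dna)  # the fragments together rebuild the original
```

Two recognition sites give three fragments, and each fragment could then be joined to a vector for cloning.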

Sequence the DNA by Sanger’s method: After isolating the segment of DNA and making many copies of it, we determine its sequence. Sequence determination can be done in various ways, but here we will discuss Sanger’s method.

We take a single strand of the DNA, which we call the template strand.
We take four different test tubes, label them, and add ddATP, ddTTP, ddCTP, and ddGTP to them separately, one to each tube. (Here “dd” means “dideoxy”: the sugar of the nucleotide lacks two oxygen atoms, because the hydroxyl groups at both its 2’ and 3’ positions are missing.)
We add polymerase enzyme and the four normal dNTPs (dATP, dGTP, dCTP, and dTTP) to all four test tubes.
We also add radiolabelled primers to all four test tubes. The choice of primer depends on the vector (the choice of vector, endonuclease enzyme, etc. is always very specific).
We take the copies of the segmented DNA (which we took from the host cells) and add an equal amount to each of the four test tubes.
When the template DNA is mixed with the contents of the test tubes, the polymerase enzyme starts to build a new strand from the available nucleotides.
Because every test tube contains plenty of normal nucleotides and polymerase enzyme, long strands of nucleotides can form.
But the length of each strand is limited by the ddNTP nucleotides (because they lack the 3’ hydroxyl group to which the next nucleotide would be attached).
So there are two possibilities at every position in every test tube. Either the polymerase adds a dNTP and the new strand keeps growing, or it adds a ddNTP and the strand stops growing.
The ddNTP stops the growth of the strand because it is dideoxy (two oxygen atoms are absent), whereas normal DNA is deoxy (deoxyribonucleic acid, with only one oxygen atom absent) and still has the 3’ hydroxyl group needed to attach the next nucleotide.
For example, in the test tube labeled ddATP, the new strand stops growing whenever a dideoxy adenine is paired with a thymine on the template. The same principle applies in all the other test tubes.
We take the new strands from all four test tubes and load them side by side on a polyacrylamide gel. The shortest strands (lightweight) move fast, while the longest strands (heavyweight) move slowly through the gel.
From the distances covered by the different strands, read from the shortest to the longest across the four lanes, we can determine the sequence of nucleotides of the DNA.
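
The logic of the four test tubes and the gel can be sketched as a small Python simulation. It is a simplified model under stated assumptions: the function names and the template sequence are hypothetical, the primer is ignored, and every possible stopping point is assumed to show up as a band. Each tube records the lengths at which its own ddNTP stops the new strand, and reading the bands from shortest to longest rebuilds the new strand, which is complementary to the template.

```python
# Base-pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_fragments(template_3to5):
    """For each ddNTP tube, list the strand lengths at which synthesis stops.

    The template is written 3'->5', so the new strand comes out 5'->3'.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    tubes = {"A": [], "T": [], "C": [], "G": []}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)  # a ddNTP of this base can stop growth here
    return tubes


def read_gel(tubes):
    """Read the bands from shortest to longest to recover the new strand."""
    bands = sorted(
        (length, base) for base, lengths in tubes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGATC"            # hypothetical template, written 3'->5'
tubes = sanger_fragments(template)
print(tubes)
print(read_gel(tubes))
```

Each fragment length appears in exactly one tube, so sorting the bands by length across the four lanes gives one letter per position, which is why the gel can be read directly as a sequence.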

CRISPR has a huge diversity of applications, as it allows genes to be edited, repaired, and modified. With the help of the Human Genome Project, it is easy to identify a problem-causing gene and locate its faulty sequence of nucleotides. In outline, CRISPR has just three steps, i.e. produce a guide RNA, break the DNA, and repair (modify) the DNA. This technique may have some complications, but it is much easier than the gene-editing techniques used in the past, and it saves a great deal of time. CRISPR is a revolutionary step in the field of genetics.