Every time a human cell divides, its entire genome of 3 × 10⁹ base pairs must be copied. DNA replication is extremely accurate, but mistakes do occur: sometimes an incorrect nucleotide is incorporated into the growing DNA strand, giving rise to a mismatch. DNA is also susceptible to damage from cellular and external sources, including chemical agents (such as those found in cigarette smoke), ionizing radiation and ultraviolet light. For an organism to remain viable, this damage must be kept in check by DNA repair mechanisms. Mutations that escape correction can be passed on to future generations, can change the function of proteins and can cause cancer.
The 2015 Nobel Prize in Chemistry was awarded to Tomas Lindahl, Paul Modrich and Aziz Sancar for their work in elucidating the mechanisms of DNA repair.
Mismatches in DNA bases
In normal double-stranded DNA, A pairs with T and G pairs with C. DNA replication enzymes make use of this Watson-Crick base pairing, so that each of the single strands within a DNA duplex acts as a template for the synthesis of a new strand of DNA. However, Watson-Crick base pairing on its own is not selective enough to maintain genetic integrity: it is estimated that, without external influences, base pairing would allow more than 1% incorporation of incorrect nucleotides during replication. This lack of fidelity is due to competition between the correct G·C/A·T base pairs and eight possible mismatches (or base mispairs): A·A, G·G, A·G, C·C, T·T, C·T, A·C and G·T. The first three of these mispairs (A·A, G·G and A·G) are purine-purine mismatches; the next three (C·C, T·T and C·T) are pyrimidine-pyrimidine mismatches; and the remaining two (A·C and G·T) are purine-pyrimidine mismatches.
Purine-purine and pyrimidine-pyrimidine mismatches give rise to transversion mutations (a purine replaced by a pyrimidine, or vice versa), whereas purine-pyrimidine mismatches produce transition mutations (a purine replaced by a purine, or a pyrimidine by a pyrimidine) (Figure 1). DNA polymerases have inbuilt proofreading activity, enabling them to detect incorrectly incorporated nucleotides, remove them and insert the correct nucleotide. Even this, however, is not sufficient to prevent mismatch formation entirely.
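The classification rule above can be checked mechanically. The sketch below is illustrative only: the helper names and the convention of writing each pair as template·incoming are assumptions of this example, not part of any standard tool. A mismatch is scored by comparing the incoming base with the base that should have been inserted opposite the template.

```python
# Sketch: classify a template·incoming base pair and predict the mutation it
# would fix after the next round of replication (helper names are illustrative).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_class(template: str, incoming: str) -> str:
    """Name the pair formed between a template base and an incoming base."""
    if WATSON_CRICK[template] == incoming:
        return "Watson-Crick"
    if template in PURINES and incoming in PURINES:
        return "purine-purine"
    if template in PYRIMIDINES and incoming in PYRIMIDINES:
        return "pyrimidine-pyrimidine"
    return "purine-pyrimidine"


def mutation_type(template: str, incoming: str) -> str:
    """Compare the incoming base with the base that should have been inserted."""
    expected = WATSON_CRICK[template]
    if incoming == expected:
        return "none"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) -> transition.
    same_class = (expected in PURINES) == (incoming in PURINES)
    return "transition" if same_class else "transversion"


# The eight mismatches listed in the text, written as template·incoming.
for pair in ["A·A", "G·G", "A·G", "C·C", "T·T", "C·T", "A·C", "G·T"]:
    t, i = pair.split("·")
    print(f"{pair}: {mismatch_class(t, i)} -> {mutation_type(t, i)}")
```

Swapping the template and incoming bases gives the same mutation type for every mismatch, which is why the mismatches can be classified as unordered pairs.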
The G·T mismatch
The first mismatch to be characterized in duplex DNA (by NMR and X-ray crystallography) was the G·T mismatch. It adopts a wobble configuration of the type first proposed by Crick to explain G·U pairing at the third codon position in codon-anticodon interactions between mRNA and tRNA (Figure 2).
The A·C mismatch
The A·C mismatch in DNA has a structure similar to that of the G·T mispair, but X-ray analysis alone cannot show how such an association produces two inter-base hydrogen bonds. The X-ray structure of a DNA duplex can rarely be determined at a resolution high enough to reveal hydrogen atoms, so other techniques are used whenever the precise nature of a base pair is ambiguous.
For the A·C mismatch there are two possible arrangements that can produce a pairing with two hydrogen bonds: either the adenine base is protonated, or it exists in a rare imino tautomeric form (Figure 3).
Ultraviolet melting experiments and NMR spectroscopy over a wide pH range show that DNA duplexes containing A·C base pairs are unusually stable at low pH, strongly suggesting that the N(1) atom of the adenine base is protonated. It is also possible (for most mismatches) to postulate the occurrence of base pairs with one base present as a minor tautomer; however, there is no direct experimental evidence for tautomeric A·C base pairs.
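The pH dependence described above is what a simple Henderson-Hasselbalch model of a single protonation site predicts: the fraction protonated is 1 / (1 + 10^(pH - pKa)). The sketch assumes one protonatable site (adenine N(1)) and uses a pKa of 6.0 chosen purely for illustration; it is not a measured value for any particular A·C-containing duplex.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a single basic site that carries a proton."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


ASSUMED_PKA = 6.0  # hypothetical pKa for adenine N(1) in an A·C pair, illustration only

for ph in (4.0, 5.0, 6.0, 7.0, 8.0):
    print(f"pH {ph:.1f}: {fraction_protonated(ph, ASSUMED_PKA):.3f} protonated")
```

Under this model the site is half protonated when pH equals pKa, and protonation (and with it the two-hydrogen-bond A+·C pair) increases tenfold in the protonated:unprotonated ratio for each pH unit below the pKa, consistent with the greater duplex stability observed at low pH.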
Minor tautomer mismatches
Minor tautomer mismatches are almost perfect mimics of Watson-Crick base pairs in overall shape, but they do not present the same hydrogen-bonding atoms in the major and minor grooves. Examples of minor tautomer mismatches are shown in Figure 2 and Figure 3.
The G·A mismatch
G·A base pairs are particularly interesting because biochemical studies have shown that they are repaired less efficiently in cells than other mismatches. There is no dominant G·A mismatch structure, and four different configurations have been characterized by X-ray crystallography and NMR (Figure 4). The precise form of the G·A mismatch varies with the base-stacking environment, salt concentration and pH. This variation poses a particular problem for enzymatic repair, because the different forms of the G·A mismatch present diverse targets for recognition.
Inosine (I) is an analogue of guanosine that lacks the amino group at the 2-position of the purine ring. Inosine occurs in RNA: at the wobble position of tRNA anticodons it pairs with A, C or U in codon-anticodon interactions, allowing one tRNA to read several synonymous codons and so contributing to the degeneracy of the genetic code. Deoxyinosine occurs only rarely in DNA, where it arises by deamination of deoxyadenosine, and is potentially mutagenic.
A number of DNA duplexes containing deoxyinosine have been analysed by X-ray crystallography in an attempt to explain its mutagenicity. The I·T mismatch (Figure 5) adopts the same wobble configuration as the G·T mismatch, whereas the I·A base pair displays similar diversity to the G·A base mispair. It seems unlikely, therefore, that the mutagenicity of inosine can be explained on purely structural grounds. Some deoxyinosine-containing mismatches have surprisingly high thermodynamic stability and this might be a key factor (it is possible that some repair enzymes recognize mismatches by inserting an amino acid residue into the duplex and destroying base pairing; in such cases stable mismatches will be recognized less well than unstable ones).
Mismatches generally destabilize the DNA duplex and cause local melting, or opening up, of the double helix, which promotes base flipping; this is one mechanism by which DNA repair enzymes recognize mutagenic lesions. Because deoxyinosine-containing mismatches are unusually stable, they cause less local melting, and in this respect they are particularly difficult to recognize.
Molecular biologists have used the special properties of inosine-containing mismatches when designing hybridization probes based on knowledge of protein sequences. Deoxyinosine is included in oligonucleotides in positions where there is a sequence ambiguity owing to the degeneracy of the genetic code. The high thermal stability of inosine-containing mismatches ensures that the oligonucleotide hybridizes efficiently to the target nucleic acid.
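As a rough illustration of how such a probe might be written down, the sketch below back-translates a short peptide and places deoxyinosine (I) at every codon position where synonymous codons disagree. This one-rule design, the function name and the reduced codon table are assumptions of the example: the six-fold degenerate amino acids (Leu, Ser, Arg) and isoleucine are omitted, and practical probe design also weighs codon usage and mixed-base positions.

```python
# Reduced standard codon table (DNA sense strand); Leu, Ser, Arg and Ile omitted.
CODONS = {
    "M": ["ATG"], "W": ["TGG"],
    "F": ["TTT", "TTC"], "Y": ["TAT", "TAC"], "C": ["TGT", "TGC"],
    "H": ["CAT", "CAC"], "N": ["AAT", "AAC"], "D": ["GAT", "GAC"],
    "K": ["AAA", "AAG"], "Q": ["CAA", "CAG"], "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"], "A": ["GCT", "GCC", "GCA", "GCG"],
    "P": ["CCT", "CCC", "CCA", "CCG"], "V": ["GTT", "GTC", "GTA", "GTG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
}


def inosine_probe(peptide: str) -> str:
    """Hypothetical helper: back-translate, using I wherever synonymous codons differ."""
    probe = []
    for aa in peptide:
        codons = CODONS[aa]  # KeyError for residues not in the reduced table
        for pos in range(3):
            bases = {codon[pos] for codon in codons}
            probe.append(bases.pop() if len(bases) == 1 else "I")
    return "".join(probe)


print(inosine_probe("MWKD"))
```

Met and Trp, each with a single codon, give unambiguous triplets, which is why such residues are favoured when choosing a region of protein sequence to probe.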
The stability of inosine-containing mismatches relative to their guanine counterparts is probably best explained by the destabilizing effect of the 2-amino group of guanine on guanine-containing mismatch base pairs. In the very stable Watson-Crick G·C base pair, the 2-amino group of guanine forms a hydrogen bond with the 2-oxygen of cytosine and another hydrogen bond with a surrounding water molecule. However, in some guanine-containing mismatches (e.g. G·A anti-anti and G·A anti-syn in Figure 4), the 2-amino group of G is sterically prevented from fully hydrogen bonding to water; so there are fewer hydrogen bonds in the duplex than in the single strand, and such mismatches are destabilized. This 2-amino group is absent in inosine, so no such destabilization is observed.
Measuring the stability of mismatch base pairs using UV melting
UV melting is a method for measuring the melting temperature (Tm) of a DNA duplex, an indicator of duplex stability. In an illustrative experiment, four samples were prepared, each containing the same 14-base DNA strand paired with a "complementary" strand in which one base was varied, so that the duplexes contained either the Watson-Crick base pair (C·G) or one of the three possible mismatch base pairs. Each sample was analysed by UV melting and its Tm was determined. The melting curves are shown in Figure 6.
The mismatch-containing duplexes have significantly lower melting temperatures than the duplex without mismatches. The duplex containing the Watson-Crick base pair (C·G) is the most stable of the four, with the highest melting temperature, 60.5 °C. The C·C mispair is the least stable, lowering the Tm by 17.3 °C. UV melting experiments of this type illustrate clearly how strongly a single mismatch destabilizes a DNA duplex.
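Tm is often read off a melting curve as the temperature at which absorbance rises most steeply, that is, the maximum of the first derivative dA/dT. The sketch below applies that convention to synthetic two-state curves generated around the values quoted above (60.5 °C, and 60.5 - 17.3 = 43.2 °C for the C·C duplex). These curves are not the data of Figure 6; the logistic shape, width and absorbance values are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def synthetic_melting_curve(tm, width=3.0, start=20.0, stop=90.0, step=0.5):
    """Hypothetical two-state melting curve (not measured data)."""
    n = int(round((stop - start) / step))
    temps = [start + k * step for k in range(n + 1)]
    # Fraction melted follows a logistic curve centred on tm; absorbance rises
    # from 1.00 (duplex) to 1.25 (single strands), mimicking hyperchromicity.
    absorb = [1.00 + 0.25 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, absorb


def estimate_tm(temps, absorb):
    """Tm taken as the temperature of maximum dA/dT (central differences)."""
    best_t, best_slope = temps[1], float("-inf")
    for k in range(1, len(temps) - 1):
        slope = (absorb[k + 1] - absorb[k - 1]) / (temps[k + 1] - temps[k - 1])
        if slope > best_slope:
            best_t, best_slope = temps[k], slope
    return best_t


tm_cg = estimate_tm(*synthetic_melting_curve(60.5))
tm_cc = estimate_tm(*synthetic_melting_curve(60.5 - 17.3))
print(f"C·G Tm ≈ {tm_cg:.1f} °C, C·C Tm ≈ {tm_cc:.1f} °C, ΔTm ≈ {tm_cg - tm_cc:.1f} °C")
```

The 0.5 °C temperature grid limits each estimate to within about a quarter of a degree. With real, noisy data, smoothing before differentiation, or fitting a two-state model with sloping baselines, is usually preferable to taking the raw derivative maximum.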
Shape and stability of base pairs
Mismatch base pairs are generally different in shape from Watson-Crick base pairs and are usually thermodynamically unstable. The two properties (shape and stability) are related because "mis-shapen" base pairs are unlikely to form stable base-stacking interactions within the DNA double helix. In some cases, hydrogen bonding of the heteroatoms to the surrounding water molecules is also inhibited, thus further destabilizing the base pair. It is likely that DNA repair enzymes utilize all of these factors in mismatch recognition.
Chemical damage to DNA bases
Apart from being the secret of life, DNA is a chemical molecule and, as with other molecules, its phosphates, sugars and heterocyclic bases are susceptible to modification by agents including chemical carcinogens, ionizing radiation and ultraviolet light.
Some important examples of changes that can occur to DNA bases are the deamination of cytosine, the methylation of guanine by alkylnitrosoureas, and the oxidation of the 8-position of adenine or guanine by hydroxyl radicals generated by oxidizing agents or γ-radiation.
Deamination of cytosine
Cytosine is susceptible to hydrolytic deamination to uracil (Figure 7).
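If left uncorrected, the conversion of cytosine to uracil gives rise to C to T transition mutations, because uracil pairs with adenine during replication. Uracil is a foreign base in DNA: it is excised by a specific enzyme, uracil DNA glycosylase (UDG), and the resulting gap is refilled with cytosine by base excision repair.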
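The route from an unrepaired uracil to a C to T transition can be traced by copying a short strand twice, using the fact that uracil pairs with adenine. In this sketch, strands are written position by position against their partners, without reversing them 5' to 3', and the sequence and function name are hypothetical.

```python
# Sketch: what happens if a deaminated cytosine (U) escapes repair.
# Strands are written position by position against their partners
# (no 5'->3' reversal), a simplifying assumption for readability.
PAIRS_WITH = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}  # U pairs like T


def replicate(template: str) -> str:
    """Return the strand synthesized opposite `template`."""
    return "".join(PAIRS_WITH[base] for base in template)


top = "GACGT"
damaged = top[:2] + "U" + top[3:]      # cytosine at index 2 deaminated -> "GAUGT"
first_round = replicate(damaged)       # adenine inserted opposite U -> "CTACA"
second_round = replicate(first_round)  # thymine now occupies the original C position
print(top, "->", second_round)         # prints: GACGT -> GATGT (C:G -> T:A)
```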
Methylation of guanine
O(6)-Methylguanine can be formed when DNA is exposed to methylating agents such as the nitrosamine N-nitrosodimethylamine (NDMA) and the alkylnitrosourea N-methyl-N-nitrosourea. The major product of the reaction is N(7)-methylguanine, which is not mutagenic because the modification does not perturb base pairing (Figure 8).
The presence of O(6)-methylguanine in DNA can be very damaging because methylation at the O(6)-position changes the hydrogen bonding properties of guanine, thereby inducing G to A transition mutations (Figure 9). This indicates that the O(6)-MeG·T mispair is selected during replication in addition to (or in preference to) the O(6)-MeG·C pair.
The O(6)-MeG·T and O(6)-MeG·C pairs have very similar thermodynamic stabilities, but they differ in shape: the O(6)-MeG·C pair adopts a wobble configuration at physiological pH, whereas the O(6)-MeG·T pair closely resembles a Watson-Crick base pair. Because stability cannot discriminate between the two, the proofreading domain of DNA polymerases favours the Watson-Crick-like O(6)-MeG·T pair; the modified guanine base thus accepts thymine as a partner instead of cytosine, and the lesion passes uncorrected.
Specific methyltransferase enzymes have evolved to remove the methyl group from O(6)-methylguanine before mutations become incorporated into the genome.
Oxidation of adenine and guanine
The reaction of the purine bases of DNA with hydroxyl radicals can result in the formation of 8-oxoadenine (O8A) and 8-oxoguanine (O8G) (Figure 10). These bases exist predominantly in the 8-keto form, and their contribution to mutagenesis is the subject of much interest.
Modification at the 8-position does not directly affect the ability of adenine and guanine to form Watson-Crick base pairs, but the bulky oxygen atom increases their tendency to adopt the syn conformation, thereby providing possibilities for base mispairing. The presence of an O8G base in genomic DNA can lead to a G to T transversion mutation via an O8G·A base pair with a syn:anti configuration stabilized by two inter-base hydrogen bonds (Figure 11). In addition to having reasonable thermodynamic stability, the O8G·A pair is pseudosymmetric about the N-glycosidic bonds that join the bases to the sugars, and is therefore structurally similar to a Watson-Crick base pair. The similarity is particularly striking in the minor groove, where the 8-oxygen atom of O8G lies in the position that would be occupied by the 2-oxygen atom of the thymine base in an A·T base pair. Thus, the O8G·A base pair is not readily recognized by proofreading enzymes.
In contrast to O8G, O8A (also called 8-hydroxyadenine) is not strongly mutagenic: the modified adenine base retains a strong preference for thymine as a partner. This preference may be largely structural in origin. The most thermodynamically stable mispair, O8A·G, is asymmetric and structurally similar to a purine-pyrimidine mismatch, so it is likely to be an easy target for repair enzymes. The G(syn)·O8A(anti) base pair is interesting because it appears to be held together by four three-centered hydrogen bonds (sometimes called bifurcated hydrogen bonds; Figure 12). This arrangement is stable because it allows the 2-amino group of guanine to fulfil its hydrogen-bonding capacity by interacting with the oxygen atom of O8A as well as with a neighbouring water molecule (not shown in Figure 12). As mentioned previously, any form of base pairing that prevents the guanine 2-amino group from fulfilling its hydrogen-bonding potential, with either the opposing base or neighbouring water molecules, will tend to be unstable relative to the individual unpaired bases.
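The mutational outcomes described for O(6)-methylguanine, 8-oxoguanine and 8-oxoadenine all follow from one rule: the base inserted opposite the lesion is itself copied faithfully in the next round of replication. The sketch below encodes the partner preferences stated in the text; the table and function names are illustrative assumptions.

```python
# Sketch: predicted outcome of a miscoding lesion that escapes repair.
# Partner assignments follow the text; names are illustrative.
PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# lesion -> (undamaged base it derives from, base most often inserted opposite it)
LESIONS = {
    "O(6)-methylguanine": ("G", "T"),
    "8-oxoguanine (syn)": ("G", "A"),
    "8-oxoadenine": ("A", "T"),
}


def outcome(original: str, partner: str) -> str:
    """Base-pair change fixed once the inserted partner is itself copied."""
    new_base = COMPLEMENT[partner]  # base placed opposite the partner next round
    if new_base == original:
        return "no mutation"
    same_class = (original in PURINES) == (new_base in PURINES)
    kind = "transition" if same_class else "transversion"
    return f"{original}:{COMPLEMENT[original]} -> {new_base}:{partner} ({kind})"


for lesion, (original, partner) in LESIONS.items():
    print(f"{lesion}: {outcome(original, partner)}")
```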
8-Oxoadenine and 8-oxoguanine bases are repaired in base excision repair pathways.
Formation of purine 5',8-cyclo-2'-deoxynucleosides
The reaction of purine bases with hydroxyl radicals can also result in the formation of the 5',8-cyclo-2'-deoxynucleosides 5',8-cyclo-2'-deoxyadenosine and 5',8-cyclo-2'-deoxyguanosine (Figure 13).
Unlike 8-oxoadenine and 8-oxoguanine, the 5',8-cyclo-2'-deoxynucleosides cannot be repaired by base excision repair and must instead be repaired by nucleotide excision repair. The C5'-C8 covalent bond is thought to stabilize the N-glycosidic bond, preventing its hydrolysis by DNA glycosylases.
Formation of T-T dimers
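On exposure to UV light (e.g. from sunlight), adjacent thymine bases on the same DNA strand can become covalently linked to form T-T dimers. The main T-T dimer observed is the cyclobutane pyrimidine dimer (CPD, ~80%); a pyrimidine-pyrimidone dimer, the (6-4) photoproduct (6-4PP), also forms.
Both T-T dimer lesions can lead to mutagenesis and skin cancer. In many organisms they are repaired by enzymes called photolyases; placental mammals, including humans, lack photolyases and remove these lesions by nucleotide excision repair.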
DNA repair mechanisms
There are several distinct mechanisms of DNA repair, each dealing with different types of DNA damage:
- Direct reversal of damage without affecting the phosphate-sugar backbone
- Base excision repair
- Nucleotide excision repair
- Mismatch repair
- Other methods including homologous recombination repair and non-homologous end joining
Direct reversal, typically used for the repair of alkylated DNA, involves many unrelated enzymatic pathways, each of which has evolved to repair a specific DNA lesion. An example is the demethylation of O(6)-methylguanine by O(6)-methylguanine-DNA methyltransferase (MGMT), which transfers the methyl group to one of its own cysteine residues. The enzyme accepts the methyl group from O(6)-methylguanine nucleotides stoichiometrically (one methyl group per enzyme molecule) and is inactivated in the process (Figure 15).
Many enzymes involved in direct reversal are, like MGMT, suicide enzymes: each enzyme molecule is only able to catalyze a single reaction, and is destroyed afterwards. This is an expensive process for the organism, and is reserved for the repair of highly mutagenic or cytotoxic DNA lesions. The existence of these enzymes is an indication that the DNA of living organisms has been regularly exposed to naturally occurring alkylating agents on an evolutionary timescale.
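The cost of suicide repair is easiest to see in a toy model in which each MGMT molecule can remove only one methyl group. The class name and pool sizes below are hypothetical, and the model ignores synthesis of new enzyme.

```python
# Sketch: stoichiometric (suicide) repair by a pool of MGMT molecules.
# Pool sizes are arbitrary; resynthesis of new enzyme is ignored.
from dataclasses import dataclass


@dataclass
class MGMTPool:
    active: int  # molecules whose active-site cysteine has not yet accepted a methyl group

    def repair(self, lesions: int) -> int:
        """Repair O(6)-methylguanine lesions; return how many remain unrepaired."""
        repaired = min(self.active, lesions)
        self.active -= repaired  # each molecule is used up by a single transfer
        return lesions - repaired


pool = MGMTPool(active=100)
print(pool.repair(60), pool.active)  # all 60 repaired, 40 molecules left
print(pool.repair(60), pool.active)  # only 40 repaired, 20 lesions persist
```

Unlike a catalytic enzyme, the pool is exhausted once lesions outnumber active molecules, and any further O(6)-methylguanine persists until new enzyme is made.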
Base excision repair
In base excision repair, damaged heterocyclic bases in double-stranded DNA are detected by DNA glycosylase enzymes, which excise them by hydrolysing the N-glycosidic bond. Different glycosylases target different types of DNA base damage. Figure 16 shows the mechanism by which uracil DNA glycosylase (UDG) hydrolyses the N-glycosidic bond of deoxyuridine nucleotides (uracil arises in DNA through deamination of cytosine). DNA glycosylases, including UDG, employ base flipping to reach the target base within the double helix.
Following excision of the damaged base, an abasic site (or AP site) is left in one strand of the double-stranded DNA. The abasic site is repaired in a series of steps by endonucleases, polymerases and ligases, using the complementary DNA strand as a template. Figure 17 shows an example of a base excision repair pathway.
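The sequence of base excision repair steps can be sketched on a single strand. Here '-' stands for an abasic site, strands are aligned position by position, and the AP endonuclease, polymerase and ligase steps are collapsed into one hypothetical function; this is a cartoon of the logic, not a kinetic or structural model.

```python
# Sketch of the base excision repair steps on one strand of a duplex.
# Strands are aligned position by position; '-' marks an abasic (AP) site.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def glycosylase(strand: list[str], target: str = "U") -> list[int]:
    """Excise every `target` base (N-glycosidic bond cleavage); return AP positions."""
    sites = [i for i, base in enumerate(strand) if base == target]
    for i in sites:
        strand[i] = "-"
    return sites


def fill_ap_sites(strand: list[str], template: str, sites: list[int]) -> None:
    """Stand-in for AP endonuclease, polymerase and ligase: restore the templated base."""
    for i in sites:
        strand[i] = COMPLEMENT[template[i]]


damaged = list("GAUGT")  # uracil from a deaminated cytosine
template = "CTGCA"       # undamaged partner strand
sites = glycosylase(damaged)
print("".join(damaged), sites)  # GA-GT [2]
fill_ap_sites(damaged, template, sites)
print("".join(damaged))         # GACGT
```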
Nucleotide excision repair
Nucleotide excision repair is a more versatile excision repair mechanism: unlike base excision repair and direct reversal, which require specific enzymes for different types of damage, nucleotide excision repair senses distortion of the overall structure of the DNA double helix, and is therefore more general for different types of DNA damage.
As well as a distorted helix structure, nucleotide excision repair requires a chemical modification of the DNA. Proteins detect and bind to damaged DNA that satisfies these criteria, making the distortion in the structure around the site of damage even more pronounced. Once the lesion has been recognized in this way, a denaturation bubble (of about 30 nucleotides) opens up around the lesion. A series of protein complexes forms, an endonuclease nicks the damaged strand at two sites flanking the lesion, and a short oligonucleotide spanning the lesion is excised. The gap is filled by DNA polymerases, and the nick is sealed by DNA ligases.
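The dual-incision logic of nucleotide excision repair can be sketched in the same cartoon style. Lowercase 'tt' marks a hypothetical bulky lesion, strands are aligned position by position, and the flank sizes are arbitrary choices rather than the lengths used by any real excision machinery.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nucleotide_excision_repair(damaged: str, template: str, lesion_start: int,
                               lesion_end: int, flank_5: int = 4,
                               flank_3: int = 3) -> tuple[str, str]:
    """Hypothetical helper: dual incision around damaged[lesion_start:lesion_end].

    Returns (excised oligonucleotide, repaired strand). Flank sizes are illustrative.
    """
    cut_5 = max(0, lesion_start - flank_5)          # first incision
    cut_3 = min(len(damaged), lesion_end + flank_3)  # second incision
    excised = damaged[cut_5:cut_3]                    # oligonucleotide spanning the lesion
    # Polymerase fills the gap using the undamaged template strand.
    patch = "".join(COMPLEMENT[b] for b in template[cut_5:cut_3])
    return excised, damaged[:cut_5] + patch + damaged[cut_3:]  # ligase seals the nick


excised, repaired = nucleotide_excision_repair("GCATCAttGACTGA", "CGTAGTAACTGACT", 6, 8)
print(excised, repaired)  # ATCAttGAC GCATCATTGACTGA
```

Because the excision window is defined only by the lesion's position and not by its chemistry, the same function handles any lesion, which mirrors the versatility described above.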
Distinguishing parent and daughter DNA strands: DNA methylation
When heterocyclic bases are damaged, the damaged bases are detected and repaired whichever strand of the double helix they lie on; there is no ambiguity about which strand should be repaired. In mismatch repair, by contrast, the unit that is detected is the mismatched base pair, not an individual damaged base. Both bases of the pair are normal DNA bases, so the daughter strand, which carries the replication error, must be identified before repair can proceed.
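How do DNA repair enzymes know which is the parent strand and which is the daughter strand? In genomic DNA, many of the cytosine bases are methylated at the 5-position (Figure 19). This DNA methylation is an epigenetic modification that controls the silencing of genes that are not needed in a particular cell. In the bacterium Escherichia coli, in which methyl-directed mismatch repair was first characterized, the strand-discriminating mark is N6-methyladenine in GATC sequences.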
Immediately after DNA replication, only the parent strand is methylated, while the newly synthesized daughter strand is unmethylated. Repair enzymes recognize this methylation and make changes to the daughter strand only. When the proofreading and repair enzymes have checked that the daughter strand is a faithful reverse complementary copy of the parent, the daughter strand is methylated by another enzyme (a maintenance DNA methyltransferase). For further information on DNA methylation and its central role in epigenetics, see the chapter on epigenetics.
In effect, the methyl mark is a molecular statement that the new DNA strand has been fully checked and is acceptable.
Mechanism of mismatch repair
Once the mismatch has been recognized, and the daughter strand has been identified, the repair process begins. An endonuclease nicks the unmethylated daughter strand at two positions, one on either side of the mismatch. An exonuclease then digests the short oligonucleotide spanning the mismatch site, leaving a gap in the damaged strand. A DNA polymerase synthesizes a new stretch of DNA to fill the gap left by the excised oligonucleotide, using the parent strand as a template. Finally, a ligase seals the nick, completing the repair of the DNA duplex (Figure 21).
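The strand-discrimination and excision steps can be combined in one sketch. A boolean flag stands in for the methyl mark; the `flank` parameter and the class and function names are assumptions of this example, and neither the recognition proteins nor the later methylation of the daughter strand are modeled.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass
class Strand:
    seq: str
    methylated: bool  # True for the parent strand just after replication


def mismatch_repair(a: Strand, b: Strand, flank: int = 3) -> Strand:
    """Correct the unmethylated (daughter) strand against the methylated parent."""
    if a.methylated == b.methylated:
        raise ValueError("parent and daughter strands cannot be distinguished")
    parent, daughter = (a, b) if a.methylated else (b, a)
    seq = list(daughter.seq)
    for i, (p, d) in enumerate(zip(parent.seq, daughter.seq)):
        if COMPLEMENT[p] != d:                   # mismatch recognized
            start = max(0, i - flank)            # nick on one side
            end = min(len(seq), i + flank + 1)   # nick on the other side
            for j in range(start, end):          # excise, then resynthesize from parent
                seq[j] = COMPLEMENT[parent.seq[j]]
    return Strand("".join(seq), methylated=False)  # still awaiting methylation


parent = Strand("GACGTTCA", methylated=True)
daughter = Strand("CTGTAAGT", methylated=False)  # G·T mismatch at index 3
print(mismatch_repair(parent, daughter).seq)     # CTGCAAGT
```

If both strands carry the mark (or neither does), the sketch refuses to act, which is the molecular dilemma the methyl mark exists to resolve.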
Other DNA repair mechanisms
Double-strand breaks are repaired either by non-homologous end joining, in which a DNA ligase joins the severed ends, or by homologous recombination, in which strands of the damaged DNA exchange with an undamaged homologous duplex that serves as a template. Lesions that cannot be repaired by other methods, such as those spanning both strands of the duplex (as formed with interstrand cross-linking agents), are also repaired by homologous recombination.
T-T dimers are repaired by a specific class of enzymes called photolyases, which are present in many organisms. Separate photolyases (CPD photolyases and (6-4) photolyases) effectively reverse the dimer formation. The multi-step process involves the absorption of visible light and multiple electron transfer steps.
When damaged DNA is not repaired
Despite all the DNA repair pathways that have evolved, a mutation sometimes survives uncorrected through to the next round of DNA replication. At this point, depending on the nature of the mutation, it may be permanently incorporated into the genome.
When DNA directs the synthesis of a protein from a gene containing a single mutation (through transcription and translation), the mutant protein may contain one incorrect amino acid. This may be of no great consequence, but if the amino acid is crucial (e.g. a catalytic residue at the active site of an enzyme) the resultant protein may be inactive. The malfunction may result in a catastrophic event such as the death of the organism, a genetic disease such as cystic fibrosis or sickle cell anaemia, or a carcinogenic lesion.
Occasionally, a mutation in DNA will produce a protein that works better than the original, or perhaps performs a new function: the mutant organism will have a competitive advantage over its neighbours, enabling it to spread its new gene through the population.
Evolution is not just something that occurred millions of years ago: it can be seen today in antibiotic resistance. Since the introduction of antibiotics into clinical use in the early 1940s, there has been strong selective pressure on bacteria to evolve resistance. Strains of bacteria resistant to penicillin were observed in the mid-1940s, and today, when a new antibiotic is introduced, there is only a narrow window before pathogenic bacteria acquire resistance, leading to a never-ending quest to find new antibiotics.
Without errors in DNA replication, evolution cannot occur; and without evolution, we would all be swimming around in a primordial soup. However, too many errors can lead to problems including cancer. A fine balance has evolved to allow just enough genetic variation for evolution to occur.