Gene synthesis



Artificial gene synthesis is the chemical synthesis of a DNA sequence that represents one or more genes. While site-directed mutagenesis is regularly used to probe biological hypotheses by incorporating single base mutations, artificial gene synthesis provides a method to efficiently produce long stretches of natural and non-natural nucleic acid sequences, broadening the scope of biological experiments. Sequences that are hard to isolate from natural sources can be routinely generated in the lab, and even entirely non-natural gene sequences can be synthesised. Not only does gene synthesis have implications for synthetic biology, it also introduces the possibility of developing genes containing modified nucleotides or even novel base pairs that could allow for the expansion of the genetic code.

Therefore, large-scale low-cost methods of gene synthesis must be readily available to facilitate relevant academic and commercial biological research.

Artificial gene synthesis

An initial challenge of gene synthesis was the production of long stretches of nucleotide sequence: genes can range from several hundred to several thousand base pairs in length. Current solid-phase oligonucleotide synthesis technology is limited to the generation of oligos up to about 200 nucleotides in length. Two major methods exist to circumvent this issue: polymerase cycling assembly and synthesis by ligation.

Polymerase cycling assembly (PCA)

Polymerase cycling assembly (PCA) exploits single stranded templates rather than relying on total synthesis of both DNA strands. Short oligos are synthesised that alternate between both strands of the gene, overlapping by regions of 20-30 base pairs (Figure 1). Then, to give a superficial explanation, a polymerase uses dNTPs to "fill in the gaps" on one gene strand using the complementary bases on the other. A subsequent, entirely separate polymerase step then amplifies the constructed gene by traditional polymerase chain reaction (PCR).
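The tiling described above can be sketched in a few lines of Python. This is a minimal illustration rather than a design tool: the function name `tile_oligos`, the 60-nt oligo length and the 20-nt overlap are assumptions chosen for the example, and no attempt is made to balance melting temperatures or avoid secondary structure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(gene: str, oligo_len: int = 60, overlap: int = 20) -> list[str]:
    """Split `gene` (top strand, 5'->3') into overlapping oligos on alternating strands.

    Even-numbered oligos are copied from the top strand; odd-numbered oligos
    are the reverse complement of their window, so they lie on the bottom
    strand. Every oligo is returned written 5'->3'.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        window = gene[start:start + oligo_len]
        on_top_strand = len(oligos) % 2 == 0
        oligos.append(window if on_top_strand else reverse_complement(window))
        if start + oligo_len >= len(gene):
            break  # this window reached the end of the gene
        start += step
    return oligos
```

With these defaults a 180-nt gene is covered by four 60-nt oligos, each sharing 20 nt with its neighbour on the opposite strand. The final window is simply truncated at the end of the gene, so a real design would adjust oligo lengths to keep the last oligo a sensible size.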

Figure 1 | Gene synthesis by polymerase cycling assembly (PCA). Oligos that tile the entire gene are synthesised and mixed together. Complementary strands overlap at the termini and are assembled by polymerase action until the whole gene is synthesised. Amplification PCR increases the concentration of the gene product. Errors from solid-phase synthesis and polymerase action are then corrected enzymatically, and the gene is reassembled and re-amplified. The correct gene strands are then purified from residual incorrect sequences by insertion into bacterial plasmids and cloning in E. coli.

With PCA, the commonly used explanation of the polymerase "filling in the gaps" is misleading: the gene is actually assembled step by step. Overlapping oligos anneal at their termini to form discrete overhanging dimers (see Figure 2). The overhangs act as templates to elongate the complementary oligo by polymerase action generating fully double-stranded DNA that makes up a small section of the gene. These extended dsDNA sections are then melted and annealed to another overlapping oligo further along the gene, forming more single stranded overhangs. Further polymerase action turns these dimers into longer dsDNA fragments and this process continues until the entire gene is assembled.

Figure 2 | Gene synthesis by PCA − 4 oligos. (a) The oligos used for PCA must tile the entire gene and overlap by 20-30 base pairs. The first and last oligos must be 5ʹ primer oligos (red and blue). (b) Round 1: The oligo mix is melted to yield single stranded oligos. The mixture is then cooled to allow annealing of oligo dimers at their termini (all possible products are shown). Only those with the potential to extend 5ʹ to 3ʹ are extended in the final extension step (direction of polymerase action). Round 2: Further iterations of melting, reannealing and extension occur, generating the first appearance of the gene product. Further rounds of PCA generate more gene product until the primers are consumed.

As shown in Figure 2, many combinations of dimers can anneal and not all result in successful extension. With more than 4 oligos, the possible iterations of annealing and elongation become too complicated to represent diagrammatically. One consistent point, however, is that polymerases extend 5ʹ to 3ʹ only, so although many oligo dimers will overlap and anneal, only those with the potential to extend 5ʹ to 3ʹ will be elongated (Figure 2). This limitation implies that each terminal oligo must lie on the strand whose 5ʹ terminus is at that end of the gene. Otherwise, if a terminal oligo were on the strand with the 3ʹ terminus, the complementary oligo would have to be extended 3ʹ to 5ʹ to complete the gene, which standard polymerases cannot do.
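The rule that only dimers annealed through their 3ʹ ends can be extended is easy to check in silico. The sketch below is a deliberately simplified model, not a description of any published software: it assumes perfect base pairing, considers only overlaps at the 3ʹ termini, and uses a made-up random 180-nt sequence as the "gene". The names `three_prime_overlap`, `pca_round` and `assemble` are hypothetical.

```python
import itertools
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(a, b, min_overlap):
    """Longest stretch over which the 3' ends of strands a and b base-pair.

    Both strands are written 5'->3'. Returns 0 when fewer than
    `min_overlap` bases pair, i.e. when the dimer cannot be extended.
    """
    rb = reverse_complement(b)
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == rb[:k]:
            return k
    return 0


def pca_round(pool, min_overlap=20):
    """One melt/anneal/extend cycle: the pool plus every extension product."""
    products = set(pool)
    for a, b in itertools.combinations(sorted(pool), 2):
        k = three_prime_overlap(a, b, min_overlap)
        if k:
            # Each 3' end is extended using the other strand as template.
            top = a + reverse_complement(b)[k:]
            products.update({top, reverse_complement(top)})
    return products


def assemble(oligos, target, min_overlap=20, max_rounds=10):
    """Cycle until `target` appears; return the round number, or None."""
    pool = set(oligos)
    for round_no in range(1, max_rounds + 1):
        pool = pca_round(pool, min_overlap)
        if target in pool:
            return round_no
    return None


# Made-up 180-nt "gene" (random, fixed seed), tiled by hand into 60-nt oligos.
rng = random.Random(0)
gene = "".join(rng.choice("ACGT") for _ in range(180))

# Terminal oligos on the strands that carry the 5' termini of the gene.
good = [gene[0:60], reverse_complement(gene[40:100]),
        gene[80:140], reverse_complement(gene[120:180])]
# First oligo on the wrong strand: its 3' end sits at the start of the gene.
bad = [reverse_complement(gene[0:60]), gene[40:100],
       reverse_complement(gene[80:140])]

print(assemble(good, gene))       # rounds needed to see the full gene
print(assemble(bad, gene[:140]))  # None: nothing can extend into the first 40 nt
```

In the second design the first two oligos overlap only through their 5ʹ ends, so the model never fills in the start of the gene, however many rounds are run.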

PCA eliminates the need for the whole gene to be synthesised by solid-phase methods. Once assembled, the gene product can be amplified by traditional PCR to generate a suitable concentration and purified by filtration to remove any remaining primers.

Error correction

Unfortunately, the process is not without errors. Insertion and deletion mutations (frameshift mutations) can be introduced at the solid phase synthesis stage and chain extension by polymerase risks further substitutions, insertions and deletions. Fortunately, natural DNA repair enzymes exist to correct these errors. However, since synthetic DNA has no indication of which strand has the correct sequence (unlike DNA in vivo), nucleases that rely on positive identification of the correct template strand cannot be used. Instead, it is necessary to use endonucleases that completely excise stretches of DNA that contain a mutation (Figure 3).

Since PCR cannot discriminate between correct sequences and those with errors, any mutations that have accumulated in the solid phase synthesis or polymerase extension of one gene strand will be matched in the reverse complement during amplification: the mutation is "invisible". Therefore, the mixture of correct and incorrect gene products is melted and reannealed. It is statistically unlikely that mutations will occur in the same place in the same way on two random strands. As such, random reannealing to a strand which likely doesn't have the complementary mutation generates a mismatch as shown in Figure 3. This mismatch perturbs the secondary structure of the gene.
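The "invisible" mutation can be made concrete with a short sketch. It assumes substitutions only (so both strands have equal length) and uses a made-up 10-nt sequence; `mismatch_positions` is a hypothetical helper, not part of any library.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_positions(top, bottom):
    """Positions (0-based, along `top`) where the duplex is not base-paired.

    Both strands are written 5'->3' and must have the same length,
    so only substitution errors are modelled here.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    paired = reverse_complement(bottom)  # what `top` would be if fully paired
    return [i for i, (x, y) in enumerate(zip(top, paired)) if x != y]


correct = "ATGGCTAGCA"
mutant = "ATGGCAAGCA"  # T -> A substitution at position 5

# After PCR the mutant strand is paired with its own copy: no mismatch.
print(mismatch_positions(mutant, reverse_complement(mutant)))   # []
# After melting and reannealing to a correct partner, the error is exposed.
print(mismatch_positions(mutant, reverse_complement(correct)))  # [5]
```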

Certain mismatch repair enzymes such as T7 Endonuclease I recognise these structural perturbations and cleave the first, second or third phosphodiester bond 3' to the mismatch (dependent on the specific enzyme) on both strands. A single strand exonuclease then chews up the overhanging single stranded DNA 3' to 5'. This process produces dsDNA of varying sizes that all overlap and between them contain the correct sequence spanning the entire gene. Subsequent PCA and PCR amplification generates the corrected gene product.

Figure 3 | Error correction. Random errors present in dsDNA sequence will present as mismatches when melted and reannealed to error-free complements. Mismatch-cleaving enzymes recognise the perturbation in secondary structure these mismatches cause and cleave phosphodiester bonds 3ʹ to the mismatch on both strands. The resulting single stranded overhangs are removed by an exonuclease and PCA is carried out to re-assemble the various sized fragments into the full gene product. Figure adapted from Trends Biotechnol. 30, 147-154 (2012).

Gene synthesis by ligation

While PCA facilitates the synthesis of large genes and genomes, even after error correction the product often has many errors present in the sequence. Gene synthesis by ligation (Figure 4) has a lower error rate than PCA, but is limited to the synthesis of smaller genes since the fragment oligos must span the entire sequence of both the sense and antisense strands: this is impractical for longer genes.

However, if the goal is to synthesise a smaller gene (< 2 kbp), gene synthesis by ligation will likely generate a product with fewer errors because:

  • It removes the need for a polymerase that can often introduce mutations;
  • Strands are subject to hybridisation selection − since all oligos overlap entirely, errors in their syntheses will produce mismatches which mean they are less likely to anneal and form a ligated product.

Figure 4 | Gene synthesis by ligation. Oligos that span the entire gene are ligated together to form the gene product. Complete complementarity yields a lower error rate than PCA but ligation-based synthesis is limited by size.

Solid phase synthesis generates the constituent fragments which are then purified ready for ligation. These oligos span the entire gene on both strands, missing only the phosphodiester bonds that link neighbouring oligos. Fragments are designed so two adjacent oligos are held together with a complementary template (the splint) then ligated enzymatically.
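One simple way to satisfy the splint requirement is to stagger the junctions on the two strands, so that every nick on one strand falls in the middle of an oligo on the other. The sketch below assumes 40-nt oligos and a half-oligo offset; both values, and the name `ligation_oligos`, are illustrative choices rather than anything prescribed by the method.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at(seq, cuts):
    """Split `seq` into consecutive fragments at the given cut positions."""
    bounds = [0, *cuts, len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]


def ligation_oligos(gene, oligo_len=40):
    """Design oligos covering both strands with staggered junctions.

    Top-strand junctions fall every `oligo_len` nt; bottom-strand junctions
    are shifted by half an oligo, so each junction is bridged (splinted)
    by an intact oligo on the opposite strand.
    """
    half = oligo_len // 2
    top_cuts = list(range(oligo_len, len(gene), oligo_len))
    bottom_cuts = list(range(half, len(gene), oligo_len))
    top = split_at(gene, top_cuts)
    # Bottom-strand oligos, listed in top-strand order but written 5'->3'.
    bottom = [reverse_complement(frag) for frag in split_at(gene, bottom_cuts)]
    return top, bottom, top_cuts, bottom_cuts
```

For a 120-nt sequence this gives three 40-nt top-strand oligos and bottom-strand oligos of 20, 40, 40 and 20 nt, with top-strand junctions at 40 and 80 and bottom-strand junctions at 20, 60 and 100.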

Three major methods of ligation-based synthesis exist (Figure 5):

  • shotgun ligation − the gene is synthesised en masse by simply mixing the constituent oligos together and adding a ligase. The reaction can occur at room temperature though it usually requires extended reaction times. The use of thermostable ligases and high purity oligos has now made this one-pot reaction more efficient.
  • ligase chain reaction (LCR) − LCR originally superseded shotgun ligation as a more efficient method with shorter reaction times and the ability to ligate longer genes (still limited to 2 kbp). This method follows a similar set of melting and annealing steps to PCR though differs by using a ligase instead of a polymerase to ligate templated oligos until the entire gene is assembled.
  • solid-support based ligation-mediated oligonucleotide assembly − The initial dsDNA oligo is immobilised on a solid support and iterative rounds of annealing and ligation steadily assemble the entire gene, which is cleaved from the support at the end of the synthesis.

Figure 5 | Three methods of ligation-based gene synthesis. (a) Shotgun ligation − the gene is synthesised en masse by mixing the constituent oligos together and adding a ligase; (b) Ligase chain reaction (LCR) − iterative steps of melting, annealing and ligating under thermocycling conditions allow for a more efficient ligation of discrete dimers which continues until the entire gene is assembled; (c) Solid-support based ligation-mediated oligonucleotide assembly − the initial dsDNA oligo is immobilised on a solid support and adjacent strands are added, ligated and washed away. This continues until the entire gene is assembled, which is then cleaved from the support at the end of the synthesis.

While the error rate of ligation-based synthesis is lower, it may still be necessary to perform error correction to yield a sufficient number of viable genes. Despite this lower error rate, PCA is the dominant methodology used for artificial gene synthesis. Ligation-based techniques require polyacrylamide gel electrophoresis (PAGE) purification and 5ʹ phosphorylation of the oligos, making the whole method costly and labour intensive. PCA with its associated error correction is preferred because these steps are eliminated.

Further methods of gene synthesis exist, including assembly directly on to the plasmid relying on bacterial homologous recombination (Sequence- and Ligation-Independent Cloning − SLIC) or in vivo assembly of transformed oligonucleotides by the yeast Saccharomyces cerevisiae. PCA and ligation-based methods remain by far the most common methods of gene synthesis.

Gene purification by cloning

Once the gene has been synthesised, the resulting solution will contain a mixture of gene strands with correct and incorrect sequences. To purify this mixture, it is necessary to separate out each individual strand and amplify it. This would be a difficult task for traditional DNA purification techniques: the difference between a correct and incorrect sequence may only be a single nucleotide, and separation by length would not discriminate between substitution mutations.

To amplify each strand separately, one can employ the help of E. coli bacteria:

  • Separate genes from the mixture are incorporated into individual E. coli cells (see below).
  • The cells containing these genes can be cultivated to produce a set of colonies, each one having been grown from a single cell containing only one of the original gene strands. Each cell in the colony is therefore a "clone" of the original bacterium.
  • All the colonies together constitute a "clone library", each colony acting as a store of a different pure gene strand from the original mixture.
  • Colonies can then be analysed to determine which contain the correct gene sequence.

Insertion of the gene product into plasmids

First, the gene strands must be inserted into a vector that allows movement of the gene into the bacterial cell. Plasmids are small circular pieces of DNA found in bacteria that are ancillary to the main bacterial chromosome. They are readily transferred from cell to cell and from environment to cell and undergo replication within the cell. Therefore, plasmids are a useful vector into which the gene product may be inserted for in vivo amplification. Insertion is achieved by one of two major methods.

Cohesive-ended ligation

Cohesive-ended or "sticky-ended" ligation involves digestion of the plasmid with a restriction endonuclease that linearises the circular DNA by cutting the two strands at staggered positions, leaving a short single-stranded overhang at each end (a 5ʹ or 3ʹ overhang, depending on the enzyme). The PCR product is also designed with flanking restriction endonuclease sites identical to that of the plasmid, and these are digested by the same enzyme. The resulting sticky ends are complementary to those of the plasmid, so the PCR product and plasmid can anneal and the nicks are sealed by a ligase or other phosphodiester bond-forming enzyme (Figure 6).

Figure 6 | Cohesive-ended ligation. Both the plasmid and PCR product are cut at a recognition sequence by a restriction endonuclease (in this case EcoRI, recognition sequence 5ʹ-GAATTC-3ʹ). The PCR product must be engineered with these restriction sites built in. The PCR product and plasmid overlap and anneal and the nicks are closed by a ligase.
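As a rough illustration of the digestion step, the sketch below cuts the top strand of a linear sequence at every EcoRI site. It relies on the known behaviour of EcoRI, which recognises GAATTC (written 5ʹ to 3ʹ) and cuts between G and A on each strand, leaving a four-base 5ʹ-AATT overhang. The function `digest` and the example sequence are made up for the illustration, and only the top strand is modelled.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the first base of its site: G^AATTC


def digest(seq, site=ECORI_SITE, cut_offset=ECORI_CUT_OFFSET):
    """Cut the top strand of a linear sequence at every occurrence of `site`.

    Returns the top-strand fragments in 5'->3' order. The bottom strand is
    cut at the mirror position, so internal fragments carry a 5' overhang
    at each end.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# Made-up PCR product with an EcoRI site engineered near each end.
pcr_product = "CCGAATTCATGAAAGAATTCGG"
print(digest(pcr_product))  # ['CCG', 'AATTCATGAAAG', 'AATTCGG']
```

The middle fragment is the insert: its top strand starts with the AATT overhang, which can pair with the matching overhang on the digested plasmid.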

Whichever way round the PCR product is inserted, the plasmid is usually engineered to have a promoter for transcription upstream (5ʹ) to the insertion site on both strands. So, if the gene is intended to produce a functional nucleic acid or protein product, this will be expressed by the bacteria once transformed. As a result, directionality of insertion into the plasmid is largely unimportant; however unidirectional forms of ligation do exist if required.

Blunt-ended ligation

Blunt-ended ligation involves the digestion of the plasmid with a restriction endonuclease that linearises the circular DNA by cutting both strands evenly, leaving no overhangs at either end. The PCR product is produced by a polymerase that leaves the ends similarly blunt, and both the plasmid and the gene are combined and ligated (Figure 7).

Figure 7 | Blunt-ended ligation. The plasmid is cut at a recognition sequence by a blunt-end forming restriction endonuclease. The PCR product needs no further digestion since it is already blunt-ended. The PCR product and plasmid associate and the nicks are closed by a ligase.

Blunt-ended ligation is less efficient than cohesive-ended ligation due to the lack of sticky ends that serve as templates for ligation; however, it eliminates the post-PCR digestion step required for cohesive-ended ligation. It is possible to favour the forward blunt-ended ligation reaction by varying the reaction conditions, especially the concentration of the reactants.


Once the plasmid with the insert has been prepared, it is combined with a suspension of E. coli and the cells are heat-shocked to promote the transformation of the plasmids into the cells:

  • The suspension of E. coli and plasmid is cooled on ice for 30 minutes;
  • It is then rapidly heated to 42 °C for up to 45 seconds;
  • The suspension is returned to ice for 2 minutes.

While transformation is a natural process for many bacteria, heat-shock treatment artificially enhances its efficiency. Abrupt temperature changes (in the presence of calcium ions) promote the formation of pores on the cell surface through which supercoiled plasmid DNA can pass. This process also depolarises the inner bacterial membrane, allowing the negatively charged plasmid to traverse the normally negatively charged membrane and enter the cell.

One problem is the potential for a bacterium to take up more than one plasmid: the resulting colony would not be a clone of one single gene strand. Cells that have been modified for efficient transformation (competent cells) still maintain a low enough efficiency such that most competent cells will only take up a single plasmid. However, a small fraction of these competent cells will readily take up two or more plasmids (hypercompetent cells).

Each plasmid contains an origin of replication (ORI) which triggers the replication of the plasmid by cell machinery and regulates how much of the plasmid should be present per cell. If two plasmids with the same ORI enter the cell, the copy-number control cannot distinguish between them, so they compete for replication and one is eventually lost (plasmid incompatibility). Therefore, since each of the gene strands was inserted into a plasmid with an identical ORI, if two or more plasmids are taken up by the cell, all but one are likely to be lost at random. Rarely, two plasmids with different gene strands and the same ORI can persist in the same cell line. It is therefore crucial to keep the plasmid concentration as low as possible, as this promotes single-plasmid transformation.
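The benefit of a low plasmid concentration can be illustrated with a simple probability model. The sketch below assumes that the number of plasmids taken up by a cell follows a Poisson distribution whose mean rises with plasmid concentration. That assumption is introduced here for illustration only; real uptake is more complicated (hypercompetent cells, for example, do not fit this model).

```python
import math


def fraction_multiply_transformed(mean_plasmids_per_cell):
    """Among cells that took up at least one plasmid, the fraction with two or more.

    Assumes Poisson-distributed uptake with the given mean (an illustrative
    assumption, not a measured property of competent cells).
    """
    lam = mean_plasmids_per_cell
    if lam <= 0:
        raise ValueError("mean must be positive")
    p0 = math.exp(-lam)  # probability of taking up no plasmid
    p1 = lam * p0        # probability of taking up exactly one
    return (1 - p0 - p1) / (1 - p0)


for lam in (1.0, 0.1, 0.01):
    print(lam, round(fraction_multiply_transformed(lam), 4))
```

Under this model the fraction of transformants carrying more than one plasmid falls roughly in proportion to the mean uptake, from about 42% at a mean of one plasmid per cell to about 0.5% at a mean of 0.01.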

Clone library production

The suspension of transformed E. coli is cultured on agar overnight. Since only cells containing plasmids with a gene-insert are of interest, the plasmid is engineered to contain sequences that positively select for this scenario. An antibiotic resistance gene is incorporated into the plasmid sequence so only cells containing this resistance gene will survive on agar enriched with the relevant antibiotic. Therefore untransformed E. coli and airborne bacterial contaminants should not grow on the plate.

Furthermore, it is possible for cut plasmids to join back up without a gene insert (non-recombinant plasmids). To ensure these non-recombinant transformants do not grow, the plasmid will have a negative selection gene in place so that if the cut plasmid ends join together without an insert, they will form a gene which is fatal to the bacteria. For example, many plasmids contain a ccdB gene: once transcribed and translated, the cytotoxic polypeptide CcdB inactivates bacterial DNA gyrase and kills the cell. Only when the ccdB gene is interrupted by the insert (forming a recombinant plasmid) will the bacteria grow and form a colony.
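The combined effect of the two selections can be written as a small truth table. The sketch assumes a host strain that is sensitive to CcdB and a plasmid carrying both the resistance gene and a ccdB gene that the insert interrupts; `colony_forms` is a hypothetical name.

```python
def colony_forms(has_plasmid: bool, has_insert: bool) -> bool:
    """Predict growth on antibiotic agar for a ccdB-based selection plasmid."""
    survives_antibiotic = has_plasmid             # resistance gene is on the plasmid
    ccdb_intact = has_plasmid and not has_insert  # the insert interrupts ccdB
    return survives_antibiotic and not ccdb_intact


for plasmid, insert in [(False, False), (True, False), (True, True)]:
    print(plasmid, insert, colony_forms(plasmid, insert))
```

Only the last case, a cell carrying a recombinant plasmid, is predicted to form a colony.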

Once grown, if the phenotype that the gene insert codes for is quantifiable at the colony level (e.g. production of a fluorescent protein, further antibiotic resistance) it may be possible to confirm whether the gene is error-free without the need for sequencing. Usually, however, this is not the case, and colonies must be lysed, the plasmid extracted and the gene amplified from the plasmid for Sanger sequencing. Once confirmed, the correct genes can then be amplified by PCR to generate the pure gene product.
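Once sequencing reads are available, picking the correct colonies is a straightforward comparison against the designed sequence. The sketch below assumes each colony has already been reduced to a single consensus read covering the whole insert; the colony names and sequences are invented for the example.

```python
def screen_colonies(reference, reads):
    """Sort colonies into those matching the designed sequence and those that do not.

    `reads` maps a colony name to its sequenced insert (5'->3').
    """
    ref = reference.upper()
    correct, incorrect = [], []
    for colony, seq in sorted(reads.items()):
        (correct if seq.upper() == ref else incorrect).append(colony)
    return correct, incorrect


designed = "ATGGCTAGCAAAGGAGAAGAA"
reads = {
    "colony_1": "ATGGCTAGCAAAGGAGAAGAA",  # matches the design
    "colony_2": "ATGGCTAGCAAAGGAGAAGA",   # single-base deletion
    "colony_3": "ATGGCAAGCAAAGGAGAAGAA",  # single-base substitution
}
print(screen_colonies(designed, reads))
```

An exact match is required here because, as noted above, a single nucleotide can be the only difference between a correct and an incorrect clone.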


Bernard, P.; Positive Selection of Recombinant DNA by CcdB. BioTechniques 21, 320-323 (1996).

Blackburn, G.M.; Nucleic Acids in Chemistry and Biology; ISBN: 9780854046546; 2006, RSC Pub.

Goldsmith, M.; Kiss, C.; Bradbury, A. R. M.; Tawfik, D. S.; Avoiding and controlling double transformation artifacts. Protein Eng. Des. Sel. Methods 20, 315–318 (2007).

Kosuri, S.; Church, G. M.; Large-scale de novo DNA synthesis: technologies and applications. Nat. Methods 11, 499–507 (2014).

Ma, S.; Saaem, I.; Tian, J.; Error Correction in Gene Synthesis Technology. Trends Biotechnol. 30, 147-154 (2012).

Panja, S.; Aich, P.; Jana, B.; Basu, T.; How does plasmid DNA penetrate cell membranes in artificial transformation process of Escherichia coli? Mol. Membr. Biol. 25, 411-422 (2008).

Voigt, C.; Synthetic Biology, Part B: Computer Aided Design and DNA Assembly; Volume 498 of Methods in Enzymology; ISBN: 9780123851215; 2011, Academic Press.

Weston, A.; Humphreys, G. O.; Brown, M. G. M.; Saunders, J. R.; Simultaneous transformation of Escherichia coli by pairs of compatible and incompatible plasmid DNA molecules. MGG 172, 113-118 (1979).

Young, L.; Dong, Q.; Two‐step total gene synthesis method. Nucl. Acids Res. 32, e59 (2004).
