Molecular markers offer numerous advantages over conventional phenotypic alternatives because they: (1) are stable and detectable in all tissues regardless of the growth, differentiation, development, or defense status of the plant cell, (2) are unaffected by the environment, and (3) generally lack pleiotropic and epistatic effects. Genetic variation detected by these markers can be used to manipulate traits, map and isolate genetic loci, and evaluate genetic diversity, for example in germplasm collections and wild populations.
Of molecular markers, those that target nucleic acids are by far the most powerful. Nucleic acid markers are generated by "profiling" or "fingerprinting" techniques capable of sampling the "information-rich" nucleic acid molecules (Weising et al. 1995, Caetano-Anollés and Gresshoff 1997). The sampling strategies are designed to reduce the genetic information contained in the typically 1-10,000 million base pairs (Mb) of a genome by restricting analysis to selected nucleic acid regions. These regions currently represent 1-10,000 bp and act as "depictors" of nucleic acid sequence composition, providing efficient estimators of relatedness, phylogeny and inheritance of genetic material. However, while the varying levels of genetic complexity contained in a nucleic acid profile permit the efficient screening of variant nucleic acid sequences from closely or distantly related organisms, nucleic acid markers should always be regarded as a shortcut to the retrieval of extensive sequence information.
Nucleic acid marker techniques target single or multiple loci, and generally use either nucleic acid hybridization (Southern 1975), a hydrogen-bonding interaction between two nucleic acid strands that often follows the Watson-Crick complementarity rules, or an enzymatic oligonucleotide-driven amplification reaction that is capable of accumulating copies of specific nucleic acid sequences (Landegren 1993). Following a recent classification (Karp and Edwards 1997), representative techniques can be grouped into four categories (see Table). An important distinction needs to be made on nomenclature. Nucleic acid "fingerprinting" usually relates to those applications where there is concurrent detection of multiple loci without assignment of a genotype. Since information on loci and alleles is unavailable, generated fingerprints constitute genetic phenotypes. In contrast, "profiling" often relates to those applications where single loci are studied and a genotype is inferred. The following are the different technique groups proposed:
(1) Hybridization-based techniques: This category includes those techniques that use labeled nucleic acid molecules as hybridization probes. Probe molecules range from synthetic oligonucleotides to cloned DNA. Generally, nucleic acids are digested with restriction endonucleases and the resulting fragments are separated by electrophoresis and sometimes transferred to membranes by Southern blotting. Restriction fragments are then hybridized to probes representing single copy genomic segments (RFLP analysis) or repeated sequences such as minisatellites (VNTR fingerprinting) or microsatellites (oligonucleotide fingerprinting). Resultant profiles target single or multiple loci. Though conventional, RFLP analysis provides very reliable co-dominant markers, especially for linkage analysis and breeding (Tanksley et al. 1989), and has been used extensively in the plant sciences.
(2) Amplification-based nucleic acid scanning techniques: These are fingerprinting techniques that use an in vitro enzymatic reaction to specifically amplify a multiplicity of target sites in one or more nucleic acid molecules (Caetano-Anollés 1996, Micheli and Bova 1996). The amplification reaction is generally driven by short synthetic oligonucleotides of arbitrary or semi-arbitrary sequence that produce a collection of amplified products of largely non-allelic nature. Generated amplification fingerprints can be DNA or RNA-based, depending on the original intended target, but their generation does not require prior knowledge of nucleotide sequence or availability of cloned and characterized hybridization probes. A number of DNA fingerprinting alternatives can amplify anonymous sites in a genome (RAPD, DAF, AP-PCR), cloned DNA (mhpDAF) or nucleic acid amplification profile (ASAP), or abundant motifs either artificially introduced in a template (AFLP) or ubiquitous in a genome (e.g., MP-PCR, rep-PCR, Alu-PCR). Generally, amplification products are separated using gel electrophoresis. However, products can be analyzed by hybridization using arrays of oligonucleotides of arbitrary sequence (scanning-by-hybridization; e.g., NASBH) or probes specific to microsatellites (e.g., RAMPO, RAMS) or other selected sequences (dot-blot hybridization). A similarly prolific number of RNA fingerprinting approaches have been used, mainly in the study of gene expression. RNA is first reverse transcribed and the complementary DNA strands are fingerprinted with arbitrary primers. Here, DD-PCR and RAP-PCR have been very popular in the detection and cloning of differentially expressed transcripts, but the need to overcome the relatively low abundance of some RNA species (Bertioli et al. 1994) has prompted the design of a number of improvements on the strategy [e.g., gene expression fingerprinting (GEF); Ivanova et al. 1995].
In order to avoid competition artifacts (context effect) that can compromise the robustness of the amplification reaction, the current tendency in nucleic acid scanning is to increase fingerprint complexity by targeting individual motifs or by using high primer-to-template ratios (e.g. AFLP, DAF, cDNA-AFLP).
(3) Amplification-based nucleic acid profiling techniques: A number of techniques based on the polymerase chain reaction (PCR) have targeted unique sequences in organelle and nuclear genomes. For example, the analysis of ribosomal RNA genes and mitochondrial and chloroplast DNA by amplification and sequencing has allowed a reemphasis on the phylogenetic perspective in biology (Hillis 1997). While targeting stretches of unique sequence, PCR can also amplify repetitive DNA in the form of tandemly repeated sequences such as minisatellite and microsatellite loci, and transposable elements. For example, primers designed to flank simple sequence repeat (SSR) regions can produce sequence-tagged microsatellite sites (STMS). These powerful SSR markers define single highly variable loci that exhibit high allele numbers and are especially suitable for genetic mapping (Beckmann and Soller 1990) and the study of genetic diversity in applications related to molecular ecology, systematics and population biology (Powell et al. 1996). However, their initial development is quite demanding.
(4) Sequence-targeted techniques: Genetic variation occurs preponderantly at the single nucleotide level. For example, about 80% of human sequence variation is confined to single nucleotide polymorphisms (SNPs). Direct sequencing can identify such differences, which usually depend on how closely related the organisms being compared are and occur at levels ranging 0.54 SNP.kb-1. However, a number of popular techniques can type SNPs very efficiently. The most widely employed are allele-specific oligonucleotide (ASO) (Saiki et al. 1986) and allele-specific reverse dot-blot (Keller et al. 1991) hybridization, techniques that exploit differential thermal stabilities during the hybridization of an oligonucleotide probe to its target. Similarly, the use of fluorogenic reporter probes (Livak et al. 1995) that take advantage of the 5'-to-3' nucleolytic activity of thermostable DNA polymerases (Holland et al. 1991) now promises real-time PCR quantitation and an efficient diagnostic tool (TaqMan ASO). Other techniques include single-strand conformation polymorphism (SSCP) analysis during gel electrophoresis and those that use primer mismatching for allele discrimination (e.g., ARMS). To address some limitations imposed by gel-based methodologies, a number of solid-phase genotyping approaches have been introduced, including OLA, CAL and GBA. Finally, within the solid-phase sequence analysis techniques are those that use nucleic acid arrays (Southern 1996). Oligonucleotide arrays ("chips") confine individual oligonucleotides to defined physical addresses in a solid support (nylon, glass, silicon, polypropylene, etc.) by using solid-phase oligonucleotide synthesis, light-directed chemical synthesis using photolithographic masks, or accurate fluid microdispensing.
In sequence-by-hybridization (SBH), arrays of short oligonucleotides, usually octamers, hybridize to overlapping complementary sequences present in the target nucleic acid molecule and can sequence short stretches of DNA of about 100 nucleotides in length (Strezoska et al. 1991, Drmanac et al. 1993). The method can also be used in genotyping applications. Gridded oligonucleotide arrays probe SNPs through multiple pairwise comparisons (Maskos and Southern 1993; Yershov et al. 1996), and query extended nucleic acid sequences (e.g., 16.6 kb human mitochondrial genome; Chee et al. 1996) and thousands of mRNA species in parallel (e.g., Lockhart et al. 1996).
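The overlap principle behind sequence-by-hybridization can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies: the function names `spectrum` and `reconstruct`, the toy target sequence, and the probe length of 4 (chosen for brevity instead of octamers) are all illustrative assumptions. The sketch also assumes an error-free hybridization readout and a target in which no (k-1)-base overlap occurs twice, conditions that real experiments do not guarantee.

```python
def spectrum(target, k):
    """Return the set of all overlapping k-mers present in the target.

    This stands in for the set of array probes that give a positive
    hybridization signal (an idealized, error-free readout).
    """
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(kmers):
    """Rebuild a sequence by chaining k-mers that overlap by k-1 bases.

    Assumption: no (k-1)-mer occurs twice in the target, so every
    extension step has at most one candidate.
    """
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    if k < 2:
        raise ValueError("probes must be at least 2 bases long")

    # The first k-mer is the only one whose prefix is not the suffix
    # of another k-mer in the spectrum.
    suffixes = {kmer[1:] for kmer in kmers}
    starts = [kmer for kmer in kmers if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("spectrum has no unique starting k-mer")

    # Index k-mers by their (k-1)-base prefix for fast extension.
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)

    sequence = starts[0]
    used = {starts[0]}
    while True:
        candidates = by_prefix.get(sequence[-(k - 1):], [])
        if not candidates:
            break  # nothing overlaps the current end: reconstruction done
        if len(candidates) > 1 or candidates[0] in used:
            raise ValueError("ambiguous overlap; target contains repeats")
        used.add(candidates[0])
        sequence += candidates[0][-1]

    if len(used) != len(kmers):
        raise ValueError("some k-mers could not be placed")
    return sequence


# Hypothetical 13-base target, probed with all 4-mers it contains.
target = "ATGGCGTGCAATC"
positive_probes = spectrum(target, 4)
print(reconstruct(positive_probes) == target)
```

The uniqueness assumption is the weak point of the approach: once a short overlap repeats within the target, more than one chaining order fits the same set of positive probes, which is consistent with the method being limited to short stretches of DNA.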
Selecting a suitable marker system depends on a number of factors (Karp and Edwards 1997). Technical considerations include throughput and speed, equipment and skills required, the need for automation, and cost effectiveness. Other considerations relate to the technique itself, the informativeness and sensitivity of the marker system, and its overall reliability. Finally, the individual application and its demands on accuracy and data analysis must also be considered.
Generally, bands or hybridization signals can be converted into measurements of similarity or dissimilarity which in turn depict "genetic distances" that portray sequence divergence between organisms in phenetic and cladistic analysis. Alternatively, allele frequencies can measure diversity in natural populations and can trace characters in inheritance studies. When markers are used in genetic mapping or trait-tagging, polymorphism and co-dominance are important issues. Here, desirable markers are highly polymorphic and exhibit multiple co-dominant alleles (e.g., SSR markers). However, dominant markers generated by robust multiple-locus fingerprinting techniques (e.g., AFLP, DAF, oligonucleotide fingerprinting) can be used efficiently despite their low information content per locus. When markers are used to estimate genetic diversity and build molecular phylogenies, the taxonomic level of analysis becomes of great importance. Amplification-based scanning techniques are useful here for distinguishing individuals and closely related organisms, usually below the species level (cultivars, accessions, lines, clones, etc.), whereas the presumed neutrality and high allele number of SSR markers make them superior to isoenzymes and ideally suited for the study of populations. In contrast, RFLP, direct sequencing and amplification-based profiling techniques are useful for cross-species analysis and phylogenetic reconstruction.
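As an illustration of how band patterns and allele frequencies become numbers, the following Python sketch scores two fingerprints as presence/absence vectors and computes two commonly used similarity coefficients (Jaccard and Dice), plus the expected heterozygosity of a single locus. The text above does not prescribe particular coefficients, so these choices, the function names, and the toy data are assumptions made for the example only.

```python
def jaccard(a, b):
    """Shared bands divided by bands present in at least one profile."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0


def dice(a, b):
    """Twice the shared bands divided by the total bands in both profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(a) + sum(b)
    return 2 * shared / total if total else 0.0


def expected_heterozygosity(allele_counts):
    """Return 1 minus the sum of squared allele frequencies at one locus."""
    n = sum(allele_counts)
    return 1.0 - sum((count / n) ** 2 for count in allele_counts)


# Hypothetical fingerprints: 1 = band present, 0 = band absent,
# scored at the same six band positions in two accessions.
accession_a = [1, 1, 0, 1, 0, 1]
accession_b = [1, 0, 0, 1, 1, 1]

similarity = jaccard(accession_a, accession_b)
print("Jaccard similarity:", similarity)
print("Jaccard distance:", 1 - similarity)
print("Dice similarity:", dice(accession_a, accession_b))

# Hypothetical counts of three alleles observed at one co-dominant locus.
print("Expected heterozygosity:", expected_heterozygosity([5, 3, 2]))
```

Both coefficients ignore shared absences, which suits dominant multi-locus fingerprints where the absence of a band carries little information. The heterozygosity calculation needs allele counts and therefore applies to co-dominant markers where alleles can be assigned.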