Non-Coding DNA and Nature’s Preoccupation With Complementarity and Contrariety
Majid Ali, M.D.
Nature’s Preoccupation With Complementarity and Contrariety is the title of the first volume of my 14-volume textbook, The Principles and Practice of Integrative Medicine.
Below is text from an article published in the journal Nature (2009;461:199) which sheds light on the complementarity and contrariety in human cell systems and its impact on health and disease:
“In contrast to protein-coding sequences, the significance of variation in non-coding DNA in human disease has been minimally explored. A great number of recent genome-wide association studies suggest that non-coding variation is a significant risk factor for common disorders, but the mechanisms by which this variation contributes to disease remain largely obscure. Distant-acting transcriptional enhancers — a major category of functional non-coding DNA — are involved in many developmental and disease-relevant processes. Genome-wide approaches to their discovery and functional characterization are now available and provide a growing knowledge base for the systematic exploration of their role in human biology and disease susceptibility.”
Multiple lines of evidence indicate that important functional properties are embedded in the non-coding portion of the human genome, but identifying and defining these features remains a major challenge. An initial estimate of the magnitude of functional non-coding DNA was derived from comparative analysis of the first available mammalian genomes (human and mouse), which indicated that fewer than half of the evolutionary constrained sequences in the human genome encode proteins1, a prospect that gained further support when additional vertebrate genomes became available for comparative genomic analyses2.
The overall impact of these presumably functional non-coding sequences on human biology was initially unclear. A considerable urgency to define their locations and functions came from a growing number of known associations of non-coding sequence variants with common human diseases. Specifically, genome-wide association studies (GWAS) have revealed a large number of disease susceptibility regions that do not overlap protein-coding genes but rather map to non-coding intervals. For example, a 58-kilobase linkage disequilibrium block located at human chromosome 9p21 was shown to be reproducibly associated with an increased risk for coronary artery disease, yet the risk interval lies more than 60 kilobases away from the nearest known protein-coding gene3, 4. To estimate the global contribution of variation in non-coding sequences to phenotypic and disease traits, we performed a meta-analysis of 1,200 single-nucleotide polymorphisms (SNPs) identified as the most significantly associated variants in GWAS published so far (ref. 5, accessed 2 March 2009). Using conservative parameters that tend to overestimate the size of linkage disequilibrium blocks, we found that in 40% of cases (472 of 1,170) no known exons overlap either the linked SNP or its associated haplotype block, suggesting that in more than one-third of cases non-coding sequence variation causally contributes to the traits under investigation.
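The overlap test described above can be illustrated with a minimal Python sketch. It assumes each linkage disequilibrium block and each exon is reduced to a (start, end) interval on the same chromosome. The coordinates below are invented for illustration only, and the function names are hypothetical rather than taken from the article; only the final 472-of-1,170 figure comes from the text.

```python
from typing import List, Tuple

# (start, end) in base pairs, inclusive, all on one chromosome (an assumption
# made to keep the sketch short; real data would also carry a chromosome name).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def noncoding_fraction(blocks: List[Interval], exons: List[Interval]) -> float:
    """Fraction of haplotype blocks that overlap no known exon."""
    if not blocks:
        raise ValueError("at least one block is required")
    noncoding = sum(
        1 for block in blocks if not any(overlaps(block, exon) for exon in exons)
    )
    return noncoding / len(blocks)


# Hypothetical coordinates, not real genomic data.
exons = [(100, 200), (500, 650)]
blocks = [(50, 120), (210, 480), (700, 900), (640, 660), (300, 310)]

toy_fraction = noncoding_fraction(blocks, exons)

# The proportion reported in the text: 472 of 1,170 associated regions.
reported_fraction = 472 / 1170

print(f"toy example: {toy_fraction:.0%} of blocks overlap no exon")
print(f"reported in the article: {reported_fraction:.1%}")
```

Deliberately oversized blocks, as the authors describe, make an exon overlap more likely, so a count obtained this way would tend to understate the non-coding share rather than inflate it.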
One possibility that could explain these GWAS hits is that the non-coding intervals contain enhancers, a category of gene regulatory sequence that can act over long distances. A simplified view of the current understanding of the role of enhancers in regulating genes is summarized in Fig. 1. The docking of RNA polymerase II to proximal promoter sequences and transcription initiation are fairly well characterized; by contrast, the mechanisms by which insulator and silencer elements buffer or repress gene regulation, respectively, are less well understood6. Transcriptional enhancers are regulatory sequences that can be located upstream of, downstream of or within their target gene and can modulate expression independently of their orientation7. In vertebrates, enhancer sequences are thought to comprise densely clustered aggregations of transcription-factor-binding sites8. When appropriate occupancy of transcription-factor-binding sites is achieved, recruitment of transcriptional coactivators and chromatin-remodelling proteins occurs. The resultant protein aggregates are thought to facilitate DNA looping and ultimately promoter-mediated gene activation (see page 212). In-depth studies of individual genes such as APOE or NKX2-5 (reviewed in ref. 9) have shown that many genes are regulated by complex arrays of enhancers, each driving distinct aspects of the messenger RNA expression pattern. These modular properties of mammalian enhancers are also supported by their additive regulatory activities in heterologous recombination experiments10.
Figure 1.
a, For many genes, the regulatory information embedded in the promoter is insufficient to drive the complex expression pattern observed at the messenger RNA level. For example, a gene could be expressed both in the brain and in the limbs during embryonic development (red), even if the promoter by itself is not active in either of these structures, suggesting that appropriate expression depends on additional sequences that are distant-acting and cis-regulatory. However, defining the genomic locations of such regulatory elements (question marks) and their activities in time and space (arrows) is a major challenge. b, c, Tissue-specific enhancers are thought to contain combinations of binding sites for different transcription factors. Only when all required transcription factors are present in a tissue does the enhancer become active: it binds to transcriptional coactivators, relocates into physical proximity with the gene promoter (through a looping mechanism) and activates transcription by RNA polymerase II. In any given tissue, only a subset of enhancers is active, as schematically shown in b and c for the example gene pictured in a, whose expression is controlled by two separate enhancers with brain-specific and limb-specific activities. Insulator elements prevent enhancer–promoter interactions and can thus restrict the activity of enhancers to defined chromatin domains. In addition to activation by enhancers, negative regulatory elements (including repressors and silencers) can contribute to transcriptional regulation (not shown).
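The combinatorial logic in this legend can be sketched as a toy model in Python. This is a deliberate simplification, not a biological simulation: the transcription factor names (TF_A to TF_D) and the function names are hypothetical, and insulation is modelled simply as removing an enhancer from the set that can reach the promoter.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical enhancers for the example gene, each listing the transcription
# factors that must all be present for it to become active.
ENHANCERS: Dict[str, FrozenSet[str]] = {
    "brain_enhancer": frozenset({"TF_A", "TF_B"}),
    "limb_enhancer": frozenset({"TF_C", "TF_D"}),
}


def active_enhancers(tfs_present: Set[str]) -> Set[str]:
    """Enhancers whose required transcription factors are all present."""
    return {
        name
        for name, required in ENHANCERS.items()
        if required <= tfs_present  # subset test: every required factor is there
    }


def gene_expressed(tfs_present: Set[str], insulated: Set[str] = frozenset()) -> bool:
    """The gene is on if any active enhancer can still reach the promoter."""
    return bool(active_enhancers(tfs_present) - set(insulated))


brain_tissue = {"TF_A", "TF_B", "TF_C"}  # limb factors only partly present
limb_tissue = {"TF_C", "TF_D"}

print(active_enhancers(brain_tissue))
print(gene_expressed(limb_tissue))
print(gene_expressed(limb_tissue, insulated={"limb_enhancer"}))
```

In this sketch a tissue that has only some of an enhancer's required factors leaves that enhancer inactive, which mirrors the statement that only a subset of enhancers is active in any given tissue.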
The purely genetic evidence from GWAS does not allow any direct inferences regarding the underlying molecular mechanisms, but a number of in-depth studies of individual loci (see below) suggest that variation in distant-acting enhancer sequences and the resultant changes in their activities can contribute to human disorders. Although we anticipate a variety of other non-coding functional categories such as negative gene regulators or non-coding RNAs to have a role in human disease, in this Review we focus on the role of enhancers and on strategies to define their location and function throughout the genome.
Enhancers in human disease
Beginning with the discovery that an inherited change in the β-globin gene alters one of the coded amino acids and thereby causes sickle-cell anaemia11,12, thousands of mutations in the coding regions of genes have been identified to be responsible for monogenic disorders over the past half century. By contrast, the role of mutations not involving primary gene structural sequences has been minimally explored, largely owing to our inability to recognize relevant non-coding sequences, much less predict their function. The molecular genetic identification of individual enhancers involved in disease has been, in most cases, a painstaking and inefficient endeavour. Nevertheless, a number of successful studies have shown that distant-acting gene enhancers exist in the human genome and that variation in their sequences can contribute to disease. In this section, we discuss three examples in which enhancers were directly shown to play a role in human disease: thalassaemias resulting from deletions or rearrangements of β-globin gene (HBB) enhancers, preaxial polydactyly resulting from sonic hedgehog (SHH) limb-enhancer point mutations, and susceptibility to Hirschsprung’s disease associated with a RET proto-oncogene enhancer variant.
The extensive studies of the human globin system and its role in haemoglobinopathies have historically served as a test bed for defining not only the role of coding sequences in disease11, 12 but also that of non-coding sequences. The α-thalassaemias and β-thalassaemias are haemoglobinopathies resulting from imbalances in the ratio of α-globin to β-globin chains in red blood cells. The molecular basis of these conditions was initially elucidated in cases in which inactivation or deletion of globin structural genes could be readily identified13. However, although gene deletion or sequence changes resulting in a truncated or non-functional gene product explained some thalassaemia cases, for a subset of patients intensive sequencing efforts failed to reveal abnormalities in globin protein-coding sequences. Through extensive long-range mapping and sequencing of DNA from individuals diagnosed with thalassaemia but lacking globin coding mutations, it was eventually discovered that many of these globin chain imbalances were due to deletion or chromosome rearrangements that resulted in the repositioning of distant-acting enhancers required for normal globin gene expression14, 15. These early molecular genetic studies revealed a clear role for non-coding regulatory elements as a cause of human disorders through their impact on gene expression. Since then, many such examples of ‘position effects’, defined as changes in the expression of a gene when its location in a chromosome is changed, often by translocation, have been found16.
In addition to the pathological consequences of the removal or the repositioning of distant-acting enhancers, there are also examples of single-nucleotide changes within enhancer elements as a cause of human disorders. One example of this category of disease-causing non-coding mutation involves the limb-specific long-distance enhancer ZRS (also known as MFCS1) of SHH (Fig. 2). This enhancer is located at the extreme distance of approximately 1 megabase from SHH, within the intron of a neighbouring gene17, 18. Of interest is that, initially, the gene in which the enhancer resides was thought to be relevant for limb development and was therefore named limb region 1 (LMBR1)19. Facilitated by the functional knowledge of the ZRS enhancer from mouse studies, targeted resequencing screens of this enhancer in humans revealed that it is associated with preaxial polydactyly. Approximately a dozen different single-nucleotide variations in this regulatory element have been identified in humans with preaxial polydactyly and segregate with the limb abnormality in families18, 20. Studies of the impact of the human ZRS sequence changes have been carried out in transgenic mice, in which the single-nucleotide changes result in ectopic anterior-limb expression during development, consistent with preaxial digit outgrowth21. Furthermore, sequence changes in the orthologous enhancers were found in mice, as well as in cats, with preaxial polydactyly22, 23, and targeted deletion of the enhancer in mice caused truncation of limbs17. These studies illustrate the importance of first experimentally identifying distant-acting enhancers in allowing subsequent human genetic studies to explore the potential role of disease-causing mutation in functional non-coding sequences.