An international team of researchers has achieved a scientific milestone by unraveling for the first time the genetic code of an entire human chromosome.

As reported in this week's issue of Nature (Dec. 2), researchers at the Sanger Centre near Cambridge, England; the University of Oklahoma, Norman, OK; Washington University, St. Louis, MO; and Keio University in Japan have succeeded in deciphering the sequence of the 33.5 million "letters," or chemical components, that make up the DNA of chromosome 22.

This sequence includes the longest continuous stretch of DNA ever deciphered and assembled, more than 23 million letters in length.

Each human gene is made up of a series of chemical building blocks represented by letters, A (adenine), T (thymine), G (guanine) and C (cytosine). The number and order of these letters, also called bases, determine what we are, how we look, and the diseases to which we may be predisposed. The chromosome 22 team has deduced the text of one chapter of the human genetic instruction book.

The next mammoth task is to determine what it all means. Sequencing and mapping efforts have already revealed that chromosome 22 is implicated in the workings of the immune system, congenital heart disease, schizophrenia, mental retardation, birth defects, and several cancers including leukemia. But the scientific team agrees that many more secrets remain to be discovered in this decoded text.

The sequencing of chromosome 22 permits scientists for the first time to view the entire DNA of a chromosome.

"This is the first time that we have been able to see the organization of a chromosome at the base pair level," said Dr. Ian Dunham, senior research fellow at the Sanger Centre and leader of the research team that deciphered chromosome 22. "This immediately suggests new experiments and avenues of research which can be pursued."

"To see the entire sequence of a human chromosome for the first time is like seeing an ocean liner emerge out of the fog, when all you've ever seen before were rowboats," said Dr. Francis Collins, director of the National Human Genome Research Institute of the National Institutes of Health which supported the U.S. contribution to the sequencing of chromosome 22.

University of Oklahoma scientist Dr. Bruce Roe, one of the researchers who deciphered the sequence of chromosome 22, added, "It's incredible. For the first time we can stand back and view a picture of all the structures and other features of a human chromosome, to see how a chromosome is organized. Now we can begin to understand where genes are located on chromosomes, how they express themselves, how deletions that give rise to disease-causing mutations occur, and how chromosomes are duplicated and inherited."

Chromosome 22 is the first of the 23 human chromosome pairs to be deciphered because of its relatively small size, its association with several diseases, and the groundwork laid by several scientists beginning in the early 1990s.

Because protein-coding genes do not seem to occur on the short arm of chromosome 22, the scientists focused on the chromosome's long arm, which is rich in genes relative to other human chromosomes. Ninety-seven percent of this arm was sequenced.

The sequence contains 11 gaps, areas that could not be deciphered with current technology. The location and size of each gap were determined. The 33.5 million bases of sequenced DNA are of extremely high quality, with an error rate of less than one in 50,000 bases.
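To put the quoted error rate in perspective, the short Python sketch below multiplies the two figures given above. The function and constant names are illustrative only and do not come from the research team.

```python
# Figures quoted in the article.
SEQUENCED_BASES = 33_500_000      # bases of chromosome 22 that were sequenced
ERROR_RATE_BOUND = 1 / 50_000     # "less than one in 50,000 bases"

def max_expected_errors(bases: int, rate_bound: float) -> float:
    """Upper bound on the expected number of incorrect bases."""
    return bases * rate_bound

print(round(max_expected_errors(SEQUENCED_BASES, ERROR_RATE_BOUND)))  # 670
```

In other words, the stated bound implies fewer than about 670 incorrect letters across the entire sequenced portion of the chromosome.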

At least 545 genes and 134 pseudogenes (genes that once functioned but no longer do) were detected on the chromosome, with 200 to 300 additional genes likely. If chromosome 22 is representative of other chromosomes, this count suggests that the total number of genes on all human chromosomes will not be substantially more or less than the previously estimated number of 80,000.
The genes range in size from 1,000 to 583,000 bases of DNA, with a mean size of about 19,000 bases. A total of 39 percent of the chromosome is copied into RNA (exons and introns), while only 3 percent of the chromosome encodes protein.
A total of 247 genes were revealed by computer analyses to be identical to previously identified human genes or protein sequences. Computer analysis of the chromosome 22 sequence found 150 additional genes with DNA sequence similarity to known genes. An additional 148 predicted genes containing sequence homologous to known genetic markers (ESTs) were identified.
Several gene families appear to have arisen by tandem duplication. There are families of genes that are interspersed among other genes and distributed over large chromosomal regions.
The chromosome shows unexpected long-range complexity, with an elaborate array of repeat sequences near its centromere. The existence of so much repetitive DNA could help explain how this chromosome rearranges or reshuffles its DNA, leading to human disorders such as DiGeorge syndrome, which includes a form of mental retardation, and how chromosome structure changes over time.
Unexpectedly, recombination is increased in several regions and suppressed in others, and these differences will probably play a role in health and disease.
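The comparison between the chromosome 22 gene count and the whole-genome estimate can be illustrated with a naive calculation that scales by DNA length alone. This is only a sketch and not the team's method. The genome size of about 3 billion bases is an assumption that does not appear in the article, and all names in the code are illustrative.

```python
# Figures quoted in the article.
SEQUENCED_BASES = 33_500_000       # bases of chromosome 22 that were sequenced
GENES_FOUND = 545                  # genes detected so far
EXTRA_GENES_LIKELY = (200, 300)    # additional genes the team considers likely

# ASSUMPTION (not from the article): a whole genome of about 3 billion bases.
ASSUMED_GENOME_BASES = 3_000_000_000

def extrapolate(genes_on_chromosome: int) -> float:
    """Scale a chromosome 22 gene count to the whole genome by length alone."""
    return genes_on_chromosome * ASSUMED_GENOME_BASES / SEQUENCED_BASES

low = extrapolate(GENES_FOUND)
high = extrapolate(GENES_FOUND + EXTRA_GENES_LIKELY[1])
print(f"{low:,.0f} to {high:,.0f} genes")
```

Under these assumptions the result falls between roughly 49,000 and 76,000 genes, the same order of magnitude as the earlier estimate of 80,000. Because the long arm of chromosome 22 is relatively rich in genes, scaling by length alone is more likely to overstate than understate the total.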

Comparing the chromosome 22 sequence to known gene sequences of the mouse, a lab animal frequently used to facilitate understanding of human genetic disorders, the research team found 160 human genes that have comparable sequences in the mouse. The chromosomal locations of these mouse counterparts show that the order of the genes along the chromosome is conserved between the two species, although the mouse homologs of human chromosome 22 genes are dispersed across eight different mouse chromosomal regions.

The sequencing of the DNA of chromosome 22 was conducted as part of the international Human Genome Project, which involves scientists in the U.S., England, Japan, France, Germany and China.

In deciphering chromosome 22, scientists used the approach that has been developed and widely tested by the Human Genome Project. This approach involves sequencing overlapping cloned segments of DNA from known locations on the chromosome.
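The idea of joining overlapping segments whose order along the chromosome is already known can be shown with a toy Python sketch. It is an illustration under simplifying assumptions, not the software used by the project: the segments are short made-up strings, they are assumed to be error-free, and the function names and the `min_overlap` parameter are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def assemble(ordered_clones: list[str], min_overlap: int = 3) -> str:
    """Merge segments, already ordered by map position, into one sequence."""
    if not ordered_clones:
        return ""
    assembled = ordered_clones[0]
    for clone in ordered_clones[1:]:
        size = overlap_length(assembled, clone, min_overlap)
        if size == 0:
            # No shared letters with the previous segment: a gap in coverage.
            raise ValueError("no overlap found between adjacent segments")
        assembled += clone[size:]  # append only the letters not already covered
    return assembled

# Made-up segments; each shares four letters with its neighbor.
clones = ["ATGGCGTAC", "GTACCTTGA", "TTGAGGCAT"]
print(assemble(clones))  # ATGGCGTACCTTGAGGCAT
```

Knowing the order of the segments in advance is what keeps the merge simple: each segment only has to be matched against its neighbor. A pair of adjacent segments with no overlap corresponds to a gap of the kind the team reports.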

Until now, scientists were uncertain whether an entire human chromosome could be sequenced in this manner. For example, they did not know whether insurmountable problems would prevent them from assembling their sequencing data. The presence of a small number of unclonable gaps was not unexpected, but the scientists carrying out this project adhered to the agreed-upon standard that a chromosome should not be considered "essentially complete" until the sequence of the regions that are clonable and sequenceable with current technology has been determined to high accuracy and the sizes of any remaining gaps have been determined.

"That chromosome 22 was essentially sequenced by using overlapping clones increases our confidence that the Human Genome Project will be able to complete a 'working draft' of the DNA sequence of the human genome in Spring 2000 and finish it by 2003," said Dr. Richard Wilson, co-director of the Genome Sequencing Center at Washington University School of Medicine in St. Louis and member of the research team that deciphered chromosome 22.

The results of the Human Genome Project, which are freely accessible through public databases such as GenBank (www.ncbi.nlm.nih.gov/genome/seq), give scientists insight into the way genes are arranged along a strip of DNA and pave the way for major advances in the diagnosis and treatment of disease.

Knowing the identity and order of the chemical components of the DNA of the 23 pairs of chromosomes that are found in almost every human cell provides a tool to determine the basis of health and disease. "The fact that all of this information is now freely available for scientists to use, without the constraints of patents and fees, is of major importance, if the knowledge of our genetic make-up is to be used for the good of mankind," said Dr. Michael Morgan, chief executive of the Wellcome Trust Genome Campus, which is home to the Sanger Centre.