Amazing stuff!
"... The latest [human genome] datasets, published in 2 back-to-back studies in the journal Nature, present what may be the most complete overview of the human genome to date. ...
“For too long, our genetic references have excluded much of the world’s population. This work captures essential variation that helps explain why disease risk isn’t the same for everyone.”
The first paper analysed the genomes of 1,019 people from 5 continents and 28 population groups.
It focused on genomic structural variants – large sections of DNA that have been deleted, duplicated, inserted, inverted or shuffled and which introduce changes to thousands of DNA bases at a time. They contribute to genetic diversity but are also increasingly associated with diseases and cancers. ...
The team found and categorised more than 167,000 structural variants, doubling the known amount in the human pangenome. ...
“For example, 50.9% of insertions and 14.5% of deletions we found have not been reported in previous variation catalogues. It’s an important step to map blind spots in the human genome and reduce the bias that has long favoured genomes of European descent.” ...
The second study took a slightly different approach by sequencing fewer genomes at much greater detail. The researchers used several sequencing technologies to combine highly accurate medium-length DNA reads with longer, lower-accuracy ones.
This strategy allowed them to piece together the near-complete genomes of 65 individuals. They also decoded some of the most difficult to read stretches, including the highly repetitive centromeres, Y chromosomes and an intricate region associated with the immune system’s Major Histocompatibility Complex. ..."
"... This milestone builds on two foundational studies that reshaped the field of genomics.
In 2022, researchers achieved the first-ever complete sequence of a single human genome, filling in major gaps left by the original Human Genome Project.
In 2023, scientists released a draft pangenome constructed from 47 individuals—a critical step toward representing global genetic diversity.
The new study significantly expands on both efforts, closing 92% of the remaining data gaps and mapping genomic variation across ancestries with a breadth and resolution never achieved. ..."
From the abstract (1):
"Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation.
Here we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (median continuity of 130 Mb), closing 92% of all previous assembly gaps and reaching telomere-to-telomere status for 39% of the chromosomes.
We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8 and AMY1/AMY2, and fully resolve 1,852 complex structural variants.
In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite higher-order repeat array length and characterize the pattern of mobile element insertions into α-satellite higher-order repeat arrays. Although most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres.
Combining our data with the draft pangenome reference significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference to a median quality value of 45. Using this approach, 26,115 structural variants per individual are detected, substantially increasing the number of structural variants now amenable to downstream disease association studies."
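A side note on the "median quality value of 45" in the abstract above: assuming this is the conventional Phred-scaled quality value (the abstract does not spell that out), it converts to an expected error rate via QV = -10 · log10(error rate). The sketch below just does that arithmetic; the function names are my own and are not taken from the paper.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an expected per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error rate back to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# Assumption: the QV of 45 quoted in the abstract is Phred-scaled.
rate = qv_to_error_rate(45)
print(f"error rate: {rate:.2e}")                      # error rate: 3.16e-05
print(f"about one error per {round(1 / rate)} bases")  # about one error per 31623 bases
```

Under that assumption, QV 45 corresponds to roughly one expected error in every 31,600 genotyped bases.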
From the abstract (2):
"Genomic structural variants (SVs) contribute substantially to genetic diversity and human diseases, yet remain under-characterized in population-scale cohorts. Here we conducted long-read sequencing in 1,019 humans to construct an intermediate-coverage resource covering 26 populations from the 1000 Genomes Project.
Integrating linear and graph genome-based analyses, we uncover over 100,000 sequence-resolved biallelic SVs and we genotype 300,000 multiallelic variable number of tandem repeats, advancing SV characterization over short-read-based population-scale surveys.
We characterize deletions, duplications, insertions and inversions in distinct populations. Long interspersed nuclear element-1 (L1) and SINE-VNTR-Alu (SVA) retrotransposition activities mediate the transduction of unique sequence stretches in 5′ or 3′, depending on source mobile element class and locus. SV breakpoint analyses point to a spectrum of homology-mediated processes contributing to SV formation and recurrent deletion events.
Our open-access resource underscores the value of long-read sequencing in advancing SV characterization and enables guiding variant prioritization in patient genomes."
The most complete view of the human genome yet sets new standard for use in precision medicine (original news release): "New research decodes the most elusive, difficult-to-sequence regions of the genome from populations around the world, rewriting knowledge of human biology and setting a new benchmark for precision medicine."
Complex genetic variation in nearly complete human genomes (1, open access)
Fig. 3: Prevalence of distinct SV [structural variation] classes in our SAGA-based data resource.