Amazing stuff! CRISPR keeps on giving!
"... The algorithm ... uses big-data clustering approaches to rapidly search massive amounts of genomic data. The team used their algorithm, called Fast Locality-Sensitive Hashing-based clustering (FLSHclust) to mine three major public databases that contain data from a wide range of unusual bacteria, including ones found in coal mines, breweries, Antarctic lakes, and dog saliva. The scientists found a surprising number and diversity of CRISPR systems, including ones that could make edits to DNA in human cells, others that can target RNA, and many with a variety of other functions.
The new systems could potentially be harnessed to edit mammalian cells with fewer off-target effects than current Cas9 systems. They could also one day be used as diagnostics or serve as molecular records of activity inside cells. ..."
From the editor's summary and abstract:
"Editor’s summary
Microbial biochemical systems are incredibly diverse, and computational tools to analyze sequence data are essential in identifying new and valuable components for biotechnology development. Using an approach called deep terascale clustering, ... found more than 200 new functional systems linked to CRISPR, a technology editing DNA. Some of the discovered genes are linked to precise DNA-editing systems that may enable safer therapeutic genome editing. The authors also identified a CRISPR-Cas enzyme, Cas14, which cuts RNA precisely. These discoveries may help to further improve DNA- and RNA-editing technologies, with wide-ranging applications in medicine and biotechnology. ...
Structured Abstract
... We sought to comprehensively enumerate CRISPR-linked gene modules in all existing publicly available sequencing data. Recently, several previously unknown biochemical activities have been linked to programmable nucleic acid recognition by CRISPR systems, including transposition and protease activity. We reasoned that many more diverse enzymatic activities may be associated with CRISPR systems, many of which could be of low abundance in existing sequence databases.
RESULTS
We developed fast locality-sensitive hashing–based clustering (FLSHclust), a parallelized, deep clustering algorithm with linearithmic scaling based on locality-sensitive hashing. FLSHclust approaches MMseqs2, a gold-standard quadratic-scaling algorithm, in clustering performance. We applied FLSHclust in a sensitive CRISPR discovery pipeline and identified 188 previously unreported CRISPR-associated systems, including many rare systems.
We experimentally characterized four of the newly discovered systems. We examined a type IV system with an HNH nuclease domain inserted in the CRISPR-associated DNA damage-inducible gene G (DinG)–like helicase. We found that this system exhibited RNA-guided protospacer-adjacent motif (PAM)–dependent directional double-stranded DNA (dsDNA) degradation, which required both the adenosine triphosphate (ATP) hydrolysis and HNH nuclease functions of the DinG-HNH protein. This is the first demonstration of a type IV system with a specified interference mechanism. We characterized two type I systems containing HNH nuclease domains inserted in different subunits of Cascade (Cas8-HNH and Cas5-HNH). We found that both of these systems performed precise dsDNA cleavage and single-stranded DNA (ssDNA) cleavage. We additionally observed collateral cleavage of ssDNA by the Cas5-HNH system. We demonstrated that both systems can be applied for genome editing in human cells and that the Cas8-HNH system is highly specific. We also studied candidate type VII systems, including a minimal Cas7-Cas5 effector complex and a distinctive interference protein including a β-CASP domain. We showed that these systems are likely derived from type III-E CRISPR systems and are RNA targeting.
Other CRISPR-linked systems that we found include additional potential effector and adaptation components, two previously unknown associations of Mu transposons with CRISPR systems, and numerous newly identified proteins and domains associated with type V systems. We also identified an instance of potential co-option of a Cas9 as an anti-CRISPR mechanism and noted several non-CRISPR hypervariable regularly interspersed repeat arrays.
CONCLUSION
This study introduces FLSHclust as a tool to cluster millions of sequences quickly and efficiently, with broad applications in mining large sequence databases. The CRISPR-linked systems that we discovered represent an untapped trove of diverse biochemical activities linked to RNA-guided mechanisms, with great potential for development as biotechnologies."
Uncovering the functional diversity of rare CRISPR-Cas systems with deep terascale clustering (no public access, but link to PDF is provided in the above article)
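For readers wondering what "locality-sensitive hashing-based clustering" means in practice, here is a toy sketch of the general idea. It is not FLSHclust and does not reproduce the authors' method: it uses MinHash over k-mers with banding, a textbook LSH scheme I picked purely for illustration. The function names (`kmers`, `minhash_signature`, `lsh_cluster`) and the parameters (`k`, `num_hashes`, `bands`) are my own assumptions. The point it illustrates is that similar sequences tend to land in the same hash bucket, so candidate pairs come from bucket lookups rather than from comparing every sequence against every other one.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers (the whole sequence if it is shorter than k)."""
    if len(seq) < k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer, salted with a seed."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, num_hashes=32, k=3):
    """MinHash signature: for each seed, the smallest hash over the sequence's k-mers."""
    parts = kmers(seq, k)
    return tuple(min(_hash(p, seed) for p in parts) for seed in range(num_hashes))


def lsh_cluster(seqs, num_hashes=32, bands=8, k=3):
    """Group sequences that share at least one LSH band (illustrative, not FLSHclust)."""
    rows = num_hashes // bands
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Bucket each sequence once per band; the key is the band index plus that slice
    # of the signature, so only sequences agreeing on a whole band collide.
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(idx)

    # Sequences sharing a bucket are merged into one cluster.
    for members in buckets.values():
        for other in members[1:]:
            union(members[0], other)

    clusters = defaultdict(list)
    for i in range(len(seqs)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Made-up example sequences, not real Cas proteins.
    example = ["MKTAYIAKQR", "MKTAYIAKQR", "GGGGGGGGGG"]
    print(lsh_cluster(example))
```

Each sequence is hashed a fixed number of times and dropped into buckets, so the work grows roughly with the number of sequences instead of the number of pairs. A real pipeline would still verify bucket-mates with a proper alignment before calling them a cluster; this sketch skips that step.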