Biotechnology

From 1.5 Billion Sequences to High-Performing Enzymes in 2 Months

The Challenge: Unlocking Value from Big Data

Our client, a leader in biotechnology, had amassed a proprietary metagenomics database of 1.5 billion protein sequences. This vast resource held immense potential for identifying novel sequences with industrial value, but its sheer scale posed a critical challenge: traditional methods for enzyme discovery were too inefficient to be viable.

Because many sequences shared only 20-30% identity with known homologs, standard homology-based screening could not reliably flag them. As a result, the client was unable to identify novel enzymes and unlock the value hidden within their data asset.
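To make the identity problem concrete, here is a minimal Python sketch of pairwise percent identity on an already-aligned pair, followed by a homology screen with a hypothetical 40% cutoff. The toy sequences, the cutoff, and the function names are assumptions for illustration. The denominator used here (gap-free aligned columns) is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping columns where either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def passes_homology_screen(identity: float, cutoff: float = 40.0) -> bool:
    """Hypothetical screen: keep a candidate only if its identity meets the cutoff."""
    return identity >= cutoff


if __name__ == "__main__":
    # Toy pre-aligned pair: the last column is a gap, and 3 of the 10 gap-free columns match
    pid = percent_identity("ACDEFGHIKL-", "ACDMNPQRSTV")
    print(f"{pid:.1f}% identity, kept: {passes_homology_screen(pid)}")
```

A candidate at 30% identity falls below the assumed cutoff and is discarded, even though it may still be a functional enzyme. Lowering the cutoff instead floods the screen with false positives, which is why identity alone is a weak signal in this range.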


Our Solution

Zymvol implemented a targeted, multi-stage computational approach focused on laccases. Laccases were chosen as the initial target for two reasons: their significant industrial value as powerful oxidases that use molecular oxygen and release water as the only by-product, and our team's deep expertise in their engineering and application.

Our methodology transformed the discovery process from a numbers game into a precision-driven search:

  1. Massive-Scale Filtering: Our proprietary filters first scanned the entire 1.5 billion sequence database, reducing the search space to ~200,000 potential candidates based on conserved structural motifs and domains essential for laccase function.
  2. In Silico Functional Analysis: Each candidate sequence was then subjected to a rigorous computational pipeline. This process involved homology modeling to predict 3D structures, followed by molecular docking simulations to assess potential substrate interactions. Machine learning models, trained on extensive enzyme performance data, were used to predict key performance indicators (KPIs) such as catalytic efficiency (kcat/KM), thermostability, and optimal pH.
  3. Candidate Prioritization: This multi-parameter analysis allowed us to rank all candidates based on their predicted performance profiles, leading to the downselection of a final, high-potential list of 25 sequences for laboratory synthesis and validation.
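The funnel above can be sketched in Python as a motif filter followed by a weighted multi-parameter ranking. This is a minimal illustration under stated assumptions, not Zymvol's pipeline. The regular expression is loosely modeled on the His-Cys-His copper-binding region of multicopper oxidases. The KPI values stand in for outputs of the structure, docking, and machine-learning stage. The names `Candidate`, `motif_filter`, and `rank`, as well as the weights and the target pH, are all hypothetical.

```python
import re
from dataclasses import dataclass

# Illustrative pattern loosely modeled on a multicopper-oxidase copper-binding
# (His-Cys-His) signature; an assumption, not a validated laccase filter.
COPPER_MOTIF = re.compile(r"HCH.{3}H.{3}[AG][LM]")


@dataclass
class Candidate:
    seq_id: str
    sequence: str
    # Hypothetical model predictions (stand-ins for the docking/ML stage)
    kcat_over_km: float  # catalytic efficiency, M^-1 s^-1
    tm_celsius: float    # predicted melting temperature, a thermostability proxy
    ph_opt: float        # predicted optimal pH


def motif_filter(candidates):
    """Stage 1: keep only sequences carrying the illustrative copper-binding motif."""
    return [c for c in candidates if COPPER_MOTIF.search(c.sequence)]


def min_max(values):
    """Scale values to the range [0, 1]; identical values all map to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(candidates, weights=(0.5, 0.3, 0.2), target_ph=5.0, top_k=25):
    """Stages 2-3: combine normalized KPI predictions into one weighted score, keep the top_k."""
    if not candidates:
        return []
    eff = min_max([c.kcat_over_km for c in candidates])
    stab = min_max([c.tm_celsius for c in candidates])
    # A smaller distance from the target pH is better, so negate it before scaling
    ph_fit = min_max([-abs(c.ph_opt - target_ph) for c in candidates])
    w_eff, w_stab, w_ph = weights
    scored = [
        (w_eff * e + w_stab * s + w_ph * p, c)
        for e, s, p, c in zip(eff, stab, ph_fit, candidates)
    ]
    # Sort on the score only, so Candidate objects are never compared
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


if __name__ == "__main__":
    pool = [
        Candidate("seqA", "MKHCHWADHGNSALV", 2.0e6, 70.0, 4.5),
        Candidate("seqB", "MKAAAAAAAAAAAAA", 9.0e6, 90.0, 5.0),  # no motif: filtered out
        Candidate("seqC", "GGHCHFSGHWIAGMR", 5.0e6, 60.0, 5.0),
    ]
    shortlist = rank(motif_filter(pool), top_k=1)
    print([c.seq_id for c in shortlist])  # ['seqC']
```

Min-max scaling puts each KPI on a common 0 to 1 scale, so no single prediction dominates merely because of its units. A real pipeline might instead use z-scores, learned weights, or Pareto ranking across objectives.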

Results & Impact

  • Exceptional Hit Rate
    From the 25 prioritized candidates, 7 were confirmed as active novel laccases—an exceptional wet-lab hit rate of 28%.
  • Superior Performance
    Three of the novel enzymes demonstrated an average 30% improvement in yield activity over a leading commercial laccase.
  • Unprecedented Speed and Efficiency
    The entire discovery-to-validation cycle was completed in just two months, a fraction of the time required by traditional approaches. For perspective, a conventional bioinformatics screen would have yielded over 5,000 hits. By sending only 25 candidates to the lab, our approach reduced the required lab screening by over 99.5%, dramatically de-risking the R&D process, lowering costs, and accelerating the timeline for bringing new products to market.
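The headline figures follow from two simple ratios, sketched here in Python with hypothetical helper names. The hit rate is confirmed actives divided by candidates tested. The screening reduction is one minus the ratio of candidates tested to the hits a conventional screen would have produced. Because the conventional count is stated as "over 5,000", the computed 99.5% is a lower bound.

```python
def hit_rate(confirmed: int, tested: int) -> float:
    """Fraction of lab-tested candidates confirmed as active."""
    return confirmed / tested


def screening_reduction(tested: int, conventional_hits: int) -> float:
    """Fraction of lab work avoided versus testing every conventional hit."""
    return 1.0 - tested / conventional_hits


if __name__ == "__main__":
    print(f"Hit rate: {hit_rate(7, 25):.0%}")                    # Hit rate: 28%
    print(f"Reduction: {screening_reduction(25, 5000):.1%}")     # Reduction: 99.5%
```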

Highlights

  • 7 Novel laccases discovered & validated
    A 28% hit rate from the 25 candidates tested in the lab.
  • 3 laccases outperforming the gold-standard commercial laccase
    An average 30% improvement in yield activity over the best-known commercial laccase.
  • 99.5% Reduction in lab screening
    Our computational funnel narrowed a potential field of over 5,000 candidates down to just 25, dramatically saving time and resources.
  • 2-Month total project timeline
    The entire discovery-to-validation cycle was completed at unprecedented speed.