Project Deep Codon
Building a new biology from the unread genome
Presenting first-in-class biomolecules
The Dark Genome Revolution
For nearly fifty years, genome research focused narrowly on the roughly 2% of DNA that codes for proteins. The remaining 98% of the human genome was largely relegated to the shadows and dismissed as "junk DNA", a convenient oversight that avoided difficult questions.
Yet this portion of the genome proved to be anything but junk. Researchers later unveiled a rich landscape of RNA molecules, including regulators and guides, quietly orchestrating cellular functions. This discovery profoundly reshaped our understanding of genomic complexity.
But even after this paradigm shift, a significant portion of the genome remained untouched: the silent, non-expressing regions that yield no proteins and no RNA, just deep silence. This became one of the biggest scientific blind spots of our time.
Then everything changed.
In 2009, a bold experiment ventured into this unexplored territory. Its astonishing finding was that new, functional genes and proteins could be created from these naturally silent stretches of DNA, revealing the immense, unseen potential hidden within the "dark matter of the genome" (Dhar et al., 2009).
2009
First experimental proof: functional proteins made from E. coli intergenic sequences
2013
Anti-malarial peptides predicted from E. coli intergenic DNA
2015
Anti-Alzheimer peptides predicted from S. cerevisiae intergenic DNA
2015
Functional proteins predicted from pseudogenes of S. cerevisiae
2017
Antimicrobial peptides predicted from intergenic sequences of D. melanogaster
2020
D. melanogaster intergenic sequence-derived peptides showed antimicrobial activity
2022
E. coli intergenic sequence-derived peptides showed anti-cancer activity
2023
E. coli tRNA-encoded peptides displayed anti-Leishmania activity
2024
Vaccine candidates computationally predicted from E. coli tRNA sequences
2024
Antisense and reverse ORF proteins predicted
We have pursued this line of research for more than a decade and a half. What started as a random idea has evolved into a lifelong mission. The Deep Codon Project is driving Functional Genomics 2.0: uncovering the hidden purpose of genomic regions that evolution chose not to delete, and recoding non-translating RNA sequences to make first-in-class peptides and proteins. This marks a shift from individual genes to the entire genome as a canvas for innovation.
The implications are profound. This isn’t just another chapter in biology; it’s a seismic shift—from studying nature's decisions to synthetically unlocking the coding potential of the entire genome.
The Deep Codon Project isn't merely updating molecular biology texts; it's laying the foundation for a new roadmap for the next generation of experimental and computational biologists.
Understanding the Dark Matter of the Genome
The "dark matter of the genome" comprises two fundamental categories. Each has distinct molecular characteristics, and each can be synthetically expressed to yield functional peptides, proteins and pathways.
Type 1: Non-Expressing DNA
DNA sequences that are not transcribed
  • Intergenic regions between any two genes
  • Antisense strands complementary to the coding sequences
  • Reverse open reading frames (ORFs)
  • Repetitive DNA elements, e.g., those found in telomeric regions
  • Pseudogenes: non-expressing evolutionary relics
Type 2: Non-Translating RNA
RNA molecules that are not translated
  • Introns spliced out during mRNA processing
  • Ribosomal RNA (rRNA) for protein synthesis machinery
  • Transfer RNA (tRNA) for amino acid delivery
  • MicroRNA for gene regulation
  • Long non-coding RNA (lncRNA) for epigenetic control
Type 1 Dark Matter: Non-Expressing DNA Sequences
Collectively, the non-transcribed regions of the genome represent a massive, uncharted library of functional sequence space. Through computational decoding and synthetic expression, these dark regions of the genome are now emerging as a frontier for next-generation, first-in-class molecules with industrial and therapeutic value.
The genome’s “dark matter” comprises a constellation of sequence classes that lie outside conventional protein-coding and RNA-coding annotations and are evolutionary hand-me-downs.
Intergenic sequences constitute the majority of untapped genomic real estate. Although not naturally expressed, computational mining reveals a dense distribution of potential coding elements that, when expressed, can launch new cellular pathways.
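As a rough illustration of what "computational mining" of an intergenic sequence can look like, the sketch below scans the three forward reading frames of a DNA string for open reading frames that start with ATG and end at a stop codon. It is a minimal example under stated assumptions, not the pipeline used in the project: the function name `find_orfs`, the `min_codons` threshold and the toy sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return (start, end, frame) for each ATG...stop ORF on the given strand.

    start/end are 0-based, end-exclusive positions; the stop codon is included
    in the span. min_codons counts codons before the stop (hypothetical cutoff).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence (invented for illustration, not real genomic data)
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))  # [(2, 20, 2)]
```

Running the same scan on the reverse complement would cover the opposite strand; a real analysis would also need to consider alternative start codons and organism-specific genetic codes.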
Every annotated gene has a complementary (antisense) counterpart and a reverse counterpart. Synthetic translation experiments from our group demonstrate that these unconventional reading frames can produce novel proteins and pathways.
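To make the two counterparts concrete, the following sketch derives both from a short coding sequence and translates each with the standard genetic code. It assumes that "antisense" means the reverse complement of the coding strand and that "reverse" means the same strand read backwards without complementing; these definitions, the helper names and the toy sequence are illustrative assumptions rather than the group's published protocol.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(dna):
    """Reverse complement of the coding strand (assumed meaning of 'antisense')."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def reverse(dna):
    """Coding strand read backwards, not complemented (assumed meaning of 'reverse')."""
    return dna.upper()[::-1]


def translate(dna):
    """Translate from position 0 until the first stop codon or the end."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


sense = "ATGGCCAAA"  # toy coding sequence, invented for illustration
print(translate(sense))             # MAK
print(translate(antisense(sense)))  # FGH
print(translate(reverse(sense)))    # KPV
```

The three peptides share no residues even though they come from the same nine bases, which is the sense in which each annotated gene carries additional, normally unread coding information.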
Repetitive elements—including telomeric repeats, microsatellites, and LINE/SINE-derived sequences—occupy significant portions of the genome. It would be interesting to use repeat-sequence motifs to see whether functional proteins and pathways are possible, or if they represent the negative data set of natural biology.
With several thousand pseudogenes scattered across the genome, the sheer frequency of these “nonfunctional relics” makes them a significant repository of dormant ORFs. Synthetic reconstruction shows that many pseudogene-derived peptides fold into functional proteins, suggesting that evolutionary silence does not imply biochemical irrelevance.
The “dark matter” of the genome is not a single deficit but a layered frontier of intergenic sequences, antisense strands, reverse ORFs, repeats, and pseudogenes that together expand the functional and evolutionary repertoire of genomes. Modern technologies now permit systematic discovery, validation, and functional application of these regions.
Type 2 Dark Matter: Non-Translating RNA Sequences
For decades, the classical definition of the functional genome painted non-coding regions as RNA-coding elements whose journey towards translation was aborted. But this narrative is now collapsing. The non-coding genome is not a polypeptide graveyard; it is a massive, untouched reservoir of protein and pathway innovations. We call it the Type 2 dark matter of the genome.
This new polypeptide landscape includes introns, tRNAs, rRNAs, lncRNAs, and miRNAs: elements historically labeled as “non-coding,” yet brimming with latent translation potential. Our computational and experimental work (unpublished) challenges the classic assumption that these RNAs carry no coding value. We find that non-coding RNAs are among the richest substrates for synthetic peptide design, and that their untapped informational value can be realized when they are expressed synthetically.
Introns, long dismissed as splicing waste, have emerged as one of the most promising peptide sources. Our studies show that these discarded RNA sequences can be systematically translated into stable, bioactive peptides and proteins. A hidden layer of functional peptide chemistry is waiting for exploration.
Ribosomal RNA and transfer RNA, central components of the protein-making machinery, have never been considered candidates for translation themselves. Yet these sequences offer a unique and rich template for generating novel functional peptides. The scaffold that Nature uses for making proteins can itself be used to design new synthetic peptides, proteins and pathways.
Although miRNAs are only ~22 nucleotides long, these smallest elements of the transcriptome may become the most precise tools in peptide engineering. At the other extreme, lncRNAs span hundreds to thousands of bases and represent an enormous, uncharted protein-coding reservoir. Their sheer sequence diversity provides a fertile platform for creating designer peptides and novel biochemical pathways.
In summary, Type 2 dark matter provides the foundation for the next generation of metabolic engineering. We can now mine the transcriptome for new peptide families, catalytic activities, and therapeutic outcomes.
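The size contrast between these two RNA classes translates directly into peptide length, since each amino acid needs three nucleotides. The small helper below is only a back-of-the-envelope sketch; the function name and the example lengths of 200 and 3,000 nucleotides are illustrative assumptions, and a single uninterrupted reading frame is assumed.

```python
def max_peptide_length(n_nucleotides, has_stop_codon=False):
    """Upper bound on peptide length encoded by one reading frame.

    Each amino acid needs one codon (3 nt); optionally reserve one codon
    for a stop signal.
    """
    codons = n_nucleotides // 3
    if has_stop_codon:
        codons -= 1
    return max(codons, 0)


print(max_peptide_length(22))    # 7   (a ~22-nt miRNA)
print(max_peptide_length(200))   # 66  (a short lncRNA, example length)
print(max_peptide_length(3000))  # 1000 (a longer lncRNA, example length)
```

So a miRNA-sized template can encode at most a seven-residue peptide, while lncRNA-sized templates can in principle encode full-length proteins.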
The Deep Codon Technology
Therapeutics Portfolio
Despite decades of biomedical breakthroughs, humanity is colliding with a "healthcare cliff." We face an urgent shortfall in therapies for our deadliest and most debilitating challenges, gaps that are costing millions of lives and trillions of dollars. In 2022 alone, the world saw nearly 20 million new cancer diagnoses and 9.7 million cancer deaths.
While innovation exists, it comes at a staggering price: oncology drug spending is estimated to have exceeded $250 billion in 2024. The trend line is clear: current care models are becoming financially unsustainable while demand skyrockets.
We are already living in the post-antibiotic era. Bacterial antimicrobial resistance (AMR) is a global catastrophe, associated with nearly 5 million deaths in 2019, of which 1.27 million were directly attributable to resistant infections. This burden disproportionately hits low- and middle-income nations, but without transformational interventions, no border will remain safe from untreatable infections.
Ancient diseases are refusing to fade. Malaria alone struck an estimated 263 million people in 2023, claiming roughly 597,000 lives. With the antiparasitic market valued at ~$20 billion, there is both a desperate humanitarian need and a clear scientific reason for new chemistries to overcome resistance and biological complexity.
As global populations age, neurodegenerative disorders are surging. By 2021, ~57 million people were living with dementia, with nearly 10 million new cases emerging annually. Critically, disease-modifying therapies remain largely absent, leaving families and economies to shoulder skyrocketing care costs.
Why do these gaps persist? Because traditional drug discovery is hitting a wall.
Confronted by discovery bottlenecks, rising resistance, and ballooning R&D costs, the old pipelines are plateauing. To deliver safer, affordable, and effective therapeutics, we don’t just need new drugs; we need entirely new molecular starting points and fresh biological frameworks.

This is where the Deep Codon effort steps in. The key is to look for new and effective solutions within our genomes, not outside our bodies. Using the dark matter of the genome, we have built a new drug discovery pipeline and provided proof of concept against cancer, malaria, leishmaniasis, Alzheimer's disease and pathogenic microbes.
Enzyme Portfolio
The dark matter of the genome constitutes a previously inaccessible coding reservoir with significant relevance for enzyme discovery and engineering. By demonstrating that non-expressing sequences can encode functional proteins, current research establishes the foundation for a new paradigm in enzyme design, one that leverages the full genomic landscape, not just the annotated minority. This approach expands accessible catalytic diversity, supports de novo pathway construction, and opens new avenues for industrial, therapeutic, and environmental biotechnology.
Restriction Enzymes
Traditional restriction enzymes evolved under natural selection, limiting the diversity of recognition sites. Using dark-genome–derived peptide scaffolds, we can engineer de novo restriction-like nucleases with customizable sequence specificity, altered cutting patterns, and improved temperature or pH tolerance. This opens the possibility of next-generation genome editing tools that operate outside the constraints of natural proteins.
Industrial Enzymes
Industrial biotechnology relies on enzymes that catalyze reactions under extreme or specialized conditions—such as high heat, alkaline environments, solvents, and so on. Dark-genome sequences offer an unprecedented reservoir of novel catalytic frameworks, enabling the design of enzymes for biofuel production, food processing, detergent formulations, plastic degradation and green chemistry. Because these enzymes originate from non-expressed sequences, they are free from evolutionary constraints, allowing the creation of functions not typically seen in nature.
The dawn of a new molecular biology
tRNA encoded peptides
Our work has revealed tRNA-encoded peptides (tREPs) as a previously untapped class of bioactive molecules with strong therapeutic potential. One such peptide, tREP-18, showed strong antileishmanial activity at nanomolar concentrations (IC₅₀ ≈ 22 nM) while remaining safe for human cells. This provides the first evidence that tRNAs, traditionally viewed as non-coding molecules, can be repurposed into functional, therapeutically relevant peptides, and it points to a vast, unexplored source of candidates for next-generation therapeutics and vaccines.
This proof-of-concept of repurposing tRNA sequences opens a vast and unexplored landscape of innovations.
Designing First-in-Class Pathways
The dark matter of the genome, traditionally viewed as evolutionarily silent or non-functional, is emerging as one of the most promising frontiers for next-generation biological innovation. Both Type 1 (non-expressing DNA sequences) and Type 2 (non-translating RNA molecules) can be synthetically expressed and optimized to generate a novel inventory of functional biomolecules.
By combining a large inventory of lab-born RNA molecules, peptides and proteins with domain prediction and molecular docking tools, it is now feasible to design and construct novel cellular pathways. These could be regulatory, signaling or metabolic pathways, designed both to examine evolutionary biology through a new lens and to generate useful applications. The dark matter of the genome does not merely fill evolutionary gaps; it opens an entirely new design dimension for biological engineering. Our initial work has unveiled a new biological landscape, rich with possibilities, that opens the door to cellular rewiring, microbial factories and synthetic cells built entirely from the interactome of dark-genome-derived elements.
Relevant Publications
1. Verma, N., Manvati, S., & Dhar, P. K. (2023). Harnessing Escherichia coli’s Dark Genome to Produce Anti-Alzheimer Peptides. bioRxiv. https://doi.org/10.1101/2023.06.23.546343
2. Garg, M., & Dhar, P. K. (2023a). Repurposing the Dark Genome I: Antisense Proteins. bioRxiv. https://doi.org/10.1101/2023.03.15.532699
3. Nayak, S., & Dhar, P. K. (2023a). Repurposing the Dark Genome II: Reverse Proteins. bioRxiv. https://doi.org/10.1101/2023.03.20.533367
4. Garg, M., & Dhar, P. K. (2023b). Repurposing the Dark Genome III: Intronic Proteins. bioRxiv. https://doi.org/10.1101/2023.06.10.544447
5. Nayak, S., & Dhar, P. K. (2023b). Repurposing the Dark Genome IV: Noncoding Proteins. bioRxiv. https://doi.org/10.1101/2023.06.29.547021
6. Chakrabarti, A., Kaushik, M., Khan, J., et al. (2022). tREPs: a new class of functional tRNA encoded peptides. ACS Omega, 7, 18361–18373. https://doi.org/10.1021/acsomega.2c01234
7. Varughese, D., Nair, A. S., & Dhar, P. K. (2017). Function annotation of novel peptides generated from the non-expressing genome of D. melanogaster. Bioinformation, 13(1), 17–20.
8. Krishnan, R., Kumar, V., Ananth, V., et al. (2015). Computational identification of novel microRNAs and their targets in the malarial vector Anopheles stephensi. Systems and Synthetic Biology Journal, 9, 11–17.
9. Raj, N., Helen, A., Manoj, N., et al. (2015). In silico study of peptide inhibitors against BACE. Systems and Synthetic Biology Journal, 9, 67–72.
10. Shidhi, P. R., Suravajhala, P., Nayeema, A., et al. (2015). Making novel proteins from pseudogenes. Bioinformatics, 31(1), 33–39. https://doi.org/10.1093/bioinformatics/btu585
11. Joshi, M., Kundapura, S. V., Poovaiah, T., Ingle, K., & Dhar, P. K. (2013). Discovering novel anti-malarial peptides from the not-coding genome: A working hypothesis. Current Synthetic and Systems Biology, 1(1).
12. Dhar, P. K., Nanduri, B., et al. (2009). Synthesizing non-natural parts from natural genomic template. Journal of Biological Engineering, 3, 2. https://doi.org/10.1186/1754-1611-3-2