{"id":286865,"date":"2025-07-08T07:37:29","date_gmt":"2025-07-08T07:37:29","guid":{"rendered":"https:\/\/clarivate.com\/intellectual-property\/?p=286865"},"modified":"2025-10-02T13:55:51","modified_gmt":"2025-10-02T13:55:51","slug":"megapatents-why-expert-curated-biological-sequence-data-is-critical-for-effective-ip-strategy","status":"publish","type":"post","link":"https:\/\/clarivate.com\/intellectual-property\/blog\/megapatents-why-expert-curated-biological-sequence-data-is-critical-for-effective-ip-strategy\/","title":{"rendered":"Megapatents: Why expert curated biological sequence data is critical for effective IP strategy"},"content":{"rendered":"<p>Today, a single patent can contain tens of thousands of biological sequences, many of which are hidden in plain sight. These so-called \u2018megapatents\u2019 raise the stakes for how biological sequence data is captured, interpreted and used in decision-making.<\/p>\n<p>Relying solely on electronic sequence listings can lead to costly blind spots for patent professionals and researchers. 
That\u2019s why curated, context-rich data, like that provided by <a href=\"https:\/\/clarivate.com\/intellectual-property\/derwent\/geneseq\/\">GENESEQ<\/a>, is no longer a luxury but a necessity.<\/p>\n<p>This blog explores how megapatents expose the limitations of automated sequence capture and why GENESEQ\u2019s human-curated approach is essential for accurate freedom-to-operate (FTO) assessments, competitive intelligence and strategic intellectual property (IP) planning.<\/p>\n<p>Read more about how GENESEQ can <a href=\"https:\/\/clarivate.com\/intellectual-property\/wp-content\/uploads\/sites\/5\/dlm_uploads\/2024\/08\/GENESEQ-Factsheet.pdf\" target=\"_blank\" rel=\"noopener\">strengthen your IP strategy<\/a> in our factsheet.<\/p>\n<h3><strong>What is a megapatent?<\/strong><\/h3>\n<p>Before the mid-2000s, <a href=\"https:\/\/clarivate.com\/intellectual-property\/derwent\/geneseq\/\">biological sequence data<\/a> in patents, when present, was typically embedded within the patent specification. Formal sequence listings formed part of the specification itself, and the GENESEQ editorial team manually extracted and annotated the sequences to ensure accuracy and completeness.<\/p>\n<p>Around that time, the World Intellectual Property Organization (WIPO) began publishing electronic sequence listings, a format that has since been gradually adopted by other patent authorities. These listings are intended to capture all sequences referenced in a patent, whether in the claims, examples or disclosure sections. However, in practice, they are often incomplete or entirely absent. As a result, manual curation is still required to ensure a full and accurate representation of the sequence content.<\/p>\n<p>This challenge is amplified by the emergence of so-called megapatents \u2014\u00a0<strong>single patent filings that disclose vast numbers of genetic sequences<\/strong>. 
These patents typically:<\/p>\n<ul>\n<li><strong>Cover many variants of a DNA or RNA sequence<\/strong> that are similar in structure or function.<\/li>\n<li>Use\u00a0<strong>percent identity thresholds<\/strong>\u00a0(e.g., 80%, 90%, or 95% identity) to extend the scope of protection to sequences that are not explicitly described in the patent.<\/li>\n<li>Are often\u00a0<strong>filed early<\/strong>\u00a0in the discovery process, before the full function or utility of the sequences is known.<\/li>\n<li>Seek <strong>broad control<\/strong>\u00a0over a genetic domain, potentially blocking others from using a wide array of related sequences.<\/li>\n<\/ul>\n<p>This practice has raised concerns among scientists and legal experts alike. Megapatents can <strong>create legal uncertainty<\/strong>, <strong>stifle innovation<\/strong> and <strong>limit access to foundational genetic information<\/strong>, especially when the underlying data is incomplete or poorly annotated.<\/p>\n<h3><strong>Megapatents and GENESEQ: Why manual work matters<\/strong><\/h3>\n<p>When we talk about a \u2018vast number\u2019 of sequences disclosed by a single patent, we mean it. Take, for example, WO2025059390A2, which discloses upward of 34,000 sequence records. Yet the formal sequence listing comprises only 2,757 of them. The remaining sequences were manually captured by the GENESEQ editorial team \u2013 real experts, not algorithms \u2013 ensuring that every relevant sequence was accounted for and searchable within <a href=\"https:\/\/clarivate.com\/intellectual-property\/derwent\/sequence-search\/\">Derwent SequenceBase<\/a>.<\/p>\n<p>While electronic sequence data\u00a0\u2014 digitally encoded representations of DNA, RNA or protein sequences \u2014 plays a vital role in bioinformatics, it often lacks the legal and scientific context needed for accurate patent analysis. This is especially problematic when dealing with megapatents. 
Relying solely on electronic listings introduces several critical risks:<\/p>\n<ol>\n<li>Misinterpretation of legal scope<\/li>\n<\/ol>\n<p>Electronic sequence data typically includes only the raw nucleotide or amino acid sequences. However,\u00a0patent claims\u00a0define the legal boundaries of protection, which may:<\/p>\n<ul>\n<li>Include\u00a0percentage identity thresholds\u00a0(e.g., sequences with \u226590% identity to a reference sequence).<\/li>\n<li>Be limited to\u00a0specific uses, organisms, or structural contexts.<\/li>\n<li>Be subject to\u00a0exceptions or disclaimers\u00a0not visible in the sequence data.<\/li>\n<\/ul>\n<p>Without reading the full patent text, it\u2019s easy to misjudge what is and isn\u2019t protected.<\/p>\n<ol start=\"2\">\n<li>Overlooking functional or structural limitations<\/li>\n<\/ol>\n<p>The patent specification may contain important information about the sequences that isn\u2019t present in the sequence listing, though this isn\u2019t always the case.<\/p>\n<p>Megapatents often claim sequences\u00a0in combination with specific functions\u00a0(e.g., encoding a therapeutic protein) or\u00a0structural features\u00a0(e.g., motifs, domains). These details are typically found in the patent specification, not the sequence listing. Ignoring them can lead to incorrect assumptions about infringement or freedom to operate (FTO).<\/p>\n<ol start=\"3\">\n<li>Inaccurate FTO assessments<\/li>\n<\/ol>\n<p>A sequence might appear unclaimed in the listing, but the patent could cover a broader class of sequences that includes it. Conversely, a sequence might seem protected, but the claim could be narrower than it appears, or the patent might have expired or be invalid. 
Without full context, FTO assessments can be dangerously flawed.<\/p>\n<ol start=\"4\">\n<li>Lack of contextual metadata<\/li>\n<\/ol>\n<p>While formats like WIPO ST.26 include basic metadata (e.g., organism, function), they don\u2019t replace the rich legal and scientific context found in the full patent document. That context is essential for accurate interpretation and strategic decision-making.<\/p>\n<h3><strong>How do other sequence search tools treat megapatents?<\/strong><\/h3>\n<p>Most sequence search tools rely exclusively on electronic sequence listings, automated data that often omits critical context, especially in megapatents. This creates a significant blind spot for organizations that depend on accurate, comprehensive sequence intelligence for IP strategy.<\/p>\n<p>By contrast, GENESEQ combines manual sequence capture with expert annotation, ensuring that even sequences buried in figures, tables or unstructured text are included and searchable. Moreover, superior search capabilities available via Derwent SequenceBase minimize any risk associated with only viewing the electronic sequence listing data and provide a more reliable foundation for decision-making.<em>\u00a0<\/em><\/p>\n<h3><strong>Manual vs. 
electronic sequence capture: A comparison<\/strong><\/h3>\n<table class=\"table table-striped\">\n<tbody>\n<tr>\n<td width=\"138\"><strong>Aspect<\/strong><\/td>\n<td width=\"263\"><strong>Electronic sequence capture<\/strong><\/td>\n<td width=\"200\"><strong>Manual sequence capture<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"138\"><strong>Data source<\/strong><\/td>\n<td width=\"263\">Automated extraction (OCR, etc.)<\/td>\n<td width=\"200\">Human curation<\/td>\n<\/tr>\n<tr>\n<td width=\"138\"><strong>Accuracy<\/strong><\/td>\n<td width=\"263\">May miss nuances or misinterpret content<\/td>\n<td width=\"200\">Very high, extremely nuanced<\/td>\n<\/tr>\n<tr>\n<td width=\"138\"><strong>Contextual info<\/strong><\/td>\n<td width=\"263\">Limited to metadata only<\/td>\n<td width=\"200\">Very rich, includes claims<\/td>\n<\/tr>\n<tr>\n<td width=\"138\"><strong>Legal interpretation<\/strong><\/td>\n<td width=\"263\">Very limited<\/td>\n<td width=\"200\">Detailed and comprehensive<\/td>\n<\/tr>\n<tr>\n<td width=\"138\"><strong>Human expertise<\/strong><\/td>\n<td width=\"263\">None<\/td>\n<td width=\"200\">Expert analysis<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<h3><strong>Conclusion: In the age of megapatents, precision is power<\/strong><\/h3>\n<p>As biological sequence data grows in volume and complexity, the risks of relying solely on electronic listings become harder to ignore. Megapatents expose the limitations of automated tools and highlight the need for curated, context-rich data that supports confident decision-making.<\/p>\n<p>With GENESEQ, you\u2019re not just searching sequences; you\u2019re uncovering the full legal and scientific picture. 
Backed by expert curation and integrated into Derwent SequenceBase, GENESEQ empowers IP professionals, researchers and legal teams to navigate the genomic IP landscape with clarity and confidence.<\/p>\n<p><strong>Ready to see the difference <a href=\"https:\/\/clarivate.com\/intellectual-property\/derwent\/geneseq\/\">GENESEQ<\/a> can make?<\/strong>\u00a0Explore the product or contact our team to learn more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Discover how GENESEQ\u2019s expert-curated data helps overcome blind spots in megapatent research for accurate IP strategy, FTO and patent analysis.<\/p>\n","protected":false},"author":59,"featured_media":286866,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2255,42],"tags":[22,522],"class_list":["post-286865","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-intellectual-property","category-patents","tag-derwent","tag-geneseq","clarivate-industry-ip"],"acf":[],"lang":"en","translations":{"en":286865,"ja":286979},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"pll_sync_post":[],"_links":{"self":[{"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/posts\/286865","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/users\/59"}],"replies":[{"embeddable":true,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/comments?post=286865"}],"version-history":[{"count":10,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/posts\/286865\/revisions"}],"predecessor-version":[{"id":287965,"href":"https:\/\/clariva
te.com\/intellectual-property\/wp-json\/wp\/v2\/posts\/286865\/revisions\/287965"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/media\/286866"}],"wp:attachment":[{"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/media?parent=286865"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/categories?post=286865"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/clarivate.com\/intellectual-property\/wp-json\/wp\/v2\/tags?post=286865"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}