Machine curation in action: a new approach to essential tremor epidemiology

The advent of artificial intelligence (AI) has revolutionized the way we approach rare disease epidemiology. Through machine learning (ML) and deep learning (DL), researchers can efficiently sift through vast literature to extract critical data on epidemiology trends and treatment efficacy. Essential tremor (ET) is a common neurological disorder characterized by involuntary, rhythmic shaking that primarily affects the hands but can also involve the head, voice, and other body parts. The pathophysiology of ET involves abnormal cerebellar function, with genetic factors and familial patterns noted in many cases. This movement disorder affects about 1% of the general population, yet it is often undiagnosed, misdiagnosed, or undertreated, posing challenges for individuals in the workforce.

We use ET as an example of how AI can potentially transform our understanding and management of this condition. This article explores the multifaceted benefits of AI in curating and analyzing disease data, highlighting how these technologies not only enhance accuracy and speed (section 1) but can also identify patterns for early diagnosis and assess disease severity based on patient characteristics (section 2). As we delve into the case of ET, we uncover the potential of AI to bring clarity to the complexities of this disease, ultimately contributing to better patient outcomes.

A new AI tool for rare disease data curation

An estimated 400 million people are living with a rare disease worldwide. Within Clarivate Epidemiology, the Incidence & Prevalence Database (IPD) and Epidemiology teams undertake targeted literature searches to extract epidemiological data for these uncommon conditions. However, sparse data is available for many of the rare conditions known to us, partly due to the current limitations of manual curation processes. Manual curation is challenging because it is time-consuming, requires specialized knowledge, and integrates data from various formats like medical records and research articles. AI can overcome these challenges by providing scalable, efficient, and reliable solutions. These tools can quickly and accurately identify patterns and correlations that might be missed manually, ensuring consistency and speeding up research. AI tools like self-supervised image search for histology (SISH), DL algorithms, and ML models can analyze complex data from various sources, identify patterns, and improve diagnostic accuracy in rare diseases.

A 2023 study by W.Z. Kariampuzha et al. presents EpiPipeline4RD, a pipeline developed to automate the extraction of epidemiological data (epidata) from rare disease literature. The pipeline demonstrates high precision and recall scores for epidata extraction, showing results comparable to Orphanet’s collection model. The benefits are manifold: improved accuracy in disease prediction, faster data processing, and the ability to manage complex multi-dimensional data. The development of this pipeline aligns with the United Nations’ resolution to improve data collection on rare conditions by reducing reliance on manual processes. Case studies include Rett syndrome, eosinophilic esophagitis, classic homocystinuria, GRACILE syndrome, and phenylketonuria. This tool is one of several that we at Clarivate are exploring to bring more comprehensive and accurate epidemiology research findings and data to the pharma industry. The pipeline is aimed at strengthening the NIH Genetic and Rare Diseases Information Center (GARD). Future developments may include collaborations or integrations with other existing rare disease databases or platforms to enhance data accessibility and utility.
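To make the precision and recall figures concrete, here is a minimal sketch of how such scores are typically computed when machine-extracted items are compared against a manually curated reference set. The function name and the example data are hypothetical and are not taken from the EpiPipeline4RD evaluation.

```python
def precision_recall_f1(predicted, gold):
    """Score a set of extracted items against a manually curated gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard items that were found.
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical (label, text) pairs; invented for illustration only.
gold = {("prevalence", "1 in 10,000"), ("location", "Finland"), ("date", "2015")}
predicted = {("prevalence", "1 in 10,000"), ("location", "Finland"), ("location", "Europe")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case two of the three extracted items are correct and two of the three curated items are found, so precision and recall are both two-thirds.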

The data science behind the new platform

Key components of the EpiPipeline4RD pipeline include a new epidemiology dataset for Named Entity Recognition (NER); a DL framework (BioBERT) fine-tuned for epidata extraction; and a web interface and RESTful API to automate the extraction process. The pipeline integrates ES_Predict for identifying epidemiologic studies. It uses both the European Bioinformatics Institute (EBI) and National Center for Biotechnology Information (NCBI) APIs for PubMed article retrieval, with strict filtering applied to reduce false positives. The full implementation, code, and supplemental data are available on GitHub, and the fine-tuned model and dataset can be downloaded from Hugging Face.
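NER models of this kind usually emit one tag per token, and a post-processing step groups the tagged tokens into entities. The sketch below illustrates that step under the assumption of a standard BIO tagging scheme; the tag names (EPI, LOC, STAT) and the example sentence are hypothetical, and this is not the pipeline's actual code.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Store the entity collected so far, if any.
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A "B-" tag always starts a new entity.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An "I-" tag with the same label continues the open entity.
            current_tokens.append(token)
        else:
            # "O" tags (and stray "I-" tags) close the open entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities


# Hypothetical tokens and tags, invented for illustration.
tokens = ["The", "prevalence", "in", "Finland", "was", "1", "in", "10,000", "births"]
tags = ["O", "B-EPI", "O", "B-LOC", "O", "B-STAT", "I-STAT", "I-STAT", "O"]

entities = decode_bio(tokens, tags)
print(entities)
```

Grouping at this stage is what turns token-level predictions into the structured fields (statistic, location, and so on) that a curator can review.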

How authentic is it?

David Lapidus, discussing the strengths and limitations of this new AI tool for rare disease epidemiology, argued that it offers a significant advancement over Orphanet by quickly identifying and summarizing prevalence studies. However, it analyzes only PubMed abstracts and so misses key information that may be in the full text of articles, which affects data interpretation. In practice, inconsistencies and missing data may occur. For example, the AI tool did not pick up Baujat’s 2017 study for fibrodysplasia ossificans progressiva (FOP), considered a gold standard in the field, and it also missed actual prevalence studies for autoimmune pulmonary alveolar proteinosis (aPAP).

Understanding essential tremor

Essential tremor (ET), a prevalent yet often overlooked condition, significantly impacts daily activities and quality of life, with estimates varying across studies. About 1% of the world population is affected by ET, translating to around 24.91 million people in 2020. ET impacted 7 million individuals in the U.S. alone, with 1 million diagnosed and seeking treatment during 2015-2019. Clarivate’s proprietary Real World Data (RWD) asset estimated over 410,000 diagnosed cases of ET in the U.S. in 2023.

The prevalence of ET increases with age, with estimates ranging from 0.04% in individuals younger than 20 to 2.87% in those aged 80 years and older. Clarivate’s IPD data indicate female gender, absence of family history, and rest tremor at baseline as predictive factors for worse disease progression.

Patients often experience comorbidities, impacting their ability to perform optimally in professional settings. Since ET is associated with multiple conditions, an assessment of top comorbidities identified essential hypertension in about 69% of ET cases, followed by hyperlipidemia in nearly 50% of ET cases. Other commonly reported comorbidities include GERD without esophagitis, type 2 diabetes, and anxiety disorders.

Diagnosis aided by machine learning (ML)

Currently, ET is diagnosed based on symptoms and a neurological examination. However, ML can help bridge the gap in diagnosis by analyzing patient data to detect probable cases of ET. These probable cases can then be validated using the literature. A recent study used grey matter morphological networks and ML models to distinguish between ET, dystonic tremor, and healthy controls. The study found that 16 morphological relation features and 1 global topological metric could function as discriminative features, and that the Random Forest classifier achieved the best performance in the three-class classification task, with a mean accuracy of 78.7%. Thus, ML models trained to analyze patient data and detect probable ET cases can flag at-risk patients, allowing for further investigation and potential early intervention.
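As a rough illustration of what a mean accuracy on a three-class task represents, the sketch below averages per-fold accuracy over the classes ET, dystonic tremor (DT), and healthy controls (HC). It assumes the mean is taken over cross-validation folds, which the text above does not state, and every label and prediction here is invented.

```python
from statistics import mean


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical folds: (true labels, predicted labels) for ET, DT, and HC.
folds = [
    (["ET", "DT", "HC", "ET"], ["ET", "DT", "HC", "DT"]),  # 3 of 4 correct
    (["HC", "ET", "DT", "HC"], ["HC", "ET", "DT", "HC"]),  # 4 of 4 correct
    (["DT", "ET", "HC", "DT"], ["ET", "ET", "HC", "HC"]),  # 2 of 4 correct
]

fold_accuracies = [accuracy(true, pred) for true, pred in folds]
mean_accuracy = mean(fold_accuracies)
print(f"mean accuracy: {mean_accuracy:.1%}")
```

With three classes, chance-level accuracy on balanced data is about 33%, which is a useful baseline when reading a figure such as 78.7%.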

Twin studies indicate a genetic influence on ET, with potential loci and gene polymorphisms associated with the condition. The interaction between genetic predispositions and environmental triggers contributes to the development and progression of ET. This genetic basis can be utilized for the early diagnosis of ET. By analyzing historical patient data, ML models can learn to recognize the early signs of essential tremor. A study published in Open Life Sciences used ML algorithms to screen miRNAs and identify possible biomarkers for ET. The study used public datasets and their own datasets to examine the disorder. The results of the study suggested that three differentially expressed genes (APOE, SENP6, and ZNF148) may successfully differentiate between samples from ET patients and normal controls.

Severity assessment by intelligent devices

The severity of ET is traditionally assessed using clinical observation and rating scales such as the Fahn–Tolosa–Marin Tremor Rating Scale (FTM-TRS) and the Movement Disorder Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) Part III. However, ML can provide a more objective and quantitative assessment. Accurate clinical assessment tools and technology like smartphones and smartwatches can be used in tremor measurement. A study published in BMC Bioinformatics used smartphones strapped to the wrists of ET patients to collect sensor data. This data was then analyzed using a fuzzy model, which improved the mean absolute error metric by 78–81% compared to linear models and by 71–74% compared to a model based on decision trees. A comprehensive review published in the Journal of Neurology discussed the use of intelligent devices for assessing ET. The devices used included inertial measurement units, electromyography, video equipment, and electronic handwriting boards. The data collected from these devices was analyzed using ML models to identify patterns and characteristics unique to ET patients.
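The percentage improvements in mean absolute error (MAE) quoted above can be read as relative reductions against a baseline model. The sketch below shows that calculation under this assumption; the severity scores and predictions are invented and do not come from the BMC Bioinformatics study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted scores."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to the baseline model."""
    return 100 * (baseline_error - new_error) / baseline_error


# Hypothetical clinician-rated tremor scores and two models' predictions.
clinician_scores = [2.0, 3.0, 1.0, 4.0]
linear_predictions = [3.0, 2.0, 2.0, 3.0]
fuzzy_predictions = [2.25, 2.75, 1.25, 3.75]

linear_mae = mean_absolute_error(clinician_scores, linear_predictions)
fuzzy_mae = mean_absolute_error(clinician_scores, fuzzy_predictions)
improvement = relative_improvement(linear_mae, fuzzy_mae)
print(f"linear MAE={linear_mae}, fuzzy MAE={fuzzy_mae}, improvement={improvement}%")
```

Here the baseline is off by one point on average and the second model by a quarter point, a 75% relative improvement.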

AI’s potential for improving understanding of tremors

Advances in tremor research over the past decade include a new classification system introduced by the International Parkinson and Movement Disorder Society (IPMDS) in 2018. Like dystonic tremor (DT) and Parkinsonian tremor, ET is now seen as a syndrome rather than a single disease. ET is defined as isolated bi-brachial action tremor of at least 3 years’ duration, with “indeterminate tremor” used for shorter durations and “ET-plus” for ET with additional signs (such as rest tremor). This classification implies that one syndrome might evolve into another over time, showcasing the dynamic nature of tremor syndromes. Tremor pathophysiology involves the cerebello-thalamo-cortical circuit, and different tremor types might relate to different pathophysiologic mechanisms.

The integration of AI in market analysis for ET has shown significant potential. Transducers and technology like smartphones and smartwatches are being used in tremor measurement, improving the precision of tremor assessments. AI’s ability to process vast amounts of data rapidly and accurately, and to identify patterns and trends that may be overlooked by traditional analysis methods, allows for a more nuanced understanding of the market landscape. However, AI applications for rare diseases also come with drawbacks. The need for large, high-quality datasets can be a challenge given the rarity of these diseases. Additionally, the complexity of these tools can make them difficult to use and interpret without specialized knowledge. Lastly, ethical considerations arise around data privacy and the potential misuse of sensitive health information. Despite these challenges, the potential of AI in disease epidemiology is undeniable and continues to drive advancements in the field.

At Clarivate, we triangulate various data sources to generate authentic epidemiology information. Whether it is scholarly papers or extensive real-world evidence (RWE), cross-verifying data using different Clarivate datasets (e.g., Epi Core datasets, the Incidence and Prevalence Database [IPD], and RWD from Epidemiology Intelligence) enhances our understanding of disease pathology. As seen in the case study of Lyme disease, our in-house RWD confirmed high incidence rates in the U.S. and showed that current treatment options (antibiotics and the anti-inflammatory drug dapsone) are underutilized. While antibiotics remain the standard of care for Lyme patients, RWE noted that only 51% of antibiotic-treated cases report some improvement after a year; an alarming 37% of patients reported that their condition was unchanged, while 12% reported their condition as worse. Thus, separate disease datasets are needed to complement each other and provide the complete epidemiological picture.

In this context, we exercise caution when integrating AI tools such as the EpiPipeline4RD pipeline for curating epidemiological data from online sources. This AI tool analyzes PubMed abstracts alone; thus, inconsistencies may occur. While such AI tools can be used as a first step in data curation, they require follow-up with manual or partly automated processes to ensure completeness and accuracy of the curated data. This strategy is similar to what we currently practice with Orphanet and GARD.

Swarali Tadwalkar contributed to this article
