Accelerating innovation

The phenotype cloud

Cluster patients by data first and ask whether the categories fit afterward. What it would take to do for hEDS what molecular classification did for cancer.

The conventional way to study a rare disease starts with a diagnostic category. The hEDS researcher recruits patients who meet the 2017 criteria and looks for shared features within that population. The fibromyalgia researcher recruits patients who meet ACR criteria and looks for shared features within that population. The chronic fatigue syndrome researcher recruits patients who meet International Consensus Criteria and looks for shared features within that population. Each researcher generates a body of evidence within their bucket. The buckets do not communicate.

The biological reality is that these conditions overlap. Joint hypermobility, autonomic dysregulation, mast cell activation, chronic widespread pain, post-exertional malaise, gastrointestinal dysmotility, and a dozen other features cluster across the diagnostic categories. Many people meet criteria for two or three of them. Some meet criteria for none of them despite presenting with a constellation of symptoms that resembles all of them.

The phenotype cloud approach inverts the conventional method. Instead of starting with categories and looking for shared features within each, the cloud approach starts with structured longitudinal symptom data from a large heterogeneous population and lets clusters emerge from the data. The clusters that emerge are candidate biological signals, not diagnostic labels. Some of the clusters correspond to existing categories. Some do not.
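The inversion described above can be sketched in a few lines. This is a minimal illustration, not a proposed pipeline: the patients, the three symptom scores, and the fixed starting centroids are all invented for the example, and plain k-means stands in for whatever clustering method a real study would choose. The point is the order of operations: the diagnostic label is set aside during clustering and only brought back afterward.

```python
from collections import Counter
from math import dist

# Hypothetical patients: (diagnostic label, symptom scores on a 0-10 scale).
# Score order: joint hypermobility, autonomic symptoms, post-exertional malaise.
patients = [
    ("hEDS", (9, 1, 1)),
    ("hEDS", (8, 2, 1)),
    ("hEDS", (8, 7, 8)),
    ("CFS", (1, 8, 9)),
    ("CFS", (2, 7, 8)),
    ("none", (7, 8, 7)),
]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c]
            )
        centroids = updated
    return labels, centroids

# Step 1: cluster on symptom data alone; the diagnostic label is not used.
points = [scores for _, scores in patients]
# Fixed starting centroids keep the example deterministic (an assumption made
# for illustration; real analyses use randomized initialization and restarts).
labels, _ = kmeans(points, centroids=[points[0], points[3]])

# Step 2: only now bring the labels back and ask whether the categories fit.
crosstab = Counter((cluster, dx) for cluster, (dx, _) in zip(labels, patients))
for (cluster, dx), count in sorted(crosstab.items()):
    print(f"cluster {cluster}: {dx} x{count}")
```

In this toy data one cluster contains two of the three hEDS patients, while the other mixes the remaining hEDS patient with the CFS patients and the patient who carries no diagnosis.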

What the cloud requires

The phenotype cloud requires three things that the rare disease research infrastructure does not currently have at scale.

The first is structured symptom data. Free-text clinical notes capture rich phenotypic information in a form that resists computational analysis. Standardized symptom ontologies (HPO, SNOMED CT) capture less information but support clustering. Patient-reported symptom data captured through structured instruments (PROMIS, condition-specific scales) captures the daily-life features that clinical notes miss. The cloud requires all three layers.
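One way to picture the three layers is as a single visit record. The sketch below is an assumption about how such a record might be organized, not a description of any existing registry: the class name, field names, term list, and scores are hypothetical, and the plain-language term strings stand in for real ontology identifiers (HPO or SNOMED CT codes).

```python
from dataclasses import dataclass, field

@dataclass
class VisitRecord:
    patient_id: str
    note: str                                        # layer 1: free-text clinical note
    coded_terms: set = field(default_factory=set)    # layer 2: ontology-coded findings
    pro_scores: dict = field(default_factory=dict)   # layer 3: patient-reported scores

# Fixed feature order so every patient maps to a vector of the same shape.
# These names are placeholders for ontology codes and instrument domains.
TERM_ORDER = ["joint hypermobility", "orthostatic intolerance", "gastrointestinal dysmotility"]
PRO_ORDER = ["fatigue", "pain_interference"]

def to_feature_vector(record):
    """Turn the coded and patient-reported layers into a clusterable vector.

    The free-text note is kept on the record but not vectorized here.
    A missing patient-reported score stays None rather than being guessed.
    """
    coded = [1.0 if term in record.coded_terms else 0.0 for term in TERM_ORDER]
    reported = [record.pro_scores.get(name) for name in PRO_ORDER]
    return coded + reported

record = VisitRecord(
    patient_id="p001",
    note="Reports dizziness on standing; knees hyperextend on exam.",
    coded_terms={"joint hypermobility", "orthostatic intolerance"},
    pro_scores={"fatigue": 62.0, "pain_interference": 58.0},  # invented values
)
vector = to_feature_vector(record)
print(vector)
```

The design choice worth noting is that the free-text note travels with the record even though it does not enter the vector, so that information can be recovered later if better extraction tools become available.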

The second is longitudinal data. A snapshot of symptoms at a single visit cannot distinguish a person whose joint pain is stable from a person whose joint pain is part of a progressive course. Distinguishing those two trajectories requires repeat data over years. The longitudinal dimension is what generates clinically meaningful clusters; cross-sectional data alone produces clusters that mix progressive and stable phenotypes that share a momentary appearance.
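A small worked example shows why the snapshot fails. The two pain trajectories below are invented, and a least-squares slope is only one of many possible ways to summarize a trajectory; it is used here because it is easy to check by hand.

```python
def slope_per_year(observations):
    """Ordinary least-squares slope of score against time, in points per year."""
    n = len(observations)
    mean_t = sum(t for t, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n
    numerator = sum((t - mean_t) * (y - mean_y) for t, y in observations)
    denominator = sum((t - mean_t) ** 2 for t, _ in observations)
    return numerator / denominator

# Hypothetical (year, joint pain score) pairs for two patients.
stable = [(0, 6), (1, 6), (2, 6), (3, 6)]
progressive = [(0, 3), (1, 4), (2, 5), (3, 6)]

# A single visit in year 3 sees the same score for both patients.
print(stable[-1][1], progressive[-1][1])

# The trajectories differ: no change versus one point per year.
print(slope_per_year(stable), slope_per_year(progressive))
```

Clustering on the year-3 score alone would place these two patients together. Adding the slope as a feature separates them.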

The third is sufficient sample size. Phenotypic clustering across the heterogeneity of the hEDS-fibromyalgia-CFS-POTS-MCAS overlap requires data from thousands of patients, not hundreds. Clusters that emerge from a small dataset tend to be unstable and fail to replicate. Clusters that emerge from a large dataset are far more likely to hold up under cross-validation, and only clusters that do can be treated as candidate biological signals.
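Whether clusters replicate can be checked rather than assumed. One simple measure, offered here as an illustration and not as the method any particular study uses, is the Rand index: the fraction of patient pairs on which two clusterings agree about whether the pair belongs together. The label lists below are invented. In practice an adjusted variant that corrects for chance agreement is usually preferred.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of patient pairs that both clusterings treat the same way.

    A pair counts as agreement if both clusterings put the two patients in the
    same cluster, or both put them in different clusters. Cluster numbering
    does not matter, only who is grouped with whom.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

# Hypothetical cluster assignments for the same four patients from three runs.
run_1 = [0, 0, 1, 1]
run_2 = [1, 1, 0, 0]   # same grouping as run_1, with the cluster numbers swapped
run_3 = [0, 1, 0, 1]   # a different grouping

print(rand_index(run_1, run_2))  # identical groupings
print(rand_index(run_1, run_3))  # groupings that mostly disagree
```

Comparing assignments across resamples of the dataset in this way gives a number for stability, which is what separates a replicable cluster from an artifact of a small sample.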

What the cloud could find

The hypothesis that hEDS is genetically heterogeneous predicts that the hEDS clinical category, when subjected to phenotypic clustering, splits into multiple distinct subgroups with different symptom trajectories, comorbidity patterns, and treatment responses. The same hypothesis applied to fibromyalgia and to chronic fatigue syndrome predicts the same kind of internal heterogeneity within each.

A phenotype cloud built across all of these populations could find subgroups that do not respect the diagnostic boundaries. A subgroup of hEDS patients with autonomic features and mast cell activation might cluster more tightly with a subgroup of CFS patients than with the rest of the hEDS population. The clustering pattern would be a hypothesis about shared biology that subsequent work could test through gene discovery, biomarker validation, and treatment response analysis.
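The cross-boundary prediction can be stated as a concrete comparison. In the sketch below the subgroups, the four symptom scores, and every value are invented for illustration. It shows only what "clusters more tightly with" would mean numerically, using the distance between subgroup centroids.

```python
from math import dist

def centroid(vectors):
    """Mean symptom profile of a subgroup."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))

# Hypothetical scores (0-10) in the order:
# joint hypermobility, autonomic symptoms, mast cell symptoms, fatigue.
heds_autonomic = [(7, 8, 7, 8), (8, 7, 8, 7)]   # hEDS with autonomic and mast cell features
heds_other = [(8, 1, 1, 2), (9, 2, 1, 3)]       # the rest of the hEDS population
cfs_subgroup = [(3, 8, 7, 9), (2, 7, 6, 8)]     # a subgroup of CFS patients

to_cfs = dist(centroid(heds_autonomic), centroid(cfs_subgroup))
to_other_heds = dist(centroid(heds_autonomic), centroid(heds_other))

print(f"distance to CFS subgroup:  {to_cfs:.2f}")
print(f"distance to other hEDS:    {to_other_heds:.2f}")
print("closer to CFS subgroup" if to_cfs < to_other_heds else "closer to other hEDS")
```

In this invented data the autonomic hEDS subgroup sits closer to the CFS subgroup than to the other hEDS patients, despite sharing hypermobility with the latter. A result like that in real data would be a hypothesis to test, not a finding.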

The historical parallel is cancer classification. Tumors used to be classified by organ of origin: lung cancer, breast cancer, colon cancer. The classification system worked clinically for its era because it tracked the tissue that physicians could see and biopsy. As molecular tools improved, organ-based classification became one input among several. Tumors are now also classified by HER2 status, BRCA mutation status, microsatellite instability, PD-L1 expression, and a growing list of other molecular features that often matter more than the organ of origin for treatment selection.

The rare disease equivalent of that shift has not happened yet. The diagnostic categories of the 2017 international EDS classification, the ACR fibromyalgia criteria, and the International Consensus Criteria for chronic fatigue syndrome are organ-of-origin equivalents. They classify by the most visible clinical feature. The molecular and longitudinal classification system that may eventually supersede them does not exist because the data required to build it does not exist.

What this means for the diagnostic argument

The political fights over hEDS criteria, Beighton score cutoffs, and which patients belong in which bucket are arguments about the boundary lines of categories. The phenotype cloud reframes the question. The argument is not whose bucket is the right size. The argument is whether the buckets, regardless of their boundaries, are tracking the underlying biology.

The cloud does not delegitimize the existing diagnostic categories. The categories were the best available framework for organizing clinical care in the absence of better data. The cloud does not replace them in the clinic next year. What the cloud does is shift the locus of authority from the criteria committee to the data. The criteria committee defines the labels that patients carry into the dataset; the data, accumulated over years, tells the field whether the committee drew the boundaries in the right place.

For people who have been on the wrong side of a diagnostic boundary, the cloud is the route to clinical visibility. That includes people who meet most criteria for hEDS yet fall short on one, people who carry a hypermobility spectrum disorder (HSD) label rather than an hEDS one, and people who present with a constellation that resembles every category and slots cleanly into none. The current system requires you to fit a category to be studied. The cloud studies you and then asks whether the category fits.