How Great Expectations Is Turning Healthcare Data into a Reliable Clinical Asset

Built by healthcare data practitioners solving real big data problems inside real healthcare systems

By Joshua Heath

Modern healthcare runs on data. In fact, healthcare generates roughly 30% of all the data in the world. Every lab result, clinical note, insurance claim, AI model, and population-health insight is powered by information flowing through massive digital systems. Yet most of that data still moves through pipelines that were never designed to ensure accuracy, consistency, or reliability. Healthcare IT Today recently reported that providers continue to face major challenges integrating data from EHRs, wearables, and patient-generated sources due to disparate formats and uneven quality, hindering clinicians’ ability to form a complete, trusted view of patient health.

Hospitals, payers, and digital health companies now depend on thousands of data sources. Electronic health records, legacy databases, lab systems, connected devices, claims platforms, and third-party vendors all feed into centralized data lakes that power everything from clinical decision-support tools to AI-driven care models. 

When that data is wrong, inconsistent, duplicated, or malformed, AI does not fail loudly. It fails quietly, producing confident answers that can be dangerous: missed diagnoses, incorrect risk scores, delayed treatments, and flawed clinical decisions. For instance, in the UK, an AI-generated medical summary wrongly recorded that a healthy man had diabetes and suspected heart disease, even inventing a hospital address, which led to him being incorrectly invited to a diabetic screening.

This is the invisible crisis facing healthcare today. The industry has invested billions in digital transformation, but far less in ensuring the data moving through those systems can actually be trusted.

A National Library of Medicine study found that poor data quality and information overload increase clinician error rates and threaten patient safety. Federal EHR surveillance has also identified widespread system issues that could cause patient harm.

As Hernán Álvarez, CEO of Great Expectations, puts it:

“Data comes from many places and moves through complex pipelines and transformations before it ever reaches analytics, AI, or decision systems. But most organizations do not actually know the quality of their data as it moves through those systems. They do not know whether it is getting better or worse.”

That gap between what healthcare assumes its data is and what it actually is has become one of the industry’s most dangerous vulnerabilities.

“Healthcare is becoming a data-driven, AI-enabled industry, but none of that works if the data itself is unreliable,” said Austin Walters, Partner at SpringTide Ventures. “Great Expectations is building the trust layer that allows modern healthcare systems to function safely at scale. They are not just improving analytics. They are protecting patient outcomes.”

The gap first emerged in 2017, when Abe Gong was operating a healthcare data consultancy called Superconductive Health and working inside hospital systems, migrating records, modernizing platforms, and building new data pipelines. As the work expanded, James Campbell joined the company and later became a cofounder during the pivot from consulting to building Great Expectations in late 2019. In hospital after hospital, one issue kept surfacing: the data was moving, but no one could prove it was still accurate.

Patient identifiers drifted. Lab values changed formats. Fields broke during transformation. Yet those same datasets still powered dashboards, analytics, and, increasingly, AI models that influence real decisions about diagnosis, treatment, and care.

So they built a solution.

Using open-source Python tools and modern data engineering frameworks such as Airflow, the team created a library to test data as it moved through pipelines, ensuring it remained consistent with its upstream state before downstream use. They called it Great Expectations.
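The core idea, testing that data survives each pipeline step intact, can be sketched in a few lines of plain Python. This is a conceptual illustration only; the function and field names here are hypothetical and are not the Great Expectations API.

```python
# Illustrative sketch of in-pipeline data testing: verify that a
# transformation step did not silently drop or invent records.
# Names are hypothetical, not the Great Expectations API.

def check_consistency(upstream_rows, downstream_rows, key="patient_id"):
    """Compare a dataset before and after a pipeline step."""
    upstream_keys = {row[key] for row in upstream_rows}
    downstream_keys = {row[key] for row in downstream_rows}
    return {
        "row_count_matches": len(upstream_rows) == len(downstream_rows),
        "missing_keys": sorted(upstream_keys - downstream_keys),
        "unexpected_keys": sorted(downstream_keys - upstream_keys),
    }

upstream = [{"patient_id": "P001"}, {"patient_id": "P002"}]
downstream = [{"patient_id": "P001"}]  # one record silently dropped

report = check_consistency(upstream, downstream)
print(report)
```

Run between pipeline stages, a check like this turns a silent record drop into an explicit, reportable failure rather than a quiet gap in downstream analytics.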

When they released it as an open-source framework for data quality, adoption took off. Data teams across healthcare and other regulated industries recognized it immediately as a missing piece of the modern data stack.

Today, Great Expectations exists because healthcare has become software-driven, but trust in the underlying data has not kept pace.

The Great Expectations Solution

Today, Great Expectations powers a full enterprise SaaS platform, GX Cloud, that allows healthcare organizations to validate their data at every step of its journey, from ingestion through transformation to analytics and AI.

Instead of discovering problems after clinicians, analysts, or AI systems have already used bad data, Great Expectations catches issues in real time. Data can be automatically tested to ensure that patient identifiers are in the correct format, lab values fall within expected ranges, dates and codes are valid, and records comply with required clinical and regulatory standards. When something goes wrong, teams are alerted immediately with clear, actionable insight into what failed and why.
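The kinds of checks described above can be illustrated with a minimal sketch in plain Python. The ID format, value ranges, and field names below are assumptions chosen for the example; they are not GX's actual expectation API or any real clinical standard.

```python
import re
from datetime import datetime

# Conceptual sketch of record-level validation: identifier format,
# value ranges, and date validity. All field names, the MRN pattern,
# and the heart-rate bounds are illustrative assumptions.

def validate_record(record):
    """Return a list of failed checks for one patient record."""
    failures = []
    if not re.fullmatch(r"MRN\d{8}", record.get("patient_id", "")):
        failures.append("patient_id format")
    hr = record.get("heart_rate")
    if hr is None or not (20 <= hr <= 250):
        failures.append("heart_rate out of expected range")
    try:
        datetime.strptime(record.get("encounter_date", ""), "%Y-%m-%d")
    except ValueError:
        failures.append("encounter_date invalid")
    return failures

good = {"patient_id": "MRN00012345", "heart_rate": 72,
        "encounter_date": "2024-03-01"}
bad = {"patient_id": "12345", "heart_rate": 900,
       "encounter_date": "03/01/2024"}

print(validate_record(good))  # passes all checks
print(validate_record(bad))   # fails all three checks
```

In a production platform these rules would be declared once, applied automatically to every batch, and wired to alerting, which is the step from ad-hoc scripts to a governed quality-control layer.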

Just as importantly, Great Expectations makes data quality visible and collaborative. Clinicians, analysts, and compliance teams can see which datasets have been tested, when they were validated, and whether they are safe to use. This creates a shared foundation of trust across organizations where data is no longer a black box but a documented, governed asset.

In a healthcare system increasingly dependent on AI, analytics, and digital workflows, Great Expectations has become the quality control layer that ensures technology decisions are grounded in reliable data.

Real-World Impact Across Healthcare

For healthcare organizations, the impact of Great Expectations is already tangible.

In Europe, LOGEX, a Netherlands-based healthcare technology company, uses GX to validate the hospital data powering its Healthcare Intelligence Suite. Operating across 10 countries, LOGEX supports clinicians and decision-makers with analytics that influence regulatory compliance, treatment evaluation, and financial planning. But hospital data often arrives incomplete, inconsistent, or misaligned with local standards.

To address this, LOGEX built a backend application that integrates directly with GX Core. Data quality issues are flagged before entering LOGEX’s processing pipeline, allowing analysts and customers to correct errors early. GX’s flexible validation framework enables LOGEX to adapt quality checks across different countries, formats, and languages while maintaining consistent standards.
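A validation gate of the kind described, where records are checked against country-specific rules before entering the pipeline, might look like the following sketch. The rule set, country codes, and field names are invented for illustration; they do not reflect LOGEX's actual configuration or the GX Core API.

```python
# Hypothetical sketch of a pre-pipeline validation gate with
# per-country rules. Rule names and fields are illustrative only.

RULES = {
    "NL": {"required_fields": ["hospital_id", "drg_code"]},
    "UK": {"required_fields": ["hospital_id", "hrg_code"]},
}

def gate(records, country):
    """Split records into those that may proceed and those flagged for review."""
    required = RULES[country]["required_fields"]
    accepted, flagged = [], []
    for rec in records:
        missing = [f for f in required if not rec.get(f)]
        if missing:
            flagged.append((rec, missing))
        else:
            accepted.append(rec)
    return accepted, flagged

records = [
    {"hospital_id": "H1", "drg_code": "A01"},
    {"hospital_id": "H2"},  # missing a required code: flagged early
]
ok, needs_fix = gate(records, "NL")
```

Keeping the rules in data rather than code is what lets a single gate adapt across countries, formats, and languages while the enforcement logic stays the same.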

“Addressing these data quality issues is essential before the data can proceed through LOGEX’s processing pipeline, ensuring that downstream insights and decisions are based on reliable information,” said Maria Zilli, Data Engineer at LOGEX.

By catching errors upstream, LOGEX ensures that the insights clinicians rely on are built on data they can trust.

At the national scale, Komodo Health uses Great Expectations to protect some of the most complex healthcare data pipelines in the country.

Komodo’s Healthcare Map processes more than 15 million clinical encounters per day, tens of millions of claims, and data on over 320 million U.S. patients, powering analytics and alerting systems used by life sciences companies and providers. One of Komodo’s flagship products, Pulse, delivers real-time clinical alerts to identify clinicians treating patients with rare and complex conditions. Because those alerts guide engagement around oncology and specialty therapies, accuracy is critical. Incorrect data can trigger misleading or unnecessary alerts, undermining trust and affecting care decisions.

From Open Source to Industry Standard

Great Expectations’ role in healthcare has driven extraordinary adoption.

The open-source framework is now used by more than 14,000 organizations, supported by a global community of over 13,000 data practitioners, and has more than 24 million downloads per month. That scale reflects a simple truth across healthcare and other regulated industries: data quality is no longer optional.

That open-source momentum has translated into a fast-growing commercial business. GX Cloud gives enterprises a secure, scalable way to enforce data quality across mission-critical pipelines, from clinical analytics to AI training. The company has raised more than $60 million in venture capital, including a $21 million Series A and a $40 million Series B, backing its push to become the system of record for data trust.

“Healthcare forced us to build for the hardest possible conditions,” Álvarez said. “You have regulation, clinical risk, legacy systems, and enormous data volumes. Once we solved for that, it became clear that the same platform could support financial services, insurance, and any enterprise running AI or analytics at scale. That is why organizations like Heineken, Vimeo, and Provectus now rely on Great Expectations as well.”
