Clinical Data Platform
Collecting large volumes of data from multiple EHR vendors in formats such as HL7, C-CDA, and EDI, and building a data warehouse on the Medallion Architecture.
Client Information
- Client: Preveta
- Role: Staff Data Engineer
- Year: 2023
What is Preveta?
Preveta is a SaaS care navigation platform specializing in specialty and oncology care. It integrates electronic health record (EHR) data into disease-specific clinical pathways to improve care navigation and patient outcomes. Founded in 2018, Preveta aims to give healthcare providers better data and insights.


Background
What did Preveta need?
Preveta needed faster data ingestion so it could onboard clients who were on hold because of data engineering delays. The existing system could not efficiently process the varied data formats coming from different EHR vendors, which created a bottleneck in client onboarding and limited the platform's scalability.
My contribution
Led the creation of an end-to-end data pipeline, from data collection to reporting, built on the Medallion Architecture. The technology stack included:
- Azure Data Pipeline: For orchestrating data flows and transformations
- Databricks: For processing and transforming clinical data at scale
- Azure Synapse: For analytics and data warehousing
- SQL Server: For structured data storage and querying
- Power BI: For visualization and reporting dashboards
Key Achievements
- Designed a scalable architecture that could process multiple healthcare data formats (HL7, C-CDA, EDI)
- Implemented robust data quality checks to ensure data integrity across the pipeline
- Created standardized clinical data models aligned with healthcare industry standards
- Built automated reporting capabilities that provided actionable insights to care teams
- Reduced client onboarding time from weeks to days by automating data ingestion processes
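Handling HL7, C-CDA, and EDI side by side starts with routing each incoming payload to the right parser. The sketch below is illustrative, not the production code: it assumes payloads arrive as text and sniffs the format from leading content (HL7 v2 messages start with an MSH segment, X12 EDI interchanges with an ISA header, and C-CDA documents are XML with a ClinicalDocument root). The `SourceFormat` enum, the `detect_format` and `landing_path` functions, and the `bronze/<format>` folder layout are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET
from enum import Enum


class SourceFormat(Enum):
    HL7V2 = "hl7v2"
    CCDA = "ccda"
    X12_EDI = "x12_edi"
    UNKNOWN = "unknown"


def detect_format(payload: str) -> SourceFormat:
    """Guess a raw payload's format from its leading content (a heuristic)."""
    text = payload.lstrip("\ufeff \t\r\n")  # drop a BOM and leading whitespace
    if text.startswith("MSH"):
        # HL7 v2 messages begin with the MSH (message header) segment
        return SourceFormat.HL7V2
    if text.startswith("ISA"):
        # X12 EDI interchanges begin with the ISA interchange control header
        return SourceFormat.X12_EDI
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            return SourceFormat.UNKNOWN
        # Namespaced tags look like "{urn:hl7-org:v3}ClinicalDocument"
        if root.tag.rsplit("}", 1)[-1] == "ClinicalDocument":
            return SourceFormat.CCDA
    return SourceFormat.UNKNOWN


def landing_path(payload: str) -> str:
    """Hypothetical Bronze landing folder, partitioned by detected format."""
    return f"bronze/{detect_format(payload).value}"
```

Sniffing content rather than trusting file names keeps the router independent of how each vendor names its files, and payloads classified as unknown can be parked for review instead of being dropped.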
Technical Implementation
The solution followed the Medallion Architecture pattern, which organizes data processing into three layers:
- Bronze Layer: Raw data from various sources is ingested in its original format
- Silver Layer: Data is cleansed, transformed, and standardized into a common schema
- Gold Layer: Business-ready datasets are created to support specific use cases and analytics
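The three layers can be illustrated with a small, framework-free sketch. It is not the Databricks implementation used on this project: it assumes HL7 v2 ADT messages with the default `|` field and `^` component separators, and the record types, function names, and the facility-level Gold metric are hypothetical. Bronze keeps the raw payload with ingestion metadata, Silver parses it into a common patient schema, and Gold aggregates Silver rows into a reporting-ready figure.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Dict, List, Optional


@dataclass
class BronzeRecord:
    source: str            # hypothetical label for the sending EHR feed
    ingested_at: datetime  # ingestion timestamp, kept for lineage
    payload: str           # raw message, stored untouched


@dataclass
class SilverPatient:
    source: str
    facility: str
    patient_id: str
    family_name: str
    given_name: str
    birth_date: Optional[date]
    sex: str


def to_bronze(source: str, payload: str) -> BronzeRecord:
    # Bronze: no parsing, only the original payload plus metadata
    return BronzeRecord(source, datetime.now(timezone.utc), payload)


def _get(fields: List[str], i: int) -> str:
    # Trailing empty fields are often omitted in HL7 v2, so default to ""
    return fields[i] if i < len(fields) else ""


def to_silver(rec: BronzeRecord) -> SilverPatient:
    # Silver: split segments (carriage-return separated) into fields.
    # Raises KeyError if MSH or PID is missing; a real pipeline would quarantine.
    segments: Dict[str, List[str]] = {}
    for line in rec.payload.replace("\n", "\r").split("\r"):
        if line:
            fields = line.split("|")
            segments.setdefault(fields[0], fields)  # keep the first occurrence
    msh, pid = segments["MSH"], segments["PID"]
    # In MSH the "|" itself is MSH-1, so MSH-4 (sending facility) is index 3;
    # in other segments field n sits at index n (PID-3 is index 3, and so on)
    family, _, given = _get(pid, 5).partition("^")
    dob = _get(pid, 7)
    return SilverPatient(
        source=rec.source,
        facility=_get(msh, 3).split("^")[0],
        patient_id=_get(pid, 3).split("^")[0],
        family_name=family,
        given_name=given.split("^")[0],
        birth_date=datetime.strptime(dob[:8], "%Y%m%d").date() if len(dob) >= 8 else None,
        sex=_get(pid, 8),
    )


def gold_patients_per_facility(rows: List[SilverPatient]) -> Dict[str, int]:
    # Gold: distinct patients per facility, ready for a dashboard
    seen: Dict[str, set] = {}
    for row in rows:
        seen.setdefault(row.facility, set()).add(row.patient_id)
    return {facility: len(ids) for facility, ids in seen.items()}
```

Because Bronze keeps the untouched payload, Silver can be rebuilt from it whenever parsing rules change, without asking the EHR vendor to resend data.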
The Medallion Architecture allowed for:
- Clear separation of concerns across the data lifecycle
- Improved data lineage tracking and quality management
- Scalable processing of diverse healthcare data formats
- Enhanced data governance and compliance capabilities
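Quality checks commonly run as records are promoted from Bronze to Silver. The sketch below shows one common pattern, quarantining failing records with the failure reasons attached instead of silently dropping them. The rule functions, the dict-based record shape, and the allowed sex codes are assumptions for illustration, not the checks used on this project.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Optional, Tuple

# A rule inspects one record and returns an error message, or None if it passes
Rule = Callable[[dict], Optional[str]]


def required(name: str) -> Rule:
    def check(rec: dict) -> Optional[str]:
        return None if rec.get(name) else f"missing {name}"
    return check


def allowed_values(name: str, allowed: set) -> Rule:
    def check(rec: dict) -> Optional[str]:
        value = rec.get(name)
        return None if value in allowed else f"{name} value {value!r} not allowed"
    return check


def birth_date_not_in_future(rec: dict) -> Optional[str]:
    bd = rec.get("birth_date")
    return "birth_date in the future" if bd is not None and bd > date.today() else None


@dataclass
class QualityResult:
    passed: List[dict] = field(default_factory=list)
    # Each quarantined entry keeps the record plus every failed rule's message
    quarantined: List[Tuple[dict, List[str]]] = field(default_factory=list)


def apply_rules(records: List[dict], rules: List[Rule]) -> QualityResult:
    result = QualityResult()
    for rec in records:
        # Run every rule so a record reports all of its problems at once
        errors = [msg for msg in (rule(rec) for rule in rules) if msg is not None]
        if errors:
            result.quarantined.append((rec, errors))
        else:
            result.passed.append(rec)
    return result
```

Keeping the failure reasons next to each record supports the lineage and quality tracking listed above: quarantined rows can be counted per rule, reviewed, and reprocessed once the source issue is fixed.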