Methodology and intuition behind the browser-based SEIRV epidemic simulator on this site. Poisson tau-leaping, state-level coverage, and why the ribbon matters more than any single run.
Zaher Karp
Healthcare data engineering and Medicare Advantage analytics.
About
I started in writing and editing, spent nearly a decade in mixed-methods health services research at UW–Madison beginning in 2009, and moved into data engineering when I realized the tools I needed to study complex healthcare systems didn't exist yet. The methodological thread runs from grounded theory in qualitative research, through logistic and linear regression in population health, to time series analysis in Stars forecasting. (Grounded theory, developed by Glaser and Strauss in 1967, builds theory inductively from coded qualitative data rather than testing hypotheses against it. In practice this means reading transcripts, labeling passages, comparing labels across transcripts, and letting structure emerge.) Different instruments, the same underlying question about how complex systems behave under measurement.
Right now that means Medicare Star Ratings, HEDIS pipelines, and data integration across claims and eligibility sources. (Current role: Lead Data Engineer at Baltimore Health Analytics, Madison WI. Managing US-based engineering with work spanning analytics methodology, code review, and CMS-to-roadmap translation.) I am leading a data function and moving deliberately into organizational leadership, because the decisions I want to influence happen above the individual contributor level.
Writing
24 weeks of activity, 23 posts. Writing resumed in late 2025 after a multi-year pause. The early weeks of the window are sparse by design, not by neglect; recent weeks show a steady run.
A personal history of building and rebuilding a website across three eras, and what the increasing friction of each step actually taught me.
A short explainer for the Stars Cliff Simulator: why ordinal logistic regression is the right tool for a 1 to 5 star outcome, and how P(clearing 4.0 stars) falls out of the model for free.
The statistical methodology and published literature behind the Stars Cliff Simulator, a public teaching-oriented tool focused on the 4.0 star Quality Bonus Payment threshold.
A practitioner's walkthrough of how HEDIS measure pipelines actually work: eligible population, numerator matching, exclusions, supplemental data, and rate calculation.
Most applications boil down to server-rendered or API-driven designs. The difference isn't just technical. It shapes how your system evolves.
Experience
Lead Data Engineer
I lead the data engineering and analytics methodology function for a Medicare Advantage quality platform. The role spans direct management, code review across the engineering team, and coordination of data science and QA across two geographies, plus the product roadmap and translation between CMS requirements and organizational priorities.
CMS Star Rating methodology is published in Technical Notes that run to hundreds of pages: regulatory language describing measure construction, denominator logic, significance testing, and improvement score calculation. (The CMS Technical Notes are revised each rating year. A single measure rewrite can propagate into denominator changes, hold-harmless logic changes, and cutpoint regeneration, all of which need to round-trip through the analytics layer without drift.) The work is translating that regulatory text into executable Python and SQL, using pandas for data transformation, scipy and numpy for statistical components including robust exponential smoothing for time series forecasting, and Selenium for web automation, producing auditable outputs where every threshold and weighted average is traceable to its source.
HEDIS hybrid measures add a separate class of problem. The denominator is constructed from claims and eligibility data arriving from multiple health plans in inconsistent formats. Normalizing that into a unified analytic layer while preserving the audit trail requires treating every source format as a suspected deviation from the specification until proven otherwise.
pandas · scipy · numpy · dbt · SQL · Selenium · Python
Healthcare Analytics Manager, Embedded Refills and Care Gaps
In 2020 Health Catalyst acquired healthfinch, and I migrated with the product. The three-person analytics function became a clinical quality platform serving health systems across Epic, Cerner, athenahealth, and Veradigm. (Four EHRs with different data models, different extract behaviours, and different interpretations of the same clinical concept. A medication adherence rate is not portable across these systems until someone sits down and defines it in a source-specific way.) RxNorm validation across the platform cut client-audit discrepancies from roughly 30% to under 5%. The acquisition also meant migrating infrastructure from AWS to Azure and Databricks, an architectural shift that required rebuilding the analytics layer while maintaining continuity of service for existing clients.
Data governance came down to one question: does a medication adherence rate mean the same thing across Epic, Cerner, athenahealth, and Veradigm? The ELT pipeline I inherited ran for multiple days and produced outputs that were difficult to trace to their source. The redesign was an auditability and governance decision, not a performance one. (Medallion architecture: bronze for raw ingestion, silver for documented business logic, gold for analytic output. Every transformation has a name, a location, and a test. Same-day runtime was a side effect of the design, not the goal.)
HIPAA compliance was an architectural constraint throughout. Not a certification, but a set of requirements that shaped every decision about data storage, access control, and transmission across a platform handling protected health information at scale.
dbt · Redshift · AWS to Azure · Databricks · Python · Tableau · Power BI
Healthcare Analytics Manager, Specialist
First analytics hire meant the infrastructure did not exist. I built it under HIPAA and HITRUST compliance requirements on AWS, which meant that every data governance decision, from access control to audit logging to retention policy, was mine to make without a prior framework to inherit. Promoted to manager after one year to lead cross-functional work across product, engineering, and customer success.
ROI modeling was the commercially critical work: linear regression on clinical workflow data, translated into client-facing reports that supported over a million dollars in recurring revenue. Built dashboards that drove sevenfold growth in internal user adoption and eliminated four hundred hours of annual manual reporting preparation.
SQL · Python · Sisense · AWS · HIPAA · HITRUST
Researcher, Five Roles Over Nine Years
Nine years embedded in federally-funded primary care research, advancing through five positions from research specialist to assistant researcher. I built the technical infrastructure for studies funded by the National Institute on Aging, the Wisconsin Partnership Program, the Josiah Macy Jr. Foundation, and multiple UW Institute for Clinical and Translational Research awards.
The Wisconsin Longitudinal Study, a fifty-year cohort of ten thousand adults integrating survey, health, and administrative records, taught me what longitudinal data quality problems actually look like: cohort attrition, measurement drift, linkage failure across administrative sources that were never designed to be linked. (Methodological training included Contemporary Qualitative Interviewing Methods at the University of Oxford, 2014.)
A sustained research thread on primary care redesign used grounded theory for qualitative analysis of field notes and focus group transcripts, and interrupted time series analysis to measure the effect of care delivery change initiatives on clinic panel data.
The ACO cost research integrated EMR, claims, and patient satisfaction data in Stata and SAS to produce cohort analyses showing that higher-baseline-cost organizations were more likely to achieve shared savings, published in the International Journal of Healthcare Management in 2018.
Stata · SAS · NVivo · SPSS · REDCap
Principal
Editorial services practice specializing in environmental, health, and policy content. Managed up to eight copy editors, graphic designers, and photographers. Wrote articles syndicated through Thomson Reuters, LexisNexis, and the New York Times wire. Edited and indexed client manuscripts published as books, peer-reviewed journals, grants, and dissertations.
Projects
Client-Side Stars Rating Predictor
A cut-point dashboard built at Baltimore Health Analytics for internal Stars forecasting across our client contracts. The design constraint, no member-level data leaves the analyst's machine, was a compliance requirement, not an aesthetic one. (Running the model client-side in the browser keeps PHI on the analyst's machine and sidesteps an entire class of data-transit and data-residency concerns that would otherwise need a server-side review.) It runs ordinal logistic regression on live measure feeds entirely in the browser, projects cut-point crossings at the contract level, and surfaces which measures are closest to their next tier for remediation planning. Source is private.
Ordinal logistic regression · cut-point projection · client-side · internal tool
Stars Cliff Simulator
A public, teaching-oriented companion to the internal predictor. Single-page interactive demo built around one number, the 4.0 star QBP cliff that separates Medicare Advantage plans that qualify for Quality Bonus Payments from the 3.5 to 3.99 star "dead zone" that does not. (For a mid-size MA contract, clearing 4.0 is worth roughly $50M relative to a 3.5 star rating. A tenth of a star literally changes the plan's financial structure.) An ordinal logistic regression calibrated to CMS 2025 weights runs in the browser; four sliders collapse the 42-measure surface down to its highest-leverage inputs, and a cut-point visualization exposes the mechanism that makes a tenth of a star worth $50 million. No data leaves the user's machine.
Ordinal logistic regression · vanilla JS · no dependencies · client-side
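The cut-point mechanism can be sketched as a cumulative-logit (proportional odds) model. The simulator itself is vanilla JS; the Python below is only an illustration, and every number in it is a placeholder: the half-star categories, the evenly spaced cut-points, the coefficients, and the four slider names are assumptions, not the simulator's calibrated values.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed outcome scale: half-star steps from 1.0 to 5.0 (9 ordered categories).
CATEGORIES = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

# Hypothetical cut-points, one per boundary between adjacent categories.
# CUTPOINTS[5] is the boundary between 3.5 and 4.0 stars.
CUTPOINTS = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]

# Hypothetical coefficients for four illustrative slider inputs.
COEFFS = {
    "clinical_quality": 1.2,
    "member_experience": 0.9,
    "medication_adherence": 0.7,
    "operations": 0.4,
}


def category_probs(x):
    """P(rating == c) for each category under the cumulative-logit model."""
    eta = sum(COEFFS[name] * x[name] for name in COEFFS)
    # P(rating <= category j) = sigmoid(cutpoint_j - eta); the top category closes at 1.
    cumulative = [sigmoid(theta - eta) for theta in CUTPOINTS] + [1.0]
    probs = [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]
    return dict(zip(CATEGORIES, probs))


def prob_at_least(x, threshold=4.0):
    """P(rating >= threshold): the mass in every category at or above it."""
    return sum(p for c, p in category_probs(x).items() if c >= threshold)


baseline = {name: 0.0 for name in COEFFS}
improved = {**baseline, "clinical_quality": 0.5}
print(prob_at_least(baseline), prob_at_least(improved))
```

The probability of clearing 4.0 needs no second model: it is one minus the cumulative probability at the 3.5/4.0 cut-point, so a small shift in the linear predictor near that boundary moves the probability the most.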
Healthcare Workforce Transition Platform
The original question was about healthcare workforce shortages: which adjacent roles can clinical and administrative staff reskill into? O*NET occupation and skill data provided the structure; logistic regression calibrated the transition probability estimates. The platform produces Ready Now, Trainable, and Long-Term Reskill recommendations with gap analysis by skill domain.
FastAPI · PostgreSQL · scikit-learn · logistic regression · O*NET
Medicare Advantage Insight Engine
A local news monitor that fetches public Medicare Advantage sources, scores items for analytic relevance using keyword and domain heuristics, and posts structured alerts to a Teams-compatible webhook. Distinguishing a CMS rulemaking notice from a press release that mentions Medicare Advantage requires knowing what questions a Stars analyst is actually trying to answer.
Python · automation · webhook
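A minimal sketch of keyword-and-domain scoring, assuming a simple additive scheme. The keyword list, domain list, weights, threshold, and payload shape are all hypothetical and are not taken from the actual engine; the payload is a plain dictionary rather than any specific webhook schema.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical weights: regulatory vocabulary outranks a bare program mention.
KEYWORD_WEIGHTS = {
    "proposed rule": 3,
    "final rule": 3,
    "star ratings": 3,
    "hedis": 2,
    "medicare advantage": 1,
}

# Hypothetical weights for source domains treated as primary.
DOMAIN_WEIGHTS = {"cms.gov": 3, "federalregister.gov": 3}


@dataclass
class Item:
    title: str
    url: str


def score(item):
    """Add matched keyword weights to the weight of the item's source domain."""
    text = item.title.lower()
    keyword_score = sum(w for k, w in KEYWORD_WEIGHTS.items() if k in text)
    host = urlparse(item.url).hostname or ""
    domain_score = max(
        (w for d, w in DOMAIN_WEIGHTS.items() if host == d or host.endswith("." + d)),
        default=0,
    )
    return keyword_score + domain_score


def to_alert(item, threshold=4):
    """Return a structured alert for items at or above the threshold, else None."""
    s = score(item)
    if s < threshold:
        return None
    return {"title": item.title, "url": item.url, "score": s}


notice = Item("CMS issues proposed rule on Medicare Advantage Star Ratings",
              "https://www.cms.gov/newsroom/example")
release = Item("Acme Health celebrates Medicare Advantage growth",
               "https://www.example.com/press")
print(to_alert(notice), to_alert(release))
```

Both items mention Medicare Advantage, but only the first pairs regulatory vocabulary with a primary source, which is the distinction the heuristics are meant to capture.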
ECDS Shock Index
ECDS adoption introduces a distributional shift into the Stars ecosystem that most health plans are not yet modeling. (ECDS, Electronic Clinical Data Systems, is the HEDIS reporting method that allows structured clinical data to supplement or replace claims-based measure reporting.) This repository implements the shock index methodology, quantifying the expected change in measure rate distributions when ECDS replaces legacy ED visit coding, and estimating the downstream effect on Stars cutpoint crossings at the plan level.
Python · Medicare Advantage · Stars methodology
Care Delivery Workflow Changes
Analyzed organization-wide care delivery changes using interrupted time series analysis on clinic panel data. The methodological question was whether observed changes in care patterns were attributable to redesign initiatives or to secular trends, which requires a study design that can separate the two. The design combined segmentation with regression across multiple clinic sites to isolate the redesign effect from background drift.
Stata · SAS · interrupted time series · outpatient analytics
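The segmented-regression core of an interrupted time series design can be sketched for a single site as follows. This is a simplified illustration in pure Python, not the study's Stata or SAS code: the series is synthetic and noise-free, the variable names are hypothetical, and multi-site pooling and autocorrelation adjustments are left out.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def its_design(n_periods, start):
    """Rows of [intercept, time, post indicator, time since intervention]."""
    rows = []
    for t in range(n_periods):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, post * (t - start)])
    return rows


# Synthetic monthly series: secular trend 0.5 per period, then a level
# jump of 4.0 and a slope change of 0.3 after the intervention at t = 12.
X = its_design(24, start=12)
y = [50.0 + 0.5 * r[1] + 4.0 * r[2] + 0.3 * r[3] for r in X]

baseline, trend, level_change, slope_change = ols(X, y)
print(round(trend, 3), round(level_change, 3), round(slope_change, 3))
```

The pre-intervention trend term absorbs background drift, so the level-change and slope-change coefficients are what remain to be attributed to the intervention.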
Practice Automation Analytics, healthfinch Charlie
I built the analytics for a case study of healthfinch's Charlie deployment across multiple community health centers at OCHIN. The case study asked two questions: how did clinician workflows change after Charlie deployment, and what was the ROI for the health centers that adopted it? Linear regression on the workflow data produced quantified outcome measures that the commercial team used in renewal conversations and new-customer pitches.
linear regression · Sisense · Epic Clarity · SQL
Publications
2019
Influence of environmental design on team interactions across 3 family medicine clinics
PubMed · ResearchGate
11 citations
Karp Z, Kamnetz S, Wietfeldt N, Sinsky C, Molfetter T, Pandhi N.
Health Environments Research & Design Journal 12(4):159-173.
2019
Broadening medical students' exposure to the range of illness experiences: a pilot experimental curriculum trial
PubMed
4 citations
Pandhi N, Gaines ME, Deci D, Schlesinger M, Culp C, Karp Z et al.
Academic Medicine.
2018
Medicare Shared Savings Programs: higher cost accountable care organizations are more likely to achieve savings
Published in the International Journal of Healthcare Management, 2018. First large-N analysis of the cost-to-savings relationship in early Medicare Shared Savings Program cohorts.
Berkson S, Davis S, Karp Z, Jaffery J, Flood G, Pandhi N.
International Journal of Healthcare Management.
2016
An efficient process of gathering diverse community opinions to inform an intervention
Full text
Pandhi N, Jacobson N, Serrano N, Hernandez A, Zeidler-Schreiter E, Wietfeldt N, Karp Z.
Implementation Science 11(Suppl 1):A13.
2014
Approaches and challenges to optimizing primary care teams' electronic health record usage
PubMed
23 citations
Pandhi N, Yang WL, Karp Z, Young A, Beasley JW, Kraft S, Carayon P.
Journal of Innovation in Health Informatics 21(3):142-51.
2012
Approaches and challenges to optimizing the use of EHRs in primary care (preliminary findings)
Full text
Yang W, Pandhi N, Karp Z, Young A, Beasley J, Kraft S, Carayon P.
Proceedings of World Conference on E-Learning, Montréal.
Speaking
Seventeen podium presentations, workshops, and posters at national and regional venues between 2010 and 2017, in healthcare services research and primary care systems engineering. (A co-authored poster received the Patient Choice Award, 2 of 45, at the North American Primary Care Research Group Conference, 2015.)
Full list (17 presentations, 2010 to 2017)
- 2017 · National Collaborative for Improving Primary Care Through Industrial and Systems Engineering · Madison, WI
- 2017 · UW Institute for Clinical and Translational Research, Dissemination and Implementation Short Course · Madison, WI
- 2016 · AAMC Integrating Quality Meeting · Chicago, IL
- 2016 · National Collaborative for Improving Primary Care Through Industrial and Systems Engineering · Madison, WI
- 2015 · Field Innovation Team's Bootcamp · Provo, UT
- 2015 · Access, Quality, and Outcomes Research Network · Appleton, WI
- 2015 · Society for Implementation Research Collaboration Conference · Seattle, WA
- 2015 · National Collaborative for Improving Primary Care Through Industrial and Systems Engineering · Madison, WI
- 2015 · UW Health Quality Week · Madison, WI
- 2015 · North American Primary Care Research Group Conference · Cancún, Mexico
- 2015 · Wisconsin Research and Education Network Convocation of Practices · Oshkosh, WI
- 2014 · National Collaborative for Improving Primary Care Through Industrial and Systems Engineering · Madison, WI
- 2014 · AAMC Integrating Quality Meeting · Chicago, IL
- 2014 · Pharmacy Society of Wisconsin Annual Meeting · Madison, WI
- 2012 · World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education · Montréal, QC
- 2012 · UW Health organizational leadership · Middleton, WI
- 2010 · Wisconsin Primary Care Research and Quality Improvement Forum · Middleton, WI
Testimonials
He consistently pushes himself to deliver thoughtful, high-quality work because he genuinely wants to make a difference, for the team, for the client, and for healthcare and patients.
It's rare to come across someone like Zaher, not just for his intelligence, but for the care, curiosity, and sense of responsibility he brings to everything he does. Zaher has a natural ability to think deeply about problems, often catching nuances others miss, and he balances that with a strong commitment to execution. He leads with integrity and consistently aims to do what's right, even when it takes more effort.
Despite being the only engineer on the team, Zaher consistently delivered high-quality work. He is intelligent, thorough, and deeply committed to understanding customer needs.
Zaher was solely responsible for ensuring timely, accurate data delivery across multiple Electronic Health Record environments: Cerner, Epic, athenahealth, and Veradigm. He successfully led the migration of analytics from Sisense to Pop Insights, and implemented automated weekly data refreshes for Cerner, significantly improving efficiency and reliability. He often joined client calls to clarify requests and wasn't afraid to push back when necessary to protect data integrity and long-term scalability.
Education
Service and Recognition
Contact
resume_db=# \x
Expanded display is on.
resume_db=# SELECT * FROM zaher;
-[ RECORD 1 ]-----------------------------
name   | Zaher Karp
title  | Lead Data Engineer
focus  | Analytics Engineering · Data Platform
domain | Medicare Advantage · HEDIS · CMS Stars
stack  | SQL · Python · dbt · AWS · Azure · Databricks
status | building connections · open to collaboration
reads  | designing data-intensive applications
learn  | obsidian
resume_db=#