Projects

Medicare Provider Benchmarking Engine

A healthcare ML benchmarking engine built on 1.21M CMS Medicare provider-service records to estimate expected standardized Medicare amounts, surface anomalies where observed amounts exceed expectations, and segment oncology providers by anomaly burden. Includes routed XGBoost modeling, guardrails, anomaly worklists, provider tiering, logistic explainability, and a deployed Shiny for Python dashboard.

Built a provider-service expected-cost engine using CMS Medicare data across 21.6K providers, 1,742 HCPCS codes, and 4 years of oncology-focused records
Designed a routed D/G architecture: Model G scores rows with lagged history available, and Model D scores cold-start rows with no prior history
Added leakage audits, provider-disjoint validation, negative controls, and post-model guardrails to make the scoring workflow more defensible
Produced row-level anomaly worklists, provider-level watchlists, clustering-based provider tiers, and an interactive dashboard with artifact provenance and model documentation
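
The routing and provider-disjoint validation ideas above can be sketched in a few lines. This is a minimal illustration, not the project's code: the `Row` fields, the toy stand-in models, the placeholder codes, and the hash-based fold assignment are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    provider_id: str
    hcpcs_code: str
    lag_amount: Optional[float]  # hypothetical prior-year amount; None means cold start


def route(row: Row) -> str:
    """Return "G" when lagged history exists, otherwise "D"."""
    return "G" if row.lag_amount is not None else "D"


def predict_routed(
    rows: List[Row],
    model_g: Callable[[Row], float],
    model_d: Callable[[Row], float],
) -> List[Tuple[str, float]]:
    """Score each row with the model its route selects."""
    scored = []
    for row in rows:
        chosen = route(row)
        model = model_g if chosen == "G" else model_d
        scored.append((chosen, model(row)))
    return scored


def provider_fold(provider_id: str, n_folds: int = 5) -> int:
    """Hash a provider ID to a fold so all of a provider's rows share one fold."""
    digest = hashlib.sha256(provider_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


# Toy stand-ins (assumptions): real models would be trained regressors.
def toy_model_g(row: Row) -> float:
    return 1.1 * row.lag_amount


def toy_model_d(row: Row) -> float:
    return 100.0


rows = [
    Row("P1", "CODE_A", 200.0),  # has history, goes to Model G
    Row("P2", "CODE_A", None),   # cold start, goes to Model D
    Row("P1", "CODE_B", None),   # new service for a known provider, also cold start
]
scored = predict_routed(rows, toy_model_g, toy_model_d)
```

Hashing the provider ID keeps every row for a provider in the same fold, which is one simple way to stop a provider's history from appearing on both sides of a train/validation split.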

GitHub

Live app

Portfolio, Sector & Event Lab (Pro)

This project shows how I rebuilt a Shiny for Python finance analytics app to scale with DuckDB, S3, and a query-on-demand design. The new architecture replaces an in-memory prototype, making the app faster, more scalable, and better suited to richer multi-year market analysis across portfolio backtests, sector exploration, and event studies.

Rebuilt the app architecture around DuckDB + S3 + Parquet instead of a pandas-first in-memory workflow
Enabled larger-history analysis and more robust event-study workflows without loading the full dataset into memory
Used query-on-demand design to improve app responsiveness and support more realistic data scale
Preserved a streamlined Shiny interface while making the backend more production-ready and scalable
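
A query-on-demand call might look roughly like the sketch below: the app asks DuckDB for only the columns and rows a view needs instead of loading the whole Parquet dataset. The column names (`ticker`, `trade_date`, `adj_close`), the file path, and both helper functions are hypothetical; reading from `s3://` paths would additionally need DuckDB's httpfs extension and credentials, which are not shown.

```python
from datetime import date
from typing import List, Sequence, Tuple


def build_price_query(
    parquet_path: str,
    tickers: Sequence[str],
    start: date,
    end: date,
) -> Tuple[str, List[object]]:
    """Return (sql, params) that select only the requested tickers and dates."""
    # The path comes from app configuration, not user input; reject quotes anyway.
    if "'" in parquet_path:
        raise ValueError("parquet_path must not contain single quotes")
    placeholders = ", ".join("?" for _ in tickers)
    sql = (
        "SELECT ticker, trade_date, adj_close "
        f"FROM read_parquet('{parquet_path}') "
        f"WHERE ticker IN ({placeholders}) "
        "AND trade_date BETWEEN ? AND ? "
        "ORDER BY ticker, trade_date"
    )
    return sql, [*tickers, start, end]


def run_query(sql: str, params: List[object]) -> list:
    """Execute the query with DuckDB; only the filtered rows are materialized."""
    import duckdb  # third-party; imported lazily so the builder works without it

    con = duckdb.connect()  # in-memory connection
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()


sql, params = build_price_query(
    "prices.parquet", ["AAA", "BBB"], date(2020, 1, 1), date(2020, 12, 31)
)
```

Keeping the query builder separate from execution makes the filter logic easy to test without a database or network access.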

GitHub

Live app

Ames, Iowa Housing ML

A production-ready housing price prediction pipeline for Ames, Iowa, built around a weighted voting ensemble (Lasso, XGBoost, CatBoost) for strong accuracy and robust behavior on edge cases. Includes a Shiny for Python dashboard that simulates deal economics and renovation ROI for investor and house-flipper scenarios.

Weighted voting ensemble (Lasso + XGBoost + CatBoost) with test R² ≈ 0.933
Research-to-production workflow: EDA notebook, production pipeline notebook, deployable artifacts (.pkl)
Stress-tested with extreme “outlier house” scenarios to validate stability and guardrails
Shiny dashboard with investor and flipper modes, including renovation toggles and deal economics views
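
As a rough illustration of weighted voting for regression, the sketch below blends per-model predictions with a weighted average and scores the result with R². The weights and predictions are made-up toy values, not the project's tuned weights or outputs, and the function names are hypothetical.

```python
from typing import Dict, List


def weighted_vote(predictions: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Blend per-model predictions with a weighted average."""
    total = sum(weights[name] for name in predictions)
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n)
    ]


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy predictions for two houses (illustrative units, not real model output).
predictions = {
    "lasso": [100, 200],
    "xgboost": [110, 190],
    "catboost": [120, 210],
}
weights = {"lasso": 1, "xgboost": 2, "catboost": 2}  # hypothetical weights

blended = weighted_vote(predictions, weights)  # [112.0, 200.0]
```

In practice the weights would be chosen on a validation set so that the blend is not tuned against the same test data used to report R².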

GitHub

Live app

Portfolio, Sector & Event Lab (Legacy)

An end-to-end US stock market analytics pipeline and Shiny for Python app that combines Yahoo Finance price data with SEC EDGAR fundamentals and filings context. The repo builds a cached master Parquet dataset plus an optimized sample slice powering three modules: Portfolio Backtester, Sector Explorer, and Event Study.

Builds a master Parquet dataset from Yahoo Finance + SEC EDGAR with caching for safe reruns
App includes Portfolio Backtester, Sector Explorer, and Event Study tools
Master dataset includes returns, split history, filing context, and shares/float with backfills
Produces a lightweight outputs/sample.parquet slice for fast app loading (with fallback logic)
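
One simple form the fallback logic could take is an ordered list of candidate files, where the app loads the first one that exists and is non-empty. This is only a sketch: `outputs/sample.parquet` comes from the description above, while the master file path and the `pick_dataset` helper are assumptions.

```python
from pathlib import Path
from typing import Iterable, Optional


def pick_dataset(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists and is non-empty, else None."""
    for path in candidates:
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None


# Preferred order: the lightweight sample first, then a hypothetical master file.
CANDIDATES = [
    Path("outputs/sample.parquet"),
    Path("outputs/master.parquet"),  # assumed location, for illustration only
]

chosen = pick_dataset(CANDIDATES)
if chosen is None:
    print("No dataset found; run the pipeline to build one.")
```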

GitHub

Live app

Mental Health in Tech: 2014-2023

A reproducible multi-year EDA of OSMI tech worker surveys (2014, 2016–2023) that harmonizes changing survey instruments into a consistent analysis table. Focuses on whether visible workplace support policies correlate with higher treatment-seeking, supported by clear visual storytelling and lightweight statistical modeling.

Harmonizes eight survey waves into a 2014-aligned schema using robust renaming and regex fallbacks
Builds policy “support scores” (0–4 and 0–5) capturing benefits, care options, seek-help resources, anonymity, and wellness visibility
Visual toolkit includes point plots with 95% CI, stacked bars, heatmaps, waffles/treemaps, and wordclouds
Adds analysis layer with Welch’s t-tests, logistic regression, and leave-one-out ablation to assess policy bundle components
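
The support score and Welch's t-test can be sketched with the standard library alone. The field names below are assumed to match the 2014-aligned schema, the score here simply counts "Yes" answers, and the helper names are hypothetical; the project's actual scoring rules may differ.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, Sequence, Tuple

# Assumed column names after harmonization to the 2014 schema.
POLICY_FIELDS = ("benefits", "care_options", "seek_help", "anonymity", "wellness_program")


def support_score(response: Dict[str, str], fields: Sequence[str] = POLICY_FIELDS) -> int:
    """Count policy fields answered "Yes"; "No" and "Don't know" add nothing."""
    return sum(
        1 for field in fields
        if str(response.get(field, "")).strip().lower() == "yes"
    )


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


example = {
    "benefits": "Yes",
    "care_options": "No",
    "seek_help": "Don't know",
    "anonymity": "Yes",
    "wellness_program": "yes",
}
score = support_score(example)  # 3
```

Welch's version of the t-test does not assume equal variances, which suits groups of different sizes such as respondents who did and did not seek treatment.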

GitHub

Mental Health in Tech: Which Workplace Policies Work?

A workplace policy analysis using the 2014 OSMI Mental Health in Tech survey to examine which visible, trusted supports are most associated with higher treatment-seeking among tech workers. The post turns survey patterns into practical guidance for employers, managers, and teams.

Quantifies how benefits, care options, resources, anonymity, and wellness communication relate to treatment-seeking through a composite support score
Shows that awareness gaps matter: “Don’t know” responses often perform nearly as poorly as “No”
Shows that higher support scores correlate with higher treatment-seeking
Highlights takeaways: make core policies unmistakably visible, train managers for safety signals, measure trust, and plan support capacity where work interference is high
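
A comparison of treatment-seeking across "Yes", "No", and "Don't know" answers could be computed along these lines. The rows are made-up toy records, not survey results, and the `benefits` and `treatment` field names and the helper function are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable


def treatment_rate_by_response(
    rows: Iterable[Dict[str, str]],
    policy_field: str,
    outcome_field: str = "treatment",
) -> Dict[str, float]:
    """Share of respondents who sought treatment, grouped by policy answer."""
    counts = defaultdict(lambda: [0, 0])  # response -> [sought treatment, total]
    for row in rows:
        response = row[policy_field]
        counts[response][1] += 1
        if row[outcome_field] == "Yes":
            counts[response][0] += 1
    return {resp: yes / total for resp, (yes, total) in counts.items()}


# Toy rows for illustration only.
toy_rows = [
    {"benefits": "Yes", "treatment": "Yes"},
    {"benefits": "Yes", "treatment": "No"},
    {"benefits": "No", "treatment": "No"},
    {"benefits": "Don't know", "treatment": "No"},
]
rates = treatment_rate_by_response(toy_rows, "benefits")
```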

GitHub