Projects

Portfolio, Sector & Event Lab (Pro)

This project shows how I rebuilt a Shiny for Python finance analytics app to scale with DuckDB, S3, and a query-on-demand design. The new architecture moves beyond the original in-memory prototype, making the app faster, more scalable, and better suited to richer multi-year market analysis across portfolio backtests, sector exploration, and event studies.

  • Rebuilt the app architecture around DuckDB + S3 + Parquet instead of a pandas-first in-memory workflow
  • Enabled larger-history analysis and more robust event-study workflows without loading the full dataset into memory
  • Used a query-on-demand design that fetches only the data each view needs, improving responsiveness at realistic data volumes
  • Preserved a streamlined Shiny interface while making the backend more production-ready and scalable
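A minimal sketch of what "query-on-demand" can look like for one view, assuming a hypothetical Hive-partitioned layout at `s3://example-bucket/prices/year=*/*.parquet` and hypothetical column names (`ticker`, `date`, `adj_close`, `ret`). Instead of loading a full DataFrame at startup, each view builds a parameterized DuckDB query that selects only the requested tickers and date range, letting DuckDB apply the filters during the Parquet scan.

```python
from datetime import date

# Hypothetical S3 layout (not the project's actual bucket): Parquet files
# partitioned Hive-style by year.
PRICES_GLOB = "s3://example-bucket/prices/year=*/*.parquet"


def build_returns_query(tickers: list[str], start: date, end: date) -> tuple[str, list]:
    """Build a parameterized DuckDB query for a single app view.

    Only the requested tickers and date range are selected, so DuckDB can
    filter during the Parquet scan instead of materializing the whole dataset.
    Column names are assumptions for illustration.
    """
    if not tickers:
        raise ValueError("at least one ticker is required")
    # One "?" placeholder per ticker keeps user input out of the SQL string.
    placeholders = ", ".join("?" for _ in tickers)
    sql = (
        "SELECT ticker, date, adj_close, ret "
        f"FROM read_parquet('{PRICES_GLOB}', hive_partitioning = true) "
        f"WHERE ticker IN ({placeholders}) AND date BETWEEN ? AND ? "
        "ORDER BY ticker, date"
    )
    params = [*tickers, start, end]
    return sql, params
```

In an app, the resulting pair would be passed to `con.execute(sql, params).df()` on a DuckDB connection with the `httpfs` extension loaded for S3 access, so each Shiny reactive pulls a small result set rather than holding the full history in memory.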

Ames, Iowa Housing ML

A production-ready housing price prediction pipeline for Ames, Iowa, built around a weighted voting ensemble (Lasso, XGBoost, CatBoost) for strong accuracy and robust behavior on edge cases. Includes a Shiny for Python dashboard that simulates deal economics and renovation ROI for investor and house-flipper scenarios.

  • Weighted voting ensemble (Lasso + XGBoost + CatBoost) with test R² ≈ 0.933
  • Research-to-production workflow: EDA notebook, production pipeline notebook, deployable artifacts (.pkl)
  • Stress-tested with extreme “outlier house” scenarios to validate stability and guardrails
  • Shiny dashboard with investor and flipper modes, including renovation toggles and deal economics views
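The weighted voting step can be sketched as a normalized weighted average of each model's price predictions. The weights and prices below are hypothetical placeholders, not the project's tuned values; scikit-learn's `VotingRegressor(estimators, weights=...)` performs the same weighted averaging over fitted estimators.

```python
from typing import Mapping, Sequence


def weighted_vote(
    predictions: Mapping[str, Sequence[float]],
    weights: Mapping[str, float],
) -> list[float]:
    """Blend per-model predictions with a normalized weighted average.

    Weights are divided by their sum, so any positive weights work.
    """
    total = sum(weights[name] for name in predictions)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_rows = len(next(iter(predictions.values())))
    if any(len(preds) != n_rows for preds in predictions.values()):
        raise ValueError("all models must predict the same number of rows")
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n_rows)
    ]


# Hypothetical predictions for two houses and hypothetical model weights.
preds = {
    "lasso": [200_000, 150_000],
    "xgboost": [210_000, 140_000],
    "catboost": [205_000, 145_000],
}
weights = {"lasso": 1, "xgboost": 2, "catboost": 2}
blended = weighted_vote(preds, weights)  # [206000.0, 144000.0]
```

Averaging models with different inductive biases (a linear Lasso and two gradient-boosted tree models) is one reason such an ensemble tends to behave more stably on unusual inputs than any single member.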

Portfolio, Sector & Event Lab (Legacy)

An end-to-end US stock market analytics pipeline and Shiny for Python app that combines Yahoo Finance price data with SEC EDGAR fundamentals and filings context. The repo builds a cached master Parquet dataset plus an optimized sample slice powering three modules: Portfolio Backtester, Sector Explorer, and Event Study.

  • Builds a master Parquet dataset from Yahoo Finance + SEC EDGAR with caching for safe reruns
  • App includes Portfolio Backtester, Sector Explorer, and Event Study tools
  • Master dataset includes returns, split history, filing context, and shares/float with backfills
  • Produces a lightweight outputs/sample.parquet slice for fast app loading (with fallback logic)
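One plausible shape for the loading fallback is to try the lightweight slice first and fall back to a larger file if it is missing. This is a sketch under assumptions: `outputs/sample.parquet` comes from the project description, but `outputs/master.parquet` and the candidate order are hypothetical.

```python
from pathlib import Path

# Candidate order is an assumption: the small sample slice first (fast load),
# then a hypothetical full master file as the fallback.
DEFAULT_CANDIDATES = (Path("outputs/sample.parquet"), Path("outputs/master.parquet"))


def resolve_dataset(candidates=DEFAULT_CANDIDATES) -> Path:
    """Return the first candidate dataset path that exists as a file."""
    paths = [Path(p) for p in candidates]
    for path in paths:
        if path.is_file():
            return path
    tried = ", ".join(str(p) for p in paths)
    raise FileNotFoundError(f"no dataset found; tried: {tried}")
```

The app would then read the resolved file, for example with `pandas.read_parquet(resolve_dataset())`, and fail with a clear message listing every path it tried.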

Mental Health in Tech: 2014–2023

A reproducible multi-year EDA of OSMI tech worker surveys (2014, 2016–2023) that harmonizes changing survey instruments into a consistent analysis table. Focuses on whether visible workplace support policies correlate with higher treatment-seeking, supported by clear visual storytelling and lightweight statistical modeling.

  • Harmonizes every survey wave into a 2014-aligned schema using robust renaming and regex fallbacks
  • Builds policy “support scores” (0–4 and 0–5) capturing benefits, care options, seek-help resources, anonymity, and wellness visibility
  • Visual toolkit includes point plots with 95% confidence intervals, stacked bars, heatmaps, waffle charts, treemaps, and word clouds
  • Adds an analysis layer with Welch’s t-tests, logistic regression, and leave-one-out ablation to assess the components of the policy bundle
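The renaming-plus-regex-fallback idea can be sketched as: try an exact match against known question wordings, then fall back to regex patterns for wording that drifts between waves. The target names follow the 2014 short column style (`benefits`, `care_options`, `seek_help`, `anonymity`, `treatment`); the question strings and patterns are illustrative assumptions, not copied from the survey files.

```python
import re

# Illustrative exact renames for known question wordings (assumed text).
EXACT_RENAMES = {
    "Does your employer provide mental health benefits as part of healthcare coverage?": "benefits",
}

# Illustrative regex fallbacks, checked in order; the first match wins.
REGEX_FALLBACKS = [
    (re.compile(r"mental health benefits", re.IGNORECASE), "benefits"),
    (re.compile(r"options for mental health care", re.IGNORECASE), "care_options"),
    (re.compile(r"resources to learn more", re.IGNORECASE), "seek_help"),
    (re.compile(r"\banonymity\b", re.IGNORECASE), "anonymity"),
    (re.compile(r"sought treatment", re.IGNORECASE), "treatment"),
]


def harmonize_columns(columns: list[str]) -> dict[str, str]:
    """Map one wave's raw column names onto the shared 2014-style schema.

    Columns that match nothing are left out of the mapping, and each target
    name is assigned at most once so two raw columns never collide.
    """
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for col in columns:
        target = EXACT_RENAMES.get(col)
        if target is None:
            target = next((t for pattern, t in REGEX_FALLBACKS if pattern.search(col)), None)
        if target is not None and target not in used:
            mapping[col] = target
            used.add(target)
    return mapping
```

The returned dictionary plugs directly into pandas `DataFrame.rename(columns=mapping)`, after which each wave can be concatenated into one analysis table.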

Mental Health in Tech: Which Workplace Policies Work?

A workplace policy analysis using the 2014 OSMI Mental Health in Tech survey to examine which visible, trusted supports are most associated with higher treatment-seeking among tech workers. The post turns survey patterns into practical guidance for employers, managers, and teams.

  • Quantifies how benefits, care options, resources, anonymity, and wellness communication relate to treatment-seeking through a composite support score
  • Shows that awareness gaps matter: “Don’t know” responses are often associated with treatment-seeking rates nearly as low as “No” responses
  • Shows that higher support scores correlate with higher treatment-seeking
  • Highlights takeaways: make core policies unmistakably visible, train managers to signal psychological safety, measure trust, and plan support capacity where work interference is high
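A composite support score and its link to treatment-seeking can be sketched as below. The scoring rule is an assumption made for illustration: only an explicit "Yes" earns a point, so "No" and "Don't know" both score 0. Column names follow the 2014 survey's short style, and the example rows are made up.

```python
from collections import defaultdict

# Assumed components of the 0–5 score (2014-style column names).
SUPPORT_COLUMNS = ("benefits", "care_options", "seek_help", "anonymity", "wellness_program")


def support_score(row: dict) -> int:
    """Count how many support policies a respondent explicitly reports ("Yes")."""
    return sum(1 for col in SUPPORT_COLUMNS if row.get(col) == "Yes")


def treatment_rate_by_score(rows: list[dict]) -> dict[int, float]:
    """Share of respondents with treatment == "Yes" at each support score."""
    counts = defaultdict(lambda: [0, 0])  # score -> [treated, total]
    for row in rows:
        score = support_score(row)
        counts[score][1] += 1
        if row.get("treatment") == "Yes":
            counts[score][0] += 1
    return {score: treated / total for score, (treated, total) in sorted(counts.items())}


# Made-up respondents for illustration only.
rows = [
    {"benefits": "Yes", "care_options": "Yes", "seek_help": "No",
     "anonymity": "Don't know", "wellness_program": "Yes", "treatment": "Yes"},
    {"benefits": "No", "care_options": "Don't know", "seek_help": "No",
     "anonymity": "No", "wellness_program": "No", "treatment": "No"},
    {"benefits": "Don't know", "treatment": "Yes"},
]
rates = treatment_rate_by_score(rows)  # {0: 0.5, 3: 1.0}
```

Scoring "Don't know" the same as "No" encodes the idea that a policy employees are unaware of provides little visible support; comparing this rule against one that gives partial credit is a simple way to check how much awareness gaps drive the pattern.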