I build and evaluate ML systems for high-stakes scientific and infrastructure problems. My work combines experimentation, algorithm design, and production engineering - from warehouse scheduling and query optimization to genomics pipelines and LLM infrastructure. I enjoy turning research ideas into reliable systems with measurable real-world impact.
Python, Snowflake, SQL, ML experimentation frameworks, formal verification tooling
Python, dbt, MLflow, Docker, Snowflake, SQL
Python, PostgreSQL, Docker, AWS, scientific data tooling
Python, RAG systems, LLM infrastructure, distributed systems
Python, bioinformatics pipelines, distributed data processing, cloud APIs
Python, Java
Triple Major, Summa Cum Laude
GPA: 3.80 / 4.00
Rapid Ultra-high Enrichment of Bacterial Pathogens at Low Concentration from Blood for Species ID and AMR Prediction Using Nanopore Sequencing
Open Forum Infectious Diseases, Vol. 7 (December 2020)
Pilot Study of a Novel Whole-genome Sequencing Based Rapid Bacterial Identification Assay in Patients with Bacteremia
Open Forum Infectious Diseases, Vol. 7 (December 2020)
This session explains why SQL optimization is harder than it appears, and why simple approaches like "ask an LLM to rewrite the query" can fail when correctness and cost matter. It contrasts heuristic techniques with rigorous methods such as rules-based optimization and formal verification, then gives a practical framework for choosing the right level of rigor for each production scenario.
Many engineers have shipped a query that seemed to work, only to discover later that their AWS or Snowflake spend had unexpectedly spiked. SQL optimization looks simple until you are in the middle of it: there are many angles from which to attack the problem, and the "right" answer depends on many interacting factors.
You can throw a query at an LLM, ask it to rewrite, and ship the result - but how do you know the optimized query actually does the same thing? This talk explores why query optimization is a deceptively hard problem, not just computationally but mathematically. Using production examples, we examine what "optimal" really means, what suboptimal queries cost, and why naive first solutions do not hold up under scrutiny.
We tackle the "but couldn't we just...?" questions head-on: why you cannot reliably compare two queries by sampling a database, and why asking an LLM whether two queries are equivalent confuses confidence with correctness. LLMs learn language patterns, not algebraic ones, and they struggle with query equivalence for reasons similar to why they struggle with exact-reasoning tasks like chess.
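To make the sampling pitfall concrete, here is a minimal sketch (illustrative table and queries, not from the talk) using an in-memory SQLite database: two queries that return identical results on one sample dataset, yet are not equivalent in general.

```python
import sqlite3

# Hypothetical sample database: no duplicate customers yet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 10), ("bob", 20)])

q1 = "SELECT customer FROM orders"
q2 = "SELECT DISTINCT customer FROM orders"  # silently drops duplicate rows

# On this sample the two queries agree, so a sampling test would pass...
assert sorted(conn.execute(q1).fetchall()) == sorted(conn.execute(q2).fetchall())

# ...but a single duplicate row breaks the "equivalence" the sample suggested.
conn.execute("INSERT INTO orders VALUES ('alice', 30)")
r1 = sorted(conn.execute(q1).fetchall())  # keeps both 'alice' rows
r2 = sorted(conn.execute(q2).fetchall())  # collapses them to one
assert r1 != r2
```

The sample can only ever refute equivalence, never confirm it; a passing comparison says nothing about inputs you did not test.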
From there, we explore more rigorous alternatives: rules-based approaches like incremental view maintenance, the bag algebra underneath them (more accessible than it sounds), and formal verification methods that can actually prove equivalence. We close with a practical framework for selecting the right technique, because sometimes a heuristic is exactly right, and sometimes only a proof will do.
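For a taste of what "bag algebra" means in practice, here is a toy sketch (all names illustrative) modeling relations as multisets with `collections.Counter`, and checking one of the algebraic identities a rules-based optimizer relies on: selection distributes over bag union.

```python
from collections import Counter

def select(pred, bag):
    """Bag selection: keep each tuple, with its multiplicity, where pred holds."""
    return Counter({t: n for t, n in bag.items() if pred(t)})

def union_all(a, b):
    """Bag union (SQL's UNION ALL): multiplicities add rather than dedupe."""
    return a + b

# Two small bags of (customer, amount) tuples; counts are multiplicities.
A = Counter({("alice", 10): 2, ("bob", 20): 1})
B = Counter({("alice", 10): 1, ("carol", 5): 3})
p = lambda row: row[1] >= 10  # predicate: amount >= 10

# Rewrite rule: select_p(A UNION ALL B) == select_p(A) UNION ALL select_p(B)
lhs = select(p, union_all(A, B))
rhs = union_all(select(p, A), select(p, B))
assert lhs == rhs  # the identity holds under bag semantics
```

Identities like this one are exactly what lets an optimizer push a filter below a union and know, without testing, that the rewritten query returns the same bag of rows.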
I write about research engineering, ML systems, query optimization, and talk ideas in progress. Visit the full blog for all posts.
A deep dive into why SQL optimization demands both practical cost modeling and formal correctness guarantees, and where LLM-generated rewrites fit in that landscape.
Languages: Python, C/C++, Java/Kotlin, SQL, Rust, JavaScript/TypeScript, Golang
ML & Data: PyTorch, NumPy, Pandas, Scikit-learn, MLflow, dbt, Spark
Infrastructure: AWS, Azure, GCP, Docker, Kubernetes, Terraform, Kafka, Airflow, Postgres, Redis
Scientific: Bioinformatics pipelines, NGS analysis, RAG systems, LLM deployment, experimental design
Outside of work, I enjoy making music, playing chess, and helping run developer communities. I was chapter lead for Google Developer Group Brooklyn and regularly gave talks on RAG systems, retrieval strategies, and practical LLM deployment.