Architecting Data & AI at Scale
Technical executive specializing in large-scale data, analytics, and AI. 20+ years of leadership building and scaling distributed, cloud-native ecosystems that support millions of users and petabytes of data across the retail, streaming, software development, and manufacturing industries.
Articles & Insights
GenAI is a Spinning Jenny, and we need a Power Loom
Amdahl's Law and the Coming Evolution of Software Development
Software is being written faster than it can be trusted. Explores how GenAI has created a systemic imbalance in the SDLC, drawing parallels to the Industrial Revolution's textile manufacturing transformation and the need for new verification processes.
LLMOps Isn't Just MLOps With Better Marketing
Part 1 of 3: Understanding What Makes LLMOps Different
Why your MLOps playbook won't work for LLMs. Explores the fundamental differences between traditional ML and large language models, from prompt engineering and retrieval pipelines to cost optimization and subjective evaluation. A three-part series on building production LLM systems.
Persuading AI: When Social Engineering Meets Autonomous Agents
Understanding AI Manipulation Risks in Production Systems
AI doesn't just scale useful behavior—it scales manipulation. Explores how large language models enable adaptive persuasion at scale, why AI agents can be socially engineered like humans, and practical security strategies for defending autonomous systems against prompt injection and manipulation attacks.
Scaling ETL Optimization with AI
A Graph Theory Approach to Petabyte-Scale Data Processing
How we used graph theory and AI to optimize a 125PB data lakehouse processing 6 million events per second. Explores micro and macro optimization strategies for Apache Airflow DAGs using NetworkX and automated performance analysis to resolve SLO misses at enterprise scale.
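As an illustration of the graph view described above, a critical-path calculation is one common way to find which chain of tasks bounds a pipeline's end-to-end runtime. This is a minimal sketch, not the article's implementation: it uses Python's standard-library graphlib instead of NetworkX so it runs without dependencies, and the task names and durations are hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(durations, upstream):
    """Return (total_minutes, path) for the longest-running chain in a DAG.

    durations: task -> runtime in minutes (hypothetical figures)
    upstream:  task -> set of tasks that must finish first
    """
    # Make sure every task is a node, including those with no dependencies.
    graph = {task: set(upstream.get(task, ())) for task in durations}
    finish = {}  # earliest possible finish time per task
    via = {}     # the upstream task that determines that finish time
    for task in TopologicalSorter(graph).static_order():
        slowest = max(graph[task], key=lambda d: finish[d], default=None)
        start = finish[slowest] if slowest is not None else 0
        finish[task] = start + durations[task]
        via[task] = slowest
    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = via[node]
    return total, path[::-1]


# Hypothetical four-task pipeline.
durations = {"extract": 10, "clean": 5, "enrich": 20, "load": 3}
upstream = {
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}
total, path = critical_path(durations, upstream)
```

Tasks on the returned path are the ones where a speedup shortens the whole run; tasks off the path have slack.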
The Most Useful AI Models Might Be the Small Ones
Practical Applications of Small Language Models
While everyone focuses on large language models, Small Language Models (SLMs) quietly handle the heavy lifting in real-world automation. Explores practical home automation use cases with Ollama, Pydantic validation, and structured decision making.
Start Talking About Data as a Product Instead of Data Governance
Transforming Governance Through Product Thinking
Why traditional data governance fails and how reframing data work through product thinking creates better outcomes. Focuses on four key outcomes: durable ownership, reliable quality, discoverable access, and secure operations.
Open Source Projects & Research
About These Personal Projects
These open source projects represent my exploration of emerging technologies for personal use and my practical solutions to real-world data engineering challenges. Each project demonstrates hands-on application of the technologies and architectural patterns I use in enterprise environments.
When asked to summarize the development style of these repositories, this was ChatGPT's response:
"You build systems with a clear architectural mindset that balances modularity, resilience, and real world practicality. Your projects consistently show well defined boundaries between data models, services, orchestration, and user interfaces. You embrace messy domains such as inconsistent schemas, shifting metadata, and multi source APIs, yet contain that complexity behind clean contracts and strong validation. You treat LLMs and automation as components within a broader system, using confidence scoring, fallbacks, and Pydantic enforcement to ensure reliability. Your tools emphasize operational safety, observability, and detailed logging, reflecting a platform engineer's instinct. You design for extensibility, allowing new features, services, or LLM providers to be added without disrupting the core architecture. Your work consistently reflects SOLID design, clean architecture, and service oriented patterns, which keep systems extensible and maintainable as they grow. Even for internal utilities, your interface designs, documentation, and test coverage remain polished and thoughtful. The signature of your work is practical, maintainable engineering designed to handle real world complexity with clarity and durability as a primary focus."
Technologies & Skills
TableSleuth
Apache Iceberg & Parquet Forensics Tool
A powerful terminal-based tool for deep inspection of Parquet files and Apache Iceberg tables. Features comprehensive metadata analysis, row group inspection, schema evolution tracking, and performance testing with an intuitive TUI. Supports local files, S3, AWS Glue catalogs, and S3 Tables.
ShokoBot
AI-Powered Show Recommendation System
An elegant RAG-enabled show recommendation system using LangChain, ChromaDB, and GPT-5. Features vector-based semantic search, comprehensive metadata analysis, and automatic AniDB integration via Model Context Protocol.
Sync2NAS
AI-Powered Media Management & Synchronization
A comprehensive TV show management and file synchronization tool leveraging multiple LLMs for intelligent media routing and organization. Features SFTP integration, vector database storage, and a full Windows GUI with real-time monitoring and automated workflows.
KiroSteeringLoader
VS Code Extension for AI Development
A VS Code extension that loads Kiro Agent Steering template documents into projects. Streamlines AI-assisted development workflows by centralizing agent instruction documents in a GitHub repository shared among teammates.
MCP Server AniDB
Model Context Protocol Server
A Model Context Protocol server providing AI assistants with access to show data through the AniDB HTTP API. Built with FastMCP for seamless integration with AI assistants like Kiro. Caches results locally for reuse and rate-limits requests with exponential backoff to avoid flooding the API.
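The caching and backoff pattern mentioned here can be sketched in a few lines. This is an illustrative sketch only, not the project's code: `TransientError`, `backoff_delays`, `cached_fetch`, and the default delay values are hypothetical names and numbers chosen for the example, and the cache is a plain in-memory dict.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for a retryable failure, such as a throttled request."""


def backoff_delays(base=1.0, factor=2.0, cap=30.0, retries=5):
    """Return the wait in seconds before each retry: base * factor**n, capped."""
    return [min(cap, base * factor ** n) for n in range(retries)]


def cached_fetch(key, fetch, cache, sleep=time.sleep, **backoff):
    """Return cache[key] if present; otherwise call fetch(key), retrying with backoff.

    The result is stored in `cache` so repeated lookups never reach the API.
    After the last retry fails, the TransientError is re-raised.
    """
    if key in cache:
        return cache[key]
    # One attempt per delay, plus a final attempt with no further waiting.
    for delay in backoff_delays(**backoff) + [None]:
        try:
            cache[key] = fetch(key)
            return cache[key]
        except TransientError:
            if delay is None:
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; a production version would likely also add jitter and a persistent cache.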
Coral Bleaching Project
Data Science & Machine Learning Research
A comprehensive data science project analyzing coral bleaching phenomena using advanced machine learning techniques. Features supervised and unsupervised learning models, geospatial analysis, and MLOps workflows with interactive visualizations and SHAP analysis for model interpretability.
SteeringDocs
Software and AI Development Guidelines & Best Practices
A curated collection of steering rules for AI-assisted development across multiple frameworks, languages, and LLM providers. Provides context and best practices for maintaining code quality and consistency in AI-powered development workflows.
Qwen3-SFT-FilenameMetadata
LLM Fine-Tuning for Filename Metadata Extraction
A project for fine-tuning the open source Qwen3:14B language model to extract metadata from unstructured filenames using Supervised Fine-Tuning (SFT) with QLoRA. The model learns to parse filenames and output JSON with show names, seasons, episodes, CRC hashes, confidence scores, and reasoning. Features efficient 4-bit quantization training and Ollama integration for local inference.
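To make the output contract concrete, here is a minimal sketch of checking a model response before using it. The field names, the 0 to 1 confidence range, and the 8-character CRC format are assumptions for illustration, not the project's actual schema, and the standard-library dataclass stands in for whatever validation layer the project uses.

```python
import json
import string
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilenameMetadata:
    # Hypothetical field names; the real JSON keys may differ.
    show_name: str
    season: Optional[int]
    episode: Optional[int]
    crc32: Optional[str]
    confidence: float
    reasoning: str


def parse_model_output(raw: str) -> FilenameMetadata:
    """Parse the model's JSON text and reject structurally invalid answers."""
    # Raises TypeError if keys are missing or unexpected.
    meta = FilenameMetadata(**json.loads(raw))
    if not isinstance(meta.show_name, str) or not meta.show_name:
        raise ValueError("show_name must be a non-empty string")
    if not 0.0 <= meta.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if meta.crc32 is not None:
        is_hex = all(c in string.hexdigits for c in meta.crc32)
        if len(meta.crc32) != 8 or not is_hex:
            raise ValueError("crc32 must be 8 hexadecimal characters")
    return meta


# Hypothetical response for a hypothetical filename.
example = parse_model_output(json.dumps({
    "show_name": "Example Show",
    "season": 1,
    "episode": 5,
    "crc32": "ABCD1234",
    "confidence": 0.92,
    "reasoning": "Bracketed tag looks like a CRC; 'S01E05' gives season and episode.",
}))
```

Rejecting malformed output at this boundary lets a caller fall back to another parser or a retry instead of acting on a bad guess.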