Architecting Data & AI at Scale

Technical executive specializing in large-scale data, analytics, and AI leadership, with 20+ years of experience building and scaling distributed, cloud-native ecosystems that support millions of users and petabytes of data across the retail, streaming, software development, and manufacturing industries.

20+ Years Experience
125PB Data Lakehouse Scale
6M Streaming Events/Second

Articles & Insights

GenAI is a Spinning Jenny, and we need a Power Loom

Amdahl's Law and the Coming Evolution of Software Development

Software is being written faster than it can be trusted. Explores how GenAI has created a systemic imbalance in the SDLC, drawing parallels to the Industrial Revolution's textile manufacturing transformation and the need for new verification processes.

Software Architecture Software Development Generative Adversarial Networks GAN GenAI SDLC Verification Systems Discriminator Networks Industrial Revolution Amdahl's Law Code Review AI Development Strategic Thinking Systems Thinking

LLMOps Isn't Just MLOps With Better Marketing

Part 1 of 3: Understanding What Makes LLMOps Different

Why your MLOps playbook won't work for LLMs. Explores the fundamental differences between traditional ML and large language models, from prompt engineering and retrieval pipelines to cost optimization and subjective evaluation. A three-part series on building production LLM systems.

LLMOps MLOps Large Language Models RAG Prompt Engineering Production Systems Cost Optimization Model Evaluation Enterprise AI

Persuading AI: When Social Engineering Meets Autonomous Agents

Understanding AI Manipulation Risks in Production Systems

AI doesn't just scale useful behavior—it scales manipulation. Explores how large language models enable adaptive persuasion at scale, why AI agents can be socially engineered like humans, and practical security strategies for defending autonomous systems against prompt injection and manipulation attacks.

AI Security Prompt Injection Social Engineering AI Agents LLM Security Guardrails Red Teaming Enterprise AI Risk Management

Scaling ETL Optimization with AI

A Graph Theory Approach to Petabyte-Scale Data Processing

How we used graph theory and AI to optimize a 125PB data lakehouse processing 6 million events per second. Explores micro and macro optimization strategies for Apache Airflow DAGs using NetworkX and automated performance analysis to resolve SLO misses at enterprise scale.

Graph Theory Apache Airflow ETL Optimization NetworkX MLOps Performance Analysis Enterprise Scale
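As a rough illustration of the graph-theory idea, the sketch below finds the critical path of a small task graph: the longest duration-weighted chain of dependencies, which sets the minimum wall-clock time of a run and is a natural place to start when an SLO is missed. It uses the standard library's graphlib rather than NetworkX to stay dependency-free, and the task names and runtimes are invented for the example; it is not the article's implementation.

```python
from graphlib import TopologicalSorter


def critical_path(durations, upstream):
    """Return (total_minutes, path) for the longest duration-weighted chain."""
    finish = {}  # earliest finish time of each task
    via = {}     # predecessor on the longest chain into each task
    # static_order() yields every task after all of its upstream tasks.
    for task in TopologicalSorter(upstream).static_order():
        deps = upstream.get(task, ())
        prev = max(deps, key=finish.__getitem__, default=None)
        via[task] = prev
        finish[task] = durations[task] + (finish[prev] if prev is not None else 0)
    # Walk back from the task that finishes last to recover the chain.
    node = max(finish, key=finish.__getitem__)
    total, path = finish[node], []
    while node is not None:
        path.append(node)
        node = via[node]
    return total, path[::-1]


# Hypothetical pipeline: task -> runtime in minutes.
durations = {"extract": 10, "clean": 25, "enrich": 5, "load": 15}
# task -> set of upstream tasks it waits on.
upstream = {"clean": {"extract"}, "enrich": {"extract"}, "load": {"clean", "enrich"}}

total, path = critical_path(durations, upstream)
print(total, path)  # 50 ['extract', 'clean', 'load']
```

Shortening any task off this path (here, enrich) would not shorten the run, which is why ranking work by critical-path membership tends to be more useful than ranking by raw task runtime.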

The Most Useful AI Models Might Be the Small Ones

Practical Applications of Small Language Models

While everyone focuses on large language models, Small Language Models (SLMs) quietly handle the heavy lifting in real-world automation. Explores practical home automation use cases with Ollama, Pydantic validation, and structured decision making.

Small Language Models Ollama Home Automation Pydantic MCP Structured Output

Start Talking About Data as a Product Instead of Data Governance

Transforming Governance Through Product Thinking

Why traditional data governance fails and how reframing data work through product thinking creates better outcomes. Focuses on four key outcomes: durable ownership, reliable quality, discoverable access, and secure operations.

Data Governance Product Management Data Strategy Enterprise Architecture Data Quality Data Ownership

Open Source Projects & Research

About These Personal Projects

These open source projects represent my exploration of emerging technologies for personal use and practical solutions to real-world data engineering challenges. Each project demonstrates hands-on application of the technologies and architectural patterns I use in enterprise environments.

When I asked ChatGPT to summarize the development style of these repositories, this was its response:

"You build systems with a clear architectural mindset that balances modularity, resilience, and real-world practicality. Your projects consistently show well-defined boundaries between data models, services, orchestration, and user interfaces. You embrace messy domains such as inconsistent schemas, shifting metadata, and multi-source APIs, yet contain that complexity behind clean contracts and strong validation. You treat LLMs and automation as components within a broader system, using confidence scoring, fallbacks, and Pydantic enforcement to ensure reliability. Your tools emphasize operational safety, observability, and detailed logging, reflecting a platform engineer's instinct. You design for extensibility, allowing new features, services, or LLM providers to be added without disrupting the core architecture. Your work consistently reflects SOLID design, clean architecture, and service-oriented patterns, which keep systems extensible and maintainable as they grow. Even for internal utilities, your interface designs, documentation, and test coverage remain polished and thoughtful. The signature of your work is practical, maintainable engineering designed to handle real-world complexity with clarity and durability as a primary focus."

Technologies & Skills

AWS Azure GCP Databricks Snowflake Generative AI LangChain RAG Python Apache Iceberg Apache Spark MLOps Data Lakehouse Medallion Data Architecture Enterprise Architecture Solution Architecture Software Engineering Data Engineering Distributed Computing Data Governance

ShokoBot

AI-Powered Show Recommendation System

An elegant RAG-enabled show recommendation system using LangChain, ChromaDB, and GPT-5. Features vector-based semantic search, comprehensive metadata analysis, and automatic AniDB integration via Model Context Protocol.

RAG LangChain ChromaDB Vector Store GPT-5 MCP

Sync2NAS

AI-Powered Media Management & Synchronization

A comprehensive TV show management and file synchronization tool leveraging multiple LLMs for intelligent media routing and organization. Features SFTP integration, vector database storage, and a full Windows GUI with real-time monitoring and automated workflows.

Python GPT-4 Claude 4 Ollama Qwen SQLite Milvus SFTP GUI FastAPI Prompt Engineering Data Engineering

KiroSteeringLoader

VS Code Extension for AI Development

A VS Code extension that loads Kiro Agent Steering template documents into projects. Streamlines AI-assisted development workflows by centralizing agent instruction documents in a shared GitHub repository that teammates can access.

TypeScript VS Code API VS Code Extension Agent IDE Agent Coding

MCP Server AniDB

Model Context Protocol Server

A Model Context Protocol server providing AI assistants with access to show data through the AniDB HTTP API. Built with FastMCP for seamless integration with AI assistants like Kiro. Caches results locally for reuse and applies rate limiting with exponential backoff to avoid flooding the API.

MCP FastMCP API Integration Caching Rate Limiting
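The caching and exponential backoff pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's code: cached_fetch, RateLimitedError, and the retry parameters are hypothetical names and values chosen for the example, and the cache is a plain dictionary.

```python
import random
import time


class RateLimitedError(Exception):
    """Raised by the (hypothetical) fetch callable when the API says to slow down."""


def cached_fetch(key, fetch, cache, retries=4, base_delay=1.0, sleep=time.sleep):
    """Return cache[key], calling fetch(key) with exponential backoff on a miss."""
    if key in cache:
        return cache[key]  # cache hit: no request is sent at all
    for attempt in range(retries + 1):
        try:
            cache[key] = fetch(key)
            return cache[key]
        except RateLimitedError:
            if attempt == retries:
                raise  # out of retries: surface the error to the caller
            # Double the wait on each attempt and add up to 10% jitter so
            # that concurrent clients do not retry in lockstep.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

Passing sleep as a parameter is a small design choice that lets the retry schedule be tested without actually waiting; in normal use the default time.sleep applies.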

Coral Bleaching Project

Data Science & Machine Learning Research

A comprehensive data science project analyzing coral bleaching phenomena using advanced machine learning techniques. Features supervised and unsupervised learning models, geospatial analysis, and MLOps workflows with interactive visualizations and SHAP analysis for model interpretability.

Python Supervised Learning Unsupervised Learning MLOps Hyperparameter Tuning XGBoost LightGBM Random Forest Pandas PyArrow Parquet Neptune.ai Scikit-Learn Plotly Graph Analysis SHAP Analysis Geospatial Analysis Jupyter Data Engineering Data Science

SteeringDocs

Software and AI Development Guidelines & Best Practices

A curated collection of steering rules for AI-assisted development across multiple frameworks, languages, and LLM providers. Provides context and best practices for maintaining code quality and consistency in AI-powered development workflows.

LangChain RAG Bedrock Python MCP Gemini Anthropic OpenAI Ollama Prompt Engineering

Qwen3-SFT-FilenameMetadata

LLM Fine-Tuning for Filename Metadata Extraction

A project for fine-tuning the open source Qwen3:14B language model to extract metadata from unstructured filenames using Supervised Fine-Tuning (SFT) with QLoRA. The model learns to parse filenames and output JSON with show names, seasons, episodes, CRC hashes, confidence scores, and reasoning. Features efficient 4-bit quantized training and Ollama integration for local inference.

Qwen3 Fine-Tuning QLoRA 4-bit Quantization Supervised Learning Transformers Unsloth Ollama GGUF CUDA PyTorch Language Models Metadata
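To make the JSON output described above concrete, here is a minimal sketch of how a model response might be parsed and checked before it is trusted. The field names, the parse_model_output helper, and the sample values are assumptions made for illustration; the project's actual schema may differ. It uses standard-library dataclasses in place of a validation library such as Pydantic.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FilenameMetadata:
    # Hypothetical field names mirroring the outputs listed in the description.
    show_name: str
    season: Optional[int]
    episode: Optional[int]
    crc32: Optional[str]
    confidence: float
    reasoning: str


def parse_model_output(raw: str) -> FilenameMetadata:
    """Parse the model's JSON text and reject structurally invalid results."""
    data = json.loads(raw)           # ValueError subclass on malformed JSON
    meta = FilenameMetadata(**data)  # TypeError on missing or unexpected keys
    if not isinstance(meta.show_name, str) or not meta.show_name.strip():
        raise ValueError("show_name must be a non-empty string")
    if not isinstance(meta.confidence, (int, float)) or not 0.0 <= meta.confidence <= 1.0:
        raise ValueError("confidence must be a number between 0 and 1")
    for name in ("season", "episode"):
        value = getattr(meta, name)
        if value is not None and (not isinstance(value, int) or value < 0):
            raise ValueError(f"{name} must be a non-negative integer or null")
    return meta


# Invented sample response for an invented filename.
raw = (
    '{"show_name": "Example Show", "season": 1, "episode": 5, '
    '"crc32": "ABCD1234", "confidence": 0.92, "reasoning": "S01E05 pattern"}'
)
meta = parse_model_output(raw)
print(meta.show_name, meta.season, meta.episode)  # Example Show 1 5
```

A caller could then route low-confidence results to a fallback parser or manual review instead of acting on them directly.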

Get In Touch

Interested in discussing data platform architecture, AI transformation strategies, or enterprise modernization? I'm always open to connecting with fellow technologists and exploring innovative solutions.