James Conner - Technical Executive, Architect, AI and Data Leader

Architecting Data & AI at Scale

Technical executive specializing in large-scale data, analytics, and AI. More than 20 years of leadership building and scaling distributed, cloud-native ecosystems that support millions of users and petabytes of data in the retail, streaming, software development, and manufacturing industries.

20+ Years of Experience

125PB Data Lakehouse Scale

6M Streaming Events/Second

Articles & Insights

GenAI is a Spinning Jenny, and we need a Power Loom

Amdahl's Law and the Coming Evolution of Software Development

Software is being written faster than it can be trusted. Explores how GenAI has created a systemic imbalance in the SDLC, drawing parallels to the Industrial Revolution's textile manufacturing transformation and the need for new verification processes.

Software Architecture, Software Development, Generative Adversarial Networks, GAN, GenAI, SDLC, Verification Systems, Discriminator Networks, Industrial Revolution, Amdahl's Law, Code Review, AI Development, Strategic Thinking, Systems Thinking

Read Article

LLMOps Isn't Just MLOps With Better Marketing

Part 1 of 3: Understanding What Makes LLMOps Different

Why your MLOps playbook won't work for LLMs. Explores the fundamental differences between traditional ML and large language models, from prompt engineering and retrieval pipelines to cost optimization and subjective evaluation. A three-part series on building production LLM systems.

LLMOps, MLOps, Large Language Models, RAG, Prompt Engineering, Production Systems, Cost Optimization, Model Evaluation, Enterprise AI

Read Article

Persuading AI: When Social Engineering Meets Autonomous Agents

Understanding AI Manipulation Risks in Production Systems

AI doesn't just scale useful behavior—it scales manipulation. Explores how large language models enable adaptive persuasion at scale, why AI agents can be socially engineered like humans, and practical security strategies for defending autonomous systems against prompt injection and manipulation attacks.

AI Security, Prompt Injection, Social Engineering, AI Agents, LLM Security, Guardrails, Red Teaming, Enterprise AI, Risk Management

Read Article

Scaling ETL Optimization with AI

A Graph Theory Approach to Petabyte-Scale Data Processing

How we used graph theory and AI to optimize a 125PB data lakehouse processing 6 million events per second. Explores micro and macro optimization strategies for Apache Airflow DAGs using NetworkX and automated performance analysis to resolve SLO misses at enterprise scale.

Graph Theory, Apache Airflow, ETL Optimization, NetworkX, MLOps, Performance Analysis, Enterprise Scale

Read Article
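The article's exact method is not reproduced here. As a minimal sketch of the graph-theory idea it describes, assuming task runtimes and upstream dependencies have already been pulled from Airflow run history, the code below finds a DAG's critical path: the chain of tasks that bounds end-to-end latency and is therefore the first place to look when a pipeline misses its SLO. It uses Python's standard-library graphlib in place of NetworkX (which offers a comparable helper, dag_longest_path, for edge-weighted graphs); the function name and the task names and durations are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def critical_path(durations: dict[str, float],
                  deps: dict[str, set[str]]) -> tuple[list[str], float]:
    """Return the longest (critical) path through a task DAG and its length.

    durations: task_id -> observed runtime (hypothetical input)
    deps: task_id -> set of upstream task_ids
    """
    finish: dict[str, float] = {}          # earliest finish time per task
    best_parent: dict[str, str | None] = {}  # upstream task that gates each task

    # static_order() yields a task only after all of its upstream tasks
    for task in TopologicalSorter(deps).static_order():
        upstream = deps.get(task, set())
        # A task can start only when its slowest upstream task finishes
        parent = max(upstream, key=lambda t: finish[t], default=None)
        start = finish[parent] if parent is not None else 0.0
        finish[task] = start + durations.get(task, 0.0)
        best_parent[task] = parent

    # Walk back from the task that finishes last to recover the path
    end = max(finish, key=finish.get)
    path = [end]
    while best_parent[path[-1]] is not None:
        path.append(best_parent[path[-1]])
    path.reverse()
    return path, finish[end]


# Hypothetical DAG: runtimes in minutes, dependencies as upstream sets
durations = {"extract": 10, "clean": 5, "enrich": 30, "join": 8, "publish": 2}
deps = {
    "clean": {"extract"},
    "enrich": {"extract"},
    "join": {"clean", "enrich"},
    "publish": {"join"},
}
path, total = critical_path(durations, deps)
```

Here the "enrich" branch dominates, so shortening "clean" would not move the overall finish time at all; that distinction between tasks on and off the critical path is what separates a useful optimization from wasted effort.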

The Most Useful AI Models Might Be the Small Ones

Practical Applications of Small Language Models

While everyone focuses on large language models, small language models (SLMs) quietly handle the heavy lifting in real-world automation. Explores practical home automation use cases with Ollama, Pydantic validation, and structured decision-making.

Small Language Models, Ollama, Home Automation, Pydantic, MCP, Structured Output

Read Article

Start Talking About Data as a Product Instead of Data Governance

Transforming Governance Through Product Thinking

Why traditional data governance fails and how reframing data work through product thinking produces better results. Focuses on four key outcomes: durable ownership, reliable quality, discoverable access, and secure operations.

Data Governance, Product Management, Data Strategy, Enterprise Architecture, Data Quality, Data Ownership

Read Article

Open Source Projects & Research

About These Personal Projects

These open source projects are my personal explorations of emerging technologies and practical solutions to real-world data engineering challenges. Each project demonstrates hands-on application of the technologies and architectural patterns I use in enterprise environments.

When asked to summarize the development style of these repositories, this was ChatGPT's response:

"You build systems with a clear architectural mindset that balances modularity, resilience, and real-world practicality. Your projects consistently show well-defined boundaries between data models, services, orchestration, and user interfaces. You embrace messy domains such as inconsistent schemas, shifting metadata, and multi-source APIs, yet contain that complexity behind clean contracts and strong validation. You treat LLMs and automation as components within a broader system, using confidence scoring, fallbacks, and Pydantic enforcement to ensure reliability. Your tools emphasize operational safety, observability, and detailed logging, reflecting a platform engineer's instinct. You design for extensibility, allowing new features, services, or LLM providers to be added without disrupting the core architecture. Your work consistently reflects SOLID design, clean architecture, and service-oriented patterns, which keep systems extensible and maintainable as they grow. Even for internal utilities, your interface designs, documentation, and test coverage remain polished and thoughtful. The signature of your work is practical, maintainable engineering designed to handle real-world complexity with clarity and durability as a primary focus."

Technologies & Skills
AWS
Azure
GCP
Databricks
Snowflake
Generative AI
LangChain
RAG
Python
Apache Iceberg
Apache Spark
MLOps
Data Lakehouse
Medallion Data Architecture
Enterprise Architecture
Solution Architecture
Software Engineering
Data Engineering
Distributed Computing
Data Governance

TableSleuth

Apache Iceberg & Parquet Forensics Tool

A powerful terminal-based tool for deep inspection of Parquet files and Apache Iceberg tables. Features comprehensive metadata analysis, row group inspection, schema evolution tracking, and performance testing through an intuitive TUI. Supports local files, S3, AWS Glue catalogs, and S3 Tables.

Apache Iceberg, Parquet, DuckDB, PyArrow, S3 Tables, AWS Glue Catalog, PyIceberg, TUI, Data Engineering

View Code

ShokoBot

AI-Powered Show Recommendation System

An elegant RAG-enabled show recommendation system using LangChain, ChromaDB, and GPT-5. Features vector-based semantic search, comprehensive metadata analysis, and automatic AniDB integration via the Model Context Protocol (MCP).

RAG, LangChain, ChromaDB, Vector Store, GPT-5, MCP

View Code

Sync2NAS

AI-Powered Media Management & Synchronization

A comprehensive TV show management and file synchronization tool leveraging multiple LLMs for intelligent media routing and organization. Features SFTP integration, vector database storage, and a full Windows GUI with real-time monitoring and automated workflows.

Python, GPT-4, Claude 4, Ollama, Qwen, SQLite, Milvus, SFTP, GUI, FastAPI, Prompt Engineering, Data Engineering

View Code

KiroSteeringLoader

VS Code Extension for AI Development

A VS Code extension that loads Kiro Agent Steering template documents into projects. Streamlines AI-assisted development workflows by centralizing agent instruction documents in a GitHub repository that teammates can share.

TypeScript, VS Code API, VS Code Extension, Agent IDE, Agent Coding

View Code

MCP Server AniDB

Model Context Protocol Server

A Model Context Protocol server that gives AI assistants access to show data through the AniDB HTTP API. Built with FastMCP for seamless integration with AI assistants such as Kiro. Caches results locally for reuse and applies rate limiting with exponential backoff to avoid flooding the API.

MCP, FastMCP, API Integration, Caching, Rate Limiting

View Code
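The server's source is not reproduced here. As a minimal sketch of the caching and backoff pattern the description names, the wrapper below memoizes results in memory (the project itself may cache on disk), enforces a minimum spacing between requests, and retries failed calls with exponentially growing delays. The class name, its parameters, the 2-second default interval, and the choice to retry on ConnectionError are all assumptions; sleep and clock are injectable so the retry schedule can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class RateLimitedClient:
    """Hypothetical wrapper: result cache + minimum request interval +
    retries with exponential backoff around any fetch(key) function."""

    def __init__(self, fetch: Callable[[str], str], min_interval: float = 2.0,
                 max_retries: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep
        self._clock = clock
        self._cache: dict[str, str] = {}
        self._last_request: float | None = None

    def get(self, key: str) -> str:
        # Repeated lookups are served from the cache and never hit the API
        if key in self._cache:
            return self._cache[key]
        for attempt in range(self._max_retries + 1):
            self._throttle()
            try:
                result = self._fetch(key)
            except ConnectionError:
                if attempt == self._max_retries:
                    raise  # give up after the final retry
                # Exponential backoff: base, 2*base, 4*base, ...
                self._sleep(self._base_delay * 2 ** attempt)
            else:
                self._cache[key] = result
                return result
        raise RuntimeError("unreachable")

    def _throttle(self) -> None:
        # Enforce a minimum gap between consecutive outbound requests
        now = self._clock()
        if self._last_request is not None:
            wait = self._min_interval - (now - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
```

In the server, fetch would be whatever function actually calls the AniDB HTTP API; keeping the cache and throttling in one wrapper means every tool the MCP server exposes inherits the same protection.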

Coral Bleaching Project

Data Science & Machine Learning Research

A comprehensive data science project analyzing coral bleaching phenomena using advanced machine learning techniques. Features supervised and unsupervised learning models, geospatial analysis, and MLOps workflows with interactive visualizations and SHAP analysis for model interpretability.

Python, Supervised Learning, Unsupervised Learning, MLOps, Hyperparameter Tuning, XGBoost, LightGBM, Random Forest, Pandas, PyArrow, Parquet, Neptune.ai, Scikit-Learn, Plotly, Graph Analysis, SHAP Analysis, Geospatial Analysis, Jupyter, Data Engineering, Data Science

View Code

SteeringDocs

Software and AI Development Guidelines & Best Practices

A curated collection of steering rules for AI-assisted development across multiple frameworks, languages, and LLM providers. Provides context and best practices for maintaining code quality and consistency in AI-powered development workflows.

LangChain, RAG, Bedrock, Python, MCP, Gemini, Anthropic, OpenAI, Ollama, Prompt Engineering

View Code

Qwen3-SFT-FilenameMetadata

LLM Fine-Tuning for Filename Metadata Extraction

A project for fine-tuning the open source Qwen3:14B language model to extract metadata from unstructured filenames using Supervised Fine-Tuning (SFT) with QLoRA. The model learns to parse filenames and output JSON with show names, seasons, episodes, CRC hashes, confidence scores, and reasoning. Features memory-efficient 4-bit quantized training and Ollama integration for local inference.

Qwen3, Fine-Tuning, QLoRA, 4-bit Quantization, Supervised Learning, Transformers, Unsloth, Ollama, GGUF, CUDA, PyTorch, Language Models, Metadata

View Code

Download Model
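As a minimal sketch of how the model's JSON output might be checked before anything downstream trusts it, the dataclass below validates a hypothetical schema covering the fields the description lists (show name, season, episode, CRC hash, confidence, reasoning). The field names and validation rules are assumptions, not the project's actual format, and the sketch uses the standard library rather than Pydantic to stay dependency-free.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass

# CRC32 hashes in filenames are conventionally 8 hex characters
CRC_PATTERN = re.compile(r"^[0-9A-Fa-f]{8}$")


@dataclass(frozen=True)
class FilenameMetadata:
    """Hypothetical schema for one model response."""
    show_name: str
    season: int | None
    episode: int | None
    crc32: str | None
    confidence: float
    reasoning: str

    def __post_init__(self) -> None:
        if not isinstance(self.show_name, str) or not self.show_name.strip():
            raise ValueError("show_name must be a non-empty string")
        for name in ("season", "episode"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly
            if value is not None and (not isinstance(value, int)
                                      or isinstance(value, bool) or value < 0):
                raise ValueError(f"{name} must be a non-negative integer or null")
        if self.crc32 is not None and not CRC_PATTERN.match(self.crc32):
            raise ValueError("crc32 must be 8 hex characters")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


def parse_model_output(raw: str) -> FilenameMetadata:
    """Parse and validate raw model text; raises on malformed output."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    return FilenameMetadata(
        show_name=data["show_name"],
        season=data.get("season"),
        episode=data.get("episode"),
        crc32=data.get("crc32"),
        confidence=float(data["confidence"]),
        reasoning=data.get("reasoning", ""),
    )
```

For a hypothetical filename such as "[Group] Example Show - S01E05 [1A2B3C4D].mkv", a well-formed response parses into a FilenameMetadata instance; malformed values raise ValueError and missing required keys raise KeyError, so the caller can fall back or flag the file for review instead of acting on a bad parse.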

Get In Touch

Interested in discussing data platform architecture, AI transformation strategies, or enterprise modernization? I'm always open to connecting with fellow technologists and exploring innovative solutions.

GitHub LinkedIn