I started my career as a linguist, studying how humans actually communicate as we move through the world. The answer was obvious and overwhelming: we evolved the capacity to walk and talk because our lives depended on it. Written language is a brilliant technology for transmitting ideas through time, but it is often overused and ill-suited for real-time communication. When we force AI interactions through keyboards and screens connected to text consoles and chat windows, we are over-indexing on a modality that is cognitively constraining and fundamentally unnatural when used for all of our real-time communication.
Screens demand visual attention. Consoles demand hands that can type. You can't drive, cook, or walk the dog while staring at a terminal. Current "voice AI" — the kind that sets timers and plays music — is a toy demo, not a serious interface. The real challenge is communicating with groups of agents, synchronously and asynchronously, using voice as a salience signal: if an agent reaches out to you by speech, it must be important. Voice-first communication requires rethinking everything: notification architectures, trust models, decision routing, and real-time streaming.
It's possible right now to manage groups of agents — both synchronously and asynchronously — using your voice and your ears for priority communications, and text and screens for lower-priority status signaling. I know because I built it. Lupin, my open-source agentic framework, lets you drive Claude Code with the Cosa Voice MCP, as well as manage non-Claude agentic processes, using speech-to-text, intent routing, agent orchestration, bidirectional notifications, and text-to-speech streaming behind the scenes. No screen required. This isn't a vision statement — it's a working system.
10 years of projects that led to one conclusion.
The reference platform that proves the thesis. A full voice-first agentic pipeline — speech-to-text, intent routing across 15+ specialized agents, bidirectional notifications, and progressive TTS streaming — all operable without a screen. Built from scratch as a solo R&D effort, from WebSocket architecture to LoRA fine-tuned routing models.
Voice-driven agents need to talk back. I designed a notification system that lets agents request human decisions mid-workflow, stream progress updates, and route trust-calibrated approvals — all through voice. This became the communication primitive that made voice-first agentic workflows practical, boosting development velocity by an order of magnitude.
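The core routing rule behind that primitive can be sketched in a few lines of Python. This is a minimal illustration of the idea — voice reserved as a salience signal — not Lupin's actual API; the `Priority` levels and `route` function are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum

class Priority(Enum):
    STATUS = 0    # fire-and-forget progress update
    DECISION = 1  # agent is blocked on a human choice
    URGENT = 2    # failure or trust-threshold breach

@dataclass
class Notification:
    agent: str
    message: str
    priority: Priority

def route(note: Notification) -> str:
    """Voice is reserved as a salience signal: only notifications that
    genuinely need human attention interrupt by speech; everything else
    lands silently in a low-priority text/status channel."""
    if note.priority is Priority.STATUS:
        return "text"   # render in a status pane, no interruption
    return "voice"      # synthesize speech and interrupt the human

# Usage: a status update stays silent; a blocked decision speaks up.
print(route(Notification("builder", "tests passing", Priority.STATUS)))
print(route(Notification("builder", "approve deploy?", Priority.DECISION)))
```

The design choice is the point: speech interrupts are expensive for the human, so the channel decision — not the message content — is what makes voice-first workflows tolerable at scale.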
Discerning the intent encoded in a user's voice requires fine-tuned LoRA adapters running on small open-source LLMs. I've achieved 99% intent classification accuracy across 15+ agent categories and 15+ browser management commands. The routing layer, running on an edge server, decides in milliseconds which agent should handle a spoken request, making the voice interface feel instant and reliable. I presented the Easy PEFT methodology at Google.
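The routing contract looks roughly like this. The real classifier is a LoRA-adapted small LLM on an edge server; here it is stubbed with keyword matching so the sketch is runnable, and the agent names (`calendar`, `browser`, `coder`) are illustrative, not Lupin's actual categories.

```python
from typing import Callable, Dict

# Hypothetical agent registry: intent label -> handler.
AGENTS: Dict[str, Callable[[str], str]] = {
    "calendar": lambda u: f"calendar agent handling: {u}",
    "browser":  lambda u: f"browser agent handling: {u}",
    "coder":    lambda u: f"coding agent handling: {u}",
}

def classify(utterance: str) -> str:
    """Stand-in for the fine-tuned LoRA classifier — keyword rules
    fake the intent label so the routing layer below is testable."""
    text = utterance.lower()
    if "tab" in text or "browser" in text:
        return "browser"
    if "meeting" in text or "schedule" in text:
        return "calendar"
    return "coder"

def route_utterance(utterance: str) -> str:
    """Route a transcribed utterance to the agent its intent maps to."""
    return AGENTS[classify(utterance)](utterance)

print(route_utterance("open a new browser tab"))
```

Because classification and dispatch are separated, the keyword stub can be swapped for the fine-tuned model without touching the registry or the dispatch path.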
When your development environment is voice-driven, your project management workflows need to be too. I created a structured planning framework for Claude Code that works entirely through voice — session management, task tracking, and implementation planning, all designed for spoken interaction with AI coding agents.
In 2024, I presented at Google how my early work on CoSA used an optimized architecture that improves AI agent response times by roughly 50-100x by teaching agents to recognize when they had already solved a computationally analogous problem. Semantic caching underpins Lupin's solution snapshot system and leverages code as memory, delivering not just faster responses but also significantly better accuracy on GSM8K.
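The mechanism can be sketched as a cache keyed by semantic similarity rather than exact match. This is a toy illustration: the real system compares code/text embeddings, while here token-set Jaccard stands in for cosine similarity, and `SemanticCache` is a hypothetical name.

```python
def jaccard(a: str, b: str) -> float:
    """Cheap stand-in for embedding cosine similarity: overlap of the
    two queries' token sets, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class SemanticCache:
    """On a hit, return the stored solution (often executable code used
    as memory) instead of re-running the agent from scratch."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list = []  # (query, solution) pairs

    def put(self, query: str, solution: str) -> None:
        self.entries.append((query, solution))

    def get(self, query: str):
        """Return the closest stored solution if it clears the
        similarity threshold, else None (cache miss)."""
        best = max(self.entries, key=lambda e: jaccard(query, e[0]),
                   default=None)
        if best and jaccard(query, best[0]) >= self.threshold:
            return best[1]
        return None

# Usage: a near-duplicate question reuses the earlier solution.
cache = SemanticCache(threshold=0.5)
cache.put("add 2 and 3", "solution_a")
print(cache.get("add 2 and 3 please"))
```

The speedup comes from skipping generation entirely on a hit; the accuracy gain comes from replaying a solution that was already verified, rather than re-deriving it.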
At HelioCampus, I architected AMPE, a prediction engine that abstracted machine learning complexity behind clean interfaces — proving that the right modality layer transforms how humans interact with AI predictions. The lesson: interface design isn't a skin on top of intelligence; it is the intelligence delivery mechanism.
At Comcast, I created a demo of voice-driven search for sports content — "Show me all shots on goal." This was 2015, before smart speakers were mainstream, and it later became a product. This project planted the seed: voice wasn't just a way to search; for many contexts it was the only natural way. Everything since has been building on that realization.
Where I spend my cycles — and what I use to get there.
Primary research focus — the interface layer between humans and agent swarms.
Voice-driven Human-in-the-Loop Agentic Processes · Bidirectional Real-Time Voice I/O · Interrupt Handling & Barge-in · Streaming ASR · Streaming TTS · Time-to-First-Audio Optimization · Voice Activity Detection · Prosody Modeling · Whisper · Distil-Whisper · Google Chirp · Google Speech-to-Text API · ElevenLabs
Coordinating groups of agents across synchronous and asynchronous workflows.
Multi-Agent Orchestration · ReAct · Plan-and-Execute · Tool-Augmented LLMs · Task Decomposition · Chain-of-Thought · Tree-of-Thought · Human-in-the-Loop · Model Context Protocol · LangGraph · LangChain · OpenAI Agents SDK · Google Agents ADK · Claude SDK · Open Interpreter · AutoGPT
Active R&D — teaching agents when to ask and when to act.
Online Preference Learning · Trust Proxies · Reward Modeling · DPO · RLHF · Bayesian Online Learning · Gaussian Process Preference Learning · Inverse Reinforcement Learning · Active Learning · Preference Elicitation · Case-Based Reasoning
Making models faster, smaller, and smarter at retrieval.
RAG · Semantic Caching · Code & Text Embeddings · Semantic Similarity · Fine-Tuning · PEFT · QLoRA · LoRA · AWQ · AutoRound · Quantization · Speculative Decoding · KV Cache Optimization · Mixture of Experts · Flash Attention · Hugging Face Transformers
The statistical and linguistic bedrock underneath everything else.
Deep Learning · Neural Networks · NLP · Sentiment Analysis · Document Classification · TF-IDF · Classification & Clustering · Logistic & Linear Regression · RNNs · LSTMs · CNNs · SMOTE · SHAP · spaCy · NLTK · Gensim · scikit-learn · XGBoost · LightGBM · Pandas · WandB
The production stack that keeps agents running.
FastAPI · Docker · CUDA · GPUs (dual RTX 4090) · PyTorch · JAX · TensorFlow · Keras · Apache Spark · Parallel & Distributed Computing · Server-Sent Events · Linux · SQL · MySQL · PostgreSQL · GitHub · Vertex AI · Google Cloud
The models I’ve shipped with, fine-tuned, or benchmarked.
Claude Opus/Sonnet · Gemini · Mistral-7/8B · Phi4 8B · nomic-embed · CodeRank · Mixtral-8x7B · Phind-CodeLlama-34B-v2 · Llama 3.x · Qwen 2.5 · GPT · Whisper · Distil-Whisper · GloVe · Word2Vec
Polyglot by necessity, Pythonista by choice.
Python · JavaScript · Scala · Java · R · SQL
Writing that bridges research and practice.
Planning-as-prompting methodology; basis for internal Google presentation
Interleaving attention across parallel agentic processes in AI-assisted development using Claude Code
An unexpected upside to code-as-memory
Early analysis of emergent agentic AI patterns
Early critical analysis of LLM epistemic limitations
MA & BA Applied Linguistics — West Virginia University
The linguistics training wasn't incidental — it was foundational. Understanding how humans produce, parse, and negotiate meaning through speech is what made me see voice as the primary modality long before it was fashionable. Every architecture decision I make is informed by how language actually works.
English — Native | Spanish — Native