
Voice Is the Primary Modality
for Agentic AI

Not a novelty layer. Not a demo feature. The actual interface.

R. P. Ruiz — Senior AI Architect at Google. Founder, Deepily.ai.


20 years of R&D. Linguist by training, hacker by nature.

The Voice-First Thesis

The Observation

I started my career as a linguist, studying how humans actually communicate as we move through the world. The answer was obvious and overwhelming: we evolved the capacity to walk and talk because our lives depended on it. Written language is a brilliant technology for transmitting ideas through time, but it is often overused and ill-suited for real-time communication. When we force AI interactions through keyboards and screens connected to text consoles and chat windows, we over-index on a modality that is cognitively constraining and fundamentally unnatural as the channel for all of our real-time communication.

There's a reason why we can easily walk and talk at the same time. Contrast this with texting and driving.

The Problem

Screens demand visual attention. Consoles demand hands that can type. You can't drive, cook, or walk the dog while staring at a terminal. Current "voice AI" — the kind that sets timers and plays music — is a toy demo, not a serious interface. The real challenge is communicating with groups of agents, synchronously and asynchronously, using voice as a salience signal: if an agent reaches out to you in speech, it must be important. Voice-first communication requires rethinking everything: notification architectures, trust models, decision routing, and real-time streaming.

If your agentic processes can only be operated through a screen, you've just increased the cognitive load on the user by constraining the form their interactions can take.

The Proof

It's possible right now to manage groups of agents — both synchronously and asynchronously — using your voice and ears for priority communications, and text and screens for lower-priority status signaling. I know because I built it. Lupin, my open-source agentic framework, lets you drive Claude Code with the CoSA Voice MCP, as well as manage non-Claude agentic processes, with speech-to-text, intent routing, agent orchestration, bidirectional notifications, and text-to-speech streaming behind the scenes. No screen required. This isn't a vision statement — it's a working system.

It's possible RIGHT NOW to manage high-priority communication with agents using nothing but your voice and ears.

Evidence from the Field

10 years of projects that led to one conclusion.

Real-time bidirectional voice

Lupin + CoSA Voice MCP

The reference platform that proves the thesis. A full voice-first agentic pipeline — speech-to-text, intent routing across 15+ specialized agents, bidirectional notifications, and progressive TTS streaming — all operable without a screen. Built from scratch as a solo R&D effort, from the WebSocket architecture to LoRA-fine-tuned routing models.

Python FastAPI WebSocket MCP Claude Agent SDK LoRA TTS/STT
10x dev velocity

Bidirectional Notification Architecture

Voice-driven agents need to talk back. I designed a notification system that lets agents request human decisions mid-workflow, stream progress updates, and route trust-calibrated approvals — all through voice. This became the communication primitive that made voice-first agentic workflows practical, boosting development velocity by an order of magnitude.

WebSocket SSE Decision Proxy Trust Models
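To make the decision-proxy idea concrete, here is a minimal sketch in Python — not Lupin's actual code. All names (DecisionRequest, route, decision_proxy) are hypothetical, and asyncio queues stand in for the real WebSocket/SSE transport; the point is how a request's priority and a learned trust score decide whether it interrupts the human by voice, queues quietly as text, or auto-approves.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DecisionRequest:
    agent_id: str
    question: str
    priority: str       # "high" or "low"
    trust_score: float  # learned trust in this agent's autonomy


def route(req: DecisionRequest) -> str:
    """Pick the channel: voice interrupts the human, text queues quietly.
    Highly trusted agents asking low-stakes questions auto-approve."""
    if req.trust_score >= 0.9 and req.priority == "low":
        return "auto-approve"
    return "voice" if req.priority == "high" else "text"


async def decision_proxy(inbox: asyncio.Queue, answers: dict) -> None:
    """Drain agent requests, route each one, and hand back an answer."""
    while not inbox.empty():
        req, reply = await inbox.get()
        channel = route(req)
        if channel == "auto-approve":
            reply.set_result("approved")
        else:
            # The real system would stream this over WebSocket/SSE;
            # here we just look up a canned human answer.
            reply.set_result(f"{channel}:{answers[req.agent_id]}")


async def main() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    replies = []
    for req in [
        DecisionRequest("builder", "Delete staging DB?", "high", 0.4),
        DecisionRequest("linter", "Auto-fix imports?", "low", 0.95),
    ]:
        fut = loop.create_future()
        replies.append(fut)
        inbox.put_nowait((req, fut))
    await decision_proxy(inbox, {"builder": "yes", "linter": "n/a"})
    return [r.result() for r in replies]


print(asyncio.run(main()))
```

The design choice the sketch illustrates: the channel is a routing decision, not a property of the agent — the same agent can interrupt you by voice for a destructive action and silently auto-approve a lint fix.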
99% accuracy

Voice Intent Routing

Discerning the intent encoded in a user's speech requires fine-tuned LoRA adapters running on small open-source LLMs. I've achieved 99% intent classification accuracy across 15+ agent categories and 15+ browser management commands. The routing layer, running on an edge server, decides in milliseconds which agent should handle a spoken request, making the voice interface feel instant and reliable. I presented the Easy PEFT methodology at Google.

LoRA PEFT Mistral 8B
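As a rough illustration of the routing layer's shape — not the production classifier — the sketch below swaps the LoRA-fine-tuned model for a trivial keyword scorer so the dispatch logic stays visible. INTENT_KEYWORDS, AGENTS, and the category names are invented for the example.

```python
from typing import Callable

# Hypothetical category set; the production router covers 15+ agents.
INTENT_KEYWORDS = {
    "calendar": ["meeting", "schedule", "appointment"],
    "browser": ["open", "tab", "search"],
    "todo": ["remind", "task", "list"],
}


def classify_intent(transcript: str) -> str:
    """Stand-in for the fine-tuned classifier: score each category
    by keyword hits on the ASR transcript."""
    words = transcript.lower().split()
    scores = {
        cat: sum(w in words for w in kws)
        for cat, kws in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "fallback"


# Dispatch table mapping intent labels to agent entry points.
AGENTS: dict[str, Callable[[str], str]] = {
    "calendar": lambda t: f"calendar-agent handling: {t}",
    "browser": lambda t: f"browser-agent handling: {t}",
    "todo": lambda t: f"todo-agent handling: {t}",
    "fallback": lambda t: f"no route for: {t}",
}


def route(transcript: str) -> str:
    """One classify + one table lookup — the whole hot path."""
    return AGENTS[classify_intent(transcript)](transcript)


print(route("schedule a meeting with Sam"))
```

The structure is the point: classification and dispatch are decoupled, so swapping the keyword scorer for a LoRA adapter changes one function and leaves the routing table untouched.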
Voice-compatible workflows

Planning is Prompting

When your development environment is voice-driven, your project management workflows need to be too. I created a structured planning framework for Claude Code that works entirely through voice — session management, task tracking, and implementation planning, all designed for spoken interaction with AI coding agents.

Claude Code Workflow Design cosa-voice MCP
15s → 0.25s

Semantic Caching & Code as Memory

In 2024, I presented at Google how my early work on CoSA used an optimized architecture that improves AI agent response times by ~50-100x by teaching agents to recognize when they'd already solved a computationally analogous problem. Semantic caching underpins Lupin's solution-snapshot system and leverages code as memory, yielding not just faster responses but also significantly better accuracy on GSM8K.

Performance Tuning Embeddings Vector Search GSM8K
Abstracted prediction engine

AMPE — Interface as Insight

At HelioCampus, I architected AMPE, a prediction engine that abstracted machine learning complexity behind clean interfaces — proving that the right modality layer transforms how humans interact with AI predictions. The lesson: interface design isn't a skin on top of intelligence; it is the intelligence delivery mechanism.

scikit-learn Prediction Engine Higher Ed Analytics
The thesis seed

Voice-Controlled Sports Highlights

At Comcast, I created a demo of voice-driven search for sports content — "Show me all shots on goal." This was 2015, before smart speakers were mainstream, and the demo later became a product. This project planted the seed: voice wasn't just a way to search; for many contexts it was the only natural way. Everything since has been building on that realization.

NLP Voice Search Content Discovery

Research Domains & Technical Competencies

Where I spend my cycles — and what I use to get there.

Voice-Driven Agentic AI

Primary research focus — the interface layer between humans and agent swarms.

Voice-driven Human-in-the-Loop Agentic Processes · Bidirectional Real-Time Voice I/O · Interrupt Handling & Barge-in · Streaming ASR · Streaming TTS · Time-to-First-Audio Optimization · Voice Activity Detection · Prosody Modeling · Whisper · Distil-Whisper · Google Chirp · Google Speech-to-Text API · ElevenLabs

Agentic Architectures & Orchestration

Coordinating groups of agents across synchronous and asynchronous workflows.

Multi-Agent Orchestration · ReAct · Plan-and-Execute · Tool-Augmented LLMs · Task Decomposition · Chain-of-Thought · Tree-of-Thought · Human-in-the-Loop · Model Context Protocol · LangGraph · LangChain · OpenAI Agents SDK · Google Agents ADK · Claude SDK · Open Interpreter · AutoGPT

Preference Learning & Trust Systems

Active R&D — teaching agents when to ask and when to act.

Online Preference Learning · Trust Proxies · Reward Modeling · DPO · RLHF · Bayesian Online Learning · Gaussian Process Preference Learning · Inverse Reinforcement Learning · Active Learning · Preference Elicitation · Case-Based Reasoning

LLM Architecture, Efficiency & Serving

Making models faster, smaller, and smarter at retrieval.

RAG · Semantic Caching · Code & Text Embeddings · Semantic Similarity · Fine-Tuning · PEFT · QLoRA · LoRA · AWQ · AutoRound · Quantization · Speculative Decoding · KV Cache Optimization · Mixture of Experts · Flash Attention · Hugging Face Transformers

ML Foundations & Classical NLP

The statistical and linguistic bedrock underneath everything else.

Deep Learning · Neural Networks · NLP · Sentiment Analysis · Document Classification · TF-IDF · Classification & Clustering · Logistic & Linear Regression · RNNs · LSTMs · CNNs · SMOTE · SHAP · spaCy · NLTK · Gensim · scikit-learn · XGBoost · LightGBM · Pandas · WandB

Infrastructure & Systems

The production stack that keeps agents running.

FastAPI · Docker · CUDA · GPUs (dual RTX 4090) · PyTorch · JAX · TensorFlow · Keras · Apache Spark · Parallel & Distributed Computing · Server-Sent Events · Linux · SQL · MySQL · PostgreSQL · GitHub · Vertex AI · Google Cloud

Models

The models I’ve shipped with, fine-tuned, or benchmarked.

Claude Opus/Sonnet · Gemini · Mistral-7/8B · Phi4 8B · nomic-embed · CodeRank · Mixtral-8x7B · Phind-CodeLlama-34B-v2 · Llama 3.x · Qwen 2.5 · GPT · Whisper · Distil-Whisper · GloVe · Word2Vec

Programming Languages

Polyglot by necessity, Pythonista by choice.

Python · JavaScript · Scala · Java · R · SQL

Research Communication

Writing that bridges research and practice.

Medium October 2025

Faster, Better, Morer: How to 5–10x Your Code Generation with Claude Code

Planning-as-prompting methodology; basis for internal Google presentation

Medium April 2025

How I Got Promoted to AI Project Manager in One Short Weekend

Interleaving attention across parallel agentic processes in AI-assisted development using Claude Code

Medium December 2024

How to Give Your LLM's GSM8K Scores a HUGE Bump

An unexpected upside to code-as-memory

LinkedIn September 2023

Adventures in Agentic Behaviors, Parts 1, 2, 3 and 4

Early analysis of emergent agentic AI patterns

LinkedIn May 2023

Chastened AI Admits It Doesn't Know All The Answers

Early critical analysis of LLM epistemic limitations

Credentials & Formation

Education

MA & BA Applied Linguistics — West Virginia University

The linguistics training wasn't incidental — it was foundational. Understanding how humans produce, parse, and negotiate meaning through speech is what made me see voice as the primary modality long before it was fashionable. Every architecture decision I make is informed by how language actually works.

Languages

English — Native   |   Spanish — Native

Certifications

  • Google Cloud: Professional Cloud Architect · 2025
  • Google Cloud: Professional Machine Learning Engineer · 2024
  • Generative AI with Large Language Models · Coursera · 2023
  • Natural Language Processing Specialization · Coursera · 2023
  • Statistics with Python Specialization · Coursera · 2020–2021
  • Deep Learning Specialization · Deeplearning.ai, Coursera · 2017–2018
  • Functional Programming in Scala · ÉPFL, Coursera · 2016–2017
  • Data Science and Engineering with Spark · Berkeley, edX · 2016
  • Machine Learning · University of Washington, Coursera · 2016
  • Data Science Specialization · Johns Hopkins, Coursera · 2015–2016