Gemini Live Agent Challenge · Live Agents

AI that sees what you see

Point your camera at anything broken. Talk naturally. FixIt Genie is a multimodal live agent that sees what you see, hears what you say, and walks you through the fix — in real time.

Watch Demo · View on GitHub
Screenshots (Android): onboarding intro, how it works, coverage, idle session, live speaking session
Built with
Gemini 2.5 Flash Native Audio
Google ADK
Cloud Run
Firestore Vector Search
CameraX
Ray-Ban Meta
Core Capabilities

Not a chatbot. A conversation.

Bidirectional audio streaming means you talk naturally, interrupt naturally, and get real-time multimodal guidance without push-to-talk friction.
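Interruption only feels natural if the client goes quiet the moment the user starts talking: any agent audio that has already arrived but not yet played must be dropped. A minimal sketch, assuming the backend forwards some interruption signal to the phone. `PlaybackBuffer` and its methods are hypothetical names, not the app's actual `AudioStreamManager` API, and the sketch is in Python to match the backend rather than the Kotlin client.

```python
from __future__ import annotations

from collections import deque


class PlaybackBuffer:
    """Hypothetical client-side queue of agent audio chunks awaiting playback."""

    def __init__(self) -> None:
        self._chunks: deque[bytes] = deque()

    def enqueue(self, pcm_chunk: bytes) -> None:
        # Agent audio (PCM 24 kHz) can arrive faster than real time, so queue it.
        self._chunks.append(pcm_chunk)

    def next_chunk(self) -> bytes | None:
        # Called by the audio output loop; None means there is nothing left to play.
        return self._chunks.popleft() if self._chunks else None

    def on_interrupted(self) -> int:
        # The user barged in: discard everything not yet played so the agent
        # stops talking immediately. Returns how many chunks were dropped.
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped


buffer = PlaybackBuffer()
for chunk in (b"\x00\x01" * 480, b"\x02\x03" * 480, b"\x04\x05" * 480):
    buffer.enqueue(chunk)

first = buffer.next_chunk()          # one chunk gets played
dropped = buffer.on_interrupted()    # the user starts speaking
print(dropped, buffer.next_chunk())  # 2 None
```

Clearing the queue, rather than letting it drain, is what removes the need for push-to-talk: the user never has to wait for the agent to finish a sentence.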

👁️
Visual Understanding
Processes camera frames at 1 FPS. Identifies equipment, reads error codes, checks gauges, spots what you might miss.
🎙️
Real-Time Voice
Bidi-streaming audio. The agent hears you, asks clarifying questions, and confirms each step visually before moving on.
🛡️
Safety First
Before any physical action — breaker panel, hot coolant, battery terminals — safety warnings fire automatically.
🔍
Live Web Search
Unknown error code? It searches in real time. Fetches YouTube repair transcripts, manufacturer PDFs, service bulletins.
🧠
Domain Knowledge
ADK Skills for automotive, electrical, and appliances. Firestore vector search for semantic equipment lookup.
🕶️
Hands-Free with Ray-Ban
Switch to Ray-Ban Meta glasses with one tap. Same AI, now watching through your eyewear — no phone to hold.
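The safety behaviour can be pictured as a tool the agent calls before it asks the user to touch anything. The architecture lists a `get_safety_warnings` tool, but its real logic isn't shown, so the keyword table and return shape below are hypothetical. It is written as an ordinary function because ADK accepts plain Python functions as tools and uses their docstrings to describe them to the model.

```python
# Hypothetical hazard table: keyword -> warning text. Not the project's real data.
HAZARDS = {
    "breaker": (
        "Turn off the relevant breaker and verify the circuit is dead with a tester; "
        "the main lugs stay live even with the main breaker off."
    ),
    "coolant": (
        "Never open the radiator cap on a hot engine; pressurised coolant can scald. "
        "Let it cool first."
    ),
    "battery": (
        "Remove metal jewellery and disconnect the negative terminal first "
        "to avoid shorts and sparks."
    ),
}


def get_safety_warnings(action: str) -> dict:
    """Return safety warnings for a physical action the user is about to take.

    Args:
        action: Plain-language description of the next step, e.g. "open the breaker panel".

    Returns:
        A dict with a "status" ("warn" or "clear") and a list of matching "warnings".
    """
    text = action.lower()
    # Naive keyword match; a real implementation would likely be richer.
    warnings = [msg for keyword, msg in HAZARDS.items() if keyword in text]
    return {"status": "warn" if warnings else "clear", "warnings": warnings}


print(get_safety_warnings("Loosen the battery terminal clamp")["status"])   # warn
print(get_safety_warnings("Read the error code on the display")["status"])  # clear
```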
Product Tour

From onboarding to live guidance.

A quick walkthrough of the product flow: clear onboarding, a focused start state, and a live transcript that stays readable while the agent listens and talks back.

FixIt Genie onboarding intro screen
01
Clear first-run onboarding
The app introduces the core interaction model immediately: camera, voice, and guided repair.
FixIt Genie onboarding how it works screen
02
See it. Say it. Fix it.
The onboarding flow explains the product interaction in plain language before the user ever starts a session.
FixIt Genie onboarding coverage screen
03
Focused domain coverage
The onboarding flow makes the initial repair categories obvious without overwhelming the user with options.
FixIt Genie idle session screen
04
Ready-to-start session view
Before a live session begins, the UI keeps the camera feed primary and the call to action unambiguous.
FixIt Genie live speaking session screen
05
Readable live guidance
During the session, the transcript, speaking state, and camera view stay visible together so the user can follow along in real time.
Architecture

One pipeline. Real-time everywhere.

Three live streams — camera frames, voice in, voice out — multiplexed over a single WebSocket to a Google ADK agent on Cloud Run.
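Sharing one socket means every message needs a tag saying which stream it belongs to. A minimal sketch of such framing, assuming a hypothetical JSON envelope with `type` and `data` fields; the field names and stream labels are illustrative, not the app's actual wire format. In this sketch audio is base64-encoded like the JPEG frames, so every message is a single JSON text frame; a real client might send audio as binary frames instead.

```python
from __future__ import annotations

import base64
import json

# Hypothetical stream labels; the real protocol may use different names.
AUDIO_IN = "audio/pcm;rate=16000"
IMAGE = "image/jpeg"
TEXT = "text"


def encode(kind: str, payload: bytes | str) -> str:
    """Wrap one chunk from any stream in a JSON envelope for a WebSocket text frame."""
    if isinstance(payload, bytes):
        data = base64.b64encode(payload).decode("ascii")
    else:
        data = payload
    return json.dumps({"type": kind, "data": data})


def decode(frame: str) -> tuple[str, bytes | str]:
    """Unwrap an envelope, restoring binary payloads from base64."""
    msg = json.loads(frame)
    kind, data = msg["type"], msg["data"]
    if kind == TEXT:
        return kind, data
    return kind, base64.b64decode(data)


# Interleave the three kinds of traffic on one connection, then unwrap them.
frames = [
    encode(AUDIO_IN, b"\x10\x00" * 160),  # 160 samples: 10 ms of 16 kHz, 16-bit mono audio
    encode(IMAGE, b"\xff\xd8\xff\xe0"),   # the first bytes of a JPEG file
    encode(TEXT, '{"event": "start"}'),   # a control message
]
for frame in frames:
    kind, body = decode(frame)
    print(kind, len(body))
```

The receiver dispatches on `type` alone, so adding a stream (for example, glasses frames) only needs a new label.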

📱 Phone Camera: CameraX · 1 FPS · 768×768 JPEG, base64-encoded
🎙️ Microphone: AudioRecord · 16 kHz PCM mono · native VAD
🕶️ Ray-Ban Glasses: Meta DAT SDK v0.4 · I420 → JPEG · live toggle

Android app (Kotlin 2.3 · Jetpack Compose · Hilt DI)
SessionViewModel · AudioStreamManager · AgentWebSocket · GlassesCameraManager · GenieAvatar (Compose Canvas)

WebSocket (wss://)
↑ Uplink: PCM 16 kHz · JPEG frames · JSON
↓ Downlink: PCM 24 kHz · transcript text

Google Cloud Run: ADK agent (Python)
Gemini Live API · gemini-2.5-flash-native-audio
Google ADK · adk web · /run_live
Tools: 🔧 lookup_equipment_knowledge · ⚠️ get_safety_warnings · ▶️ analyze_youtube_repair_video · 📄 lookup_user_manual · 🔍 google_search (built-in) · 📝 log_diagnostic_step
ADK SkillToolset: 🚗 Automotive · ⚡ Electrical · 🏠 Appliances

Services
Gemini Live API: bidi-stream · native audio · text-embedding-004
🔍 Google Search: real-time grounding for error codes and bulletins
▶️ YouTube: transcript extraction for repair video analysis
🗄️ Firestore: vector search (1536-dim) for semantic KB lookup
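Semantic equipment lookup comes down to a nearest-neighbour query: embed the user's description, then return the knowledge-base entries whose stored embeddings point in the most similar direction. Firestore runs this server-side over an indexed vector field. The sketch below is an in-memory stand-in, assuming cosine similarity as the distance measure (Firestore also supports Euclidean and dot-product), with tiny made-up 3-dimensional vectors instead of the 1536-dimensional ones in the diagram and hypothetical entry names.

```python
import math

# Hypothetical knowledge-base entries with toy 3-dimensional embeddings.
# Real entries would hold full-size vectors produced by the embedding model.
KB = {
    "furnace flame sensor": [0.9, 0.1, 0.0],
    "car battery terminals": [0.1, 0.9, 0.2],
    "dishwasher drain pump": [0.0, 0.2, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_matches(query: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force top-k search; a vector index produces the same ranking at scale."""
    scored = [(name, cosine(query, vec)) for name, vec in KB.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# A query embedding that points mostly toward the "battery" entry.
for name, score in top_k_matches([0.2, 0.8, 0.1]):
    print(f"{name}: {score:.3f}")
```

Brute force is fine for a handful of entries; the point of a vector index is to avoid scoring every document as the knowledge base grows.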
Demo Video

See It In Action

Watch on YouTube →
Technical Blog

Eyes, Voice, and a Wrench: What Happens When AI Can See What You See

There are 15 million skilled trade workers in the US alone. Every day they face equipment they've never seen before and error codes with no obvious cause. The standard resources (manuals, tutorials, hotlines) all share the same flaw: they can't see what you see. Gemini Live removes that constraint.

Read the full post →