hermiod's blog

Gemini 2.0 Flash Live API fails the vibe check for me

I had a couple of conversations with the new Multimodal Live API through Google AI Studio.

I wanted to have a voice-to-voice conversation about an idea I was working through, and I wanted Flash to walk me through the potential pitfalls.

Flash didn't give me the kind of sophisticated response that I expect from Claude or ChatGPT. I specifically asked Flash to compare and contrast different AWS instances, and it wouldn't list them out (even though it's pretty obvious that it has that knowledge).

Based on the style of the voice, it sounds like the audio is generated by text-to-speech right now, as opposed to being "natively multimodal". Some of the responses were a little awkward.

I'm sure it's just early, and they'll make improvements. It's also unfair to compare the "Flash" line of models with Sonnet and 4o.

I really just want Advanced Voice Mode with GPTs! I want to be able to orchestrate some basic workflows with my voice. Coming soon, I'm sure.

Created: December 14, 2024