YouTuber uses GPT-4 Vision to recreate Google’s Gemini demo, actually in real time

  • Google released a hands-on video demonstrating Gemini’s voice response capabilities in “real time.”
  • Google later admitted that the video demo didn’t actually happen in real time with spoken prompts.
  • A YouTuber has recreated the Gemini demo using GPT-4 Vision, this time genuinely in real time.

Google’s impressive Gemini hands-on demo video turned out to be a little too good to be true. Now, someone has recreated that demo using GPT-4 Vision, accomplishing in real time what Gemini’s video only simulated.

Google’s Gemini large language model (LLM) is the company’s most powerful suite of AI models to date, and its most direct challenge yet to OpenAI’s GPT-4. In an attempt to show off just how capable its multimodal LLM is, Google released a hands-on video of Gemini supposedly responding to voice prompts in real time. The demo initially looked impressive, but viewers eventually spotted a disclaimer stating that latency had been reduced and Gemini’s outputs shortened for brevity.
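For context on what such a recreation involves, here is a minimal, hypothetical sketch of the pattern: capture a camera frame, base64-encode it, and send it alongside a text prompt to OpenAI’s GPT-4 Vision chat endpoint. The model name, prompt, and webcam handling here are illustrative assumptions, not the YouTuber’s actual code.

```python
# Minimal sketch of a GPT-4 Vision "live demo" loop (illustrative, not the
# YouTuber's code). Assumes the `openai` (>=1.0) and `opencv-python`
# packages and an OPENAI_API_KEY environment variable.
import base64

import cv2
from openai import OpenAI

client = OpenAI()


def describe_frame(prompt: str) -> str:
    # Grab a single frame from the default camera and JPEG-encode it.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("could not read from webcam")
    _, jpeg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpeg.tobytes()).decode()

    # Send the frame plus a text prompt to the GPT-4 Vision endpoint.
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return response.choices[0].message.content


print(describe_frame("What am I holding up to the camera?"))
```

Piping the returned text through a speech-to-text front end and a text-to-speech back end would approximate the conversational, camera-aware interaction that Google’s edited video depicted.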
