Can ChatGPT see our images and hear our voice? Yes, and we will soon find out

The new possibilities of ChatGPT are truly insane.

Chema Carvajal Sarabia


For those who use ChatGPT daily, the upcoming revolution in the tool sounds like a blessing. For those who fear artificial intelligence, this is another step towards AI supremacy.


OpenAI, the creator of ChatGPT, has announced that it is beginning to roll out voice and image recognition in ChatGPT. In short, the AI can recognize what an image shows and talk with users about it.

Furthermore, the AI now has text-to-speech and speech-to-text capabilities. All of these new features are supposed to make the chatbot seem more “human” than in previous versions.

How does the new ChatGPT work?

OpenAI has shared a promotional video to give users an idea of how the image recognition features will work.

In it, a user asks ChatGPT for help lowering the saddle on their bicycle, and the chatbot responds with some general (and, if we are being uncharitable, extremely obvious) tips for lowering any type of saddle.

Next, the user draws a circle around the saddle clamp on the bike and asks for more detailed help, at which point ChatGPT supposedly recognizes the type of screw and tells the user that they need an Allen wrench.

Supposedly, the system can also look at photos of the user manual and the toolbox to check whether the user has the correctly sized wrench.

Although image recognition isn’t something that many chatbot services have experimented with, we are already familiar with voice recognition and voice synthesis systems.

OpenAI introduced the chatbot’s new voice services with a video of a mother asking ChatGPT to read her children a bedtime story about a forest hedgehog (she could read them an illustrated book herself, but I suppose this is one way to be a parent).

The samples included in OpenAI’s blog post have a natural-sounding cadence, although it’s not as if the “Juniper,” “Sky,” or “Breeze” voice packages will create unique voices for the little hedgehog Larry or any of his forest friends. Each voice is based on a voice actor who licensed their voice to the system, according to OpenAI.

Of course, the new features are only available to users who pay for the Plus or Enterprise service, and both capabilities should be available on iOS and Android in the next two weeks.

Users of the web version of ChatGPT will also soon have access to the image features. However, the system may not be as fast or capable as the promotional videos suggest.

Wired reported, based on a preliminary version, that voice recognition took several seconds to respond, and that the image system won’t attempt to identify people in photos, a limit that reflects concerns about data protection and people’s privacy.

Chema Carvajal Sarabia


Journalist specialized in technology, entertainment and video games. Writing about what I'm passionate about (gadgets, games and movies) allows me to stay sane and wake up with a smile on my face when the alarm clock goes off. PS: this is not true 100% of the time.
