OpenAI has introduced substantial upgrades to ChatGPT that allow it to understand spoken words, respond in synthetic voices, and process images. Users of the mobile app can now hold voice conversations, choose from five synthetic voices, and share images for analysis. The update is aimed at keeping pace with other AI chatbot leaders such as Google and Microsoft. OpenAI addressed concerns that synthetic voices could facilitate deepfakes by noting that its voices were created with professional voice actors. Details about how consumer voice data will be used and secured remain vague.
OpenAI has unveiled significant enhancements to ChatGPT, marking its most substantial update since the introduction of GPT-4. The upgraded chatbot can, in a sense, “see,” “hear,” and “speak”: it can understand spoken words, respond in synthetic voices, and process images. The company announced the update on Monday.
Users of ChatGPT’s mobile app can now hold voice conversations with the bot and choose from five synthetic voices for its responses. Users can also share images with ChatGPT and highlight specific areas for closer analysis or questions, such as “What types of clouds are these?”
OpenAI plans to roll out these changes to paying users over the next two weeks. Voice functionality will initially be available only in the iOS and Android apps, while image processing will be available on all platforms.
The update comes amid escalating competition in artificial intelligence, with companies such as OpenAI, Microsoft, Google, and Anthropic racing to release not only new chatbots but also new features for them. Google, for instance, has announced multiple updates to its Bard chatbot, and Microsoft has added visual search to Bing.
Earlier this year, Microsoft’s additional $10 billion investment in OpenAI was regarded as the largest AI investment of the year. In April, OpenAI reportedly completed a $300 million share sale that valued the company at roughly $27 billion to $29 billion, with participation from prominent firms such as Sequoia Capital and Andreessen Horowitz.
Experts, however, have raised concerns about AI-generated synthetic voices: the same realism that gives users a more natural experience could also make deepfakes more convincing. Threat actors and researchers have already begun exploring how deepfakes could be used to breach cybersecurity systems.
OpenAI acknowledged these concerns in its announcement, emphasizing that the synthetic voices were created by working directly with voice actors rather than by collecting recordings from anonymous sources.
The announcement offered few details about how OpenAI intends to use consumer voice inputs or how it would secure that data if it were used. OpenAI’s terms of service state that consumers retain ownership of their inputs “to the extent permitted by applicable law.”
Regarding voice interactions, OpenAI pointed to its guidance, which states that audio clips are not retained and are not used to improve its models. Transcriptions, however, are considered inputs and may be used to improve its large language models.