ChatGPT has a voice—or, rather, five voices. On Monday, OpenAI announced its buzzworthy, controversial large language model (LLM) can now verbally converse with users, as well as parse uploaded photos and images.
In video demonstrations, ChatGPT is shown improvising a children’s bedtime story from the guided prompt, “Tell us a story about a super-duper sunflower hedgehog named Larry.” ChatGPT then describes its hedgehog protagonist and offers details about its home and friends. In another example, a photo of a bicycle is uploaded via ChatGPT’s smartphone app alongside the request “Help me lower my bike seat.” ChatGPT responds with a step-by-step process and tool recommendations, drawing on a combination of the user’s follow-up photos and text inputs. The company also describes scenarios such as ChatGPT crafting dinner recipes from ingredients identified in photographs of a user’s fridge and pantry, conversing about landmarks seen in pictures, and helping with math homework, although numbers aren’t necessarily its strong suit.
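For developers curious what this kind of image-plus-text exchange could look like under the hood, here is a minimal sketch using OpenAI’s Python client. It assumes image inputs are exposed through the chat completions API; the model name "gpt-4-vision-preview" and the local file path are illustrative assumptions, since the announcement doesn’t detail programmatic access to these features.

```python
# Hypothetical sketch: asking a vision-capable model about an uploaded photo.
# Assumes image inputs are available via the chat completions API; the model
# name and file path below are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local photo of the bike as base64 so it can be sent inline.
with open("bike_seat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Help me lower my bike seat."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```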
[Related: School district uses ChatGPT to help remove library books.]
According to OpenAI, the initial five voices are based on a new text-to-speech model that can create lifelike audio from input text and only a “few seconds” of sample speech. The current voice options were developed in collaboration with professional voice actors.
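As a rough illustration of how a text-to-speech endpoint like this might be called, here is a short Python sketch. The speech endpoint, the model name "tts-1", and the voice name "alloy" are assumptions for demonstration only; the announcement doesn’t specify developer-facing details.

```python
# Hypothetical sketch of generating speech from text. The speech endpoint,
# model name, and voice name are assumptions, not confirmed by the article.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="tts-1",   # assumed text-to-speech model name
    voice="alloy",   # assumed name of one of the preset voices
    input="Once upon a time, a super-duper sunflower hedgehog named Larry...",
)

# Save the returned audio bytes to disk.
with open("bedtime_story.mp3", "wb") as f:
    f.write(speech.content)
```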
Unlike the LLM’s previous under-the-hood developments, OpenAI’s newest advancements focus squarely on users’ direct experiences with the program, as the company seeks to expand ChatGPT’s scope and utility into something closer to a complete virtual assistant. The audio and visual features also stand to meaningfully improve accessibility for disabled users.
“This approach has been informed directly by our work with Be My Eyes, a free mobile app for blind and low-vision people, to understand uses and limitations,” OpenAI explains in its September 25 announcement. “Users have told us they find it valuable to have general conversations about images that happen to contain people in the background, like if someone appears on TV while you’re trying to figure out your remote control settings.”
For years, popular voice AI assistants such as Siri and Alexa have offered particular abilities and services based on programmable databases of specific commands. As The New York Times notes, while updating and altering those databases often proves time-consuming, LLM-based alternatives can be much speedier, more flexible, and more nuanced. As such, companies like Amazon and Apple are investing in retooling their AI assistants to use LLMs of their own.
OpenAI is threading a fine needle in trying to make its visual identification ability as helpful as possible while also respecting third parties’ privacy and safety. The company first demonstrated the visual ID function earlier this year, but said it would not release any version of it to the public before developing a more comprehensive understanding of how it could be misused. OpenAI states its developers took “technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people,” given the program’s well-documented issues involving accuracy and privacy. Additionally, the current model is only “proficient” at tasks in English; its capabilities significantly degrade in other languages, particularly those written in non-Roman scripts.
OpenAI plans on rolling out ChatGPT’s new audio and visual upgrades over the next two weeks, but only for premium subscribers to its Plus and Enterprise plans. That said, the capabilities will become available to more users and developers “soon after.”