Microsoft Germany CTO Andreas Braun confirmed that GPT-4 is coming within a week of March 9, 2023, and that it will be multimodal. Multimodal AI can operate across multiple kinds of input, such as video, images and sound.
Multimodal Large Language Models
The big takeaway from the announcement is that GPT-4 is multimodal (SEJ predicted in January 2023 that GPT-4 would be multimodal).
Modality refers to the type of input that (in this case) a large language model handles.
Multimodal can encompass text, speech, images and video.
GPT-3 and GPT-3.5 only operated in one modality, text.
According to the German news report, GPT-4 may be able to operate in at least four modalities: images, sound (auditory), text and video.
Dr. Andreas Braun, CTO of Microsoft Germany, is quoted:
“We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities – for example videos…”
The reporting lacked specifics for GPT-4, so it’s unclear if what was shared about multimodality was specific to GPT-4 or just in general.
Holger Kenn, Director of Business Strategy at Microsoft, explained multimodality, but the reporting was unclear about whether he was referring to GPT-4 multimodality or multimodality in general.
I believe his references to multimodality were specific to GPT-4.
The news report shared:
“Kenn explained what multimodal AI is about, which can translate text not only accordingly into images, but also into music and video.”
Another interesting fact is that Microsoft is working on “confidence metrics” in order to ground their AI with facts to make it more reliable.
GPT-4 Applications
There is currently no announcement of where GPT-4 will show up, but Azure-OpenAI was specifically mentioned.
Google is struggling to catch up to Microsoft by integrating a competing technology into its own search engine. This development further exacerbates the perception that Google is falling behind and lacks leadership in consumer-facing AI.
Google already integrates AI in multiple products such as Google Lens, Google Maps and other areas where consumers interact with Google. This approach uses AI as an assistive technology that helps people with small tasks.
Microsoft's implementation of AI is more visible and is consequently capturing all the attention, reinforcing the picture of Google as flailing and struggling to catch up.