It's been less than a year since DALL-E 2 and ChatGPT were released, and OpenAI doesn't seem to be slowing down anytime soon. Based on a recent announcement from Microsoft, GPT-4 could launch as soon as next week, bringing an improved generation model and possibly AI-produced videos.
The revelation was made by Andreas Braun, the Chief Technology Officer at Microsoft Germany, during a recent event titled "AI in Focus - Digital Kickoff". Braun indicated that "We will introduce GPT-4 next week … we will have multimodal models that will offer completely different possibilities — for example videos."
This news is a bit surprising given Sam Altman's remarks in an interview with StrictlyVC. When asked whether GPT-4 would come out in the first quarter of the year, he wouldn't commit to a date: "It'll come out at some point, when we are confident we can do it safely and responsibly."
He also seemed to downplay the hype flooding the internet, saying the release would likely "leave people disappointed."
So if the rumors are true and it actually gets released sometime next week, what is it going to include?
We've already seen hints of the next GPT upgrade in the tools built on GPT-3.5 and GPT-3.5-turbo, such as ChatGPT.
Braun also said, "You can ask a question in German and get an answer in Italian." With multimodality, he said, Microsoft and OpenAI will "make the models comprehensive." What exactly does this mean? We'll find out very soon.
As of now, ChatGPT can only reply with text. Are we in for a huge shock?
This wouldn't be the first time a company has pushed AI generation beyond text. A few months ago, Meta announced Make-A-Video, which is essentially DALL-E for video generation.
I've been pretty obsessed with ElevenLabs recently. I uploaded three audio clips of my voice, and now I have a model that can voice over basically anything... It's insane.
What Can We Expect from GPT-4?
While the exact numbers aren't known, GPT-4 is rumored to have 100 trillion parameters. That would be crazy if true: GPT-3 has 175 billion parameters, so GPT-4 would be more than 500x its size. At the time of its release, GPT-3 was the largest neural network ever created. The thought that we might see a model at that scale as soon as Monday is utterly mind-blowing.
One possibility is video generation from text: imagine if ChatGPT could respond with an animated video, or if you could instantly create stock footage for documentaries.
We might see a model that accepts text, audio, images, and even video inputs. The productivity and generative benefits from a multimodal model would be incredible.
Now that OpenAI has cut costs by 10x in the last month (the ChatGPT API model, gpt-3.5-turbo, is priced at a tenth of OpenAI's existing GPT-3.5 models), accessibility and scalability are becoming a lot more feasible. It will be much more affordable to integrate both existing and new GPT models into tools.
So for now, it's a waiting game. I'm hoping the rumors are true, but even if GPT-4 is released next week, it will probably be a few months before it's adopted into most modern AI tools.
If you want to stay up to date on the latest AI news, subscribe to an AI newsletter. For now, enjoy the weekend, and let's hope we wake up to some amazing news on Monday or Tuesday!