The Future of AI: “Google’s Gemini Ultra Slightly Overshot My Expectations”

An interview with AI expert Pieter Buteneers on Google’s Gemini, Sam Altman, and OpenAI

Almost one year ago, OpenAI released ChatGPT. Now a real competitor from Google called Gemini has been partially released. Bard, the first contender to ChatGPT, could not convince users, but maybe Gemini can? We talked with AI expert Pieter Buteneers about the new release, about what is (and is not) known about the model, the ongoing OpenAI news, and what is yet to come.

devmio: Hi Pieter! Thanks for taking the time to answer our questions about Google Gemini. My first question is simple: What can it do?

Pieter Buteneers: Honestly, to me, it's just Google coming out with a competitor to GPT-4 and claiming that it can do a lot more than GPT-4. But in practice, I think there are only a few things that Gemini can do that GPT-4 can't, namely processing not just text and images but also video and audio recordings.

That's at least what Gemini Ultra can do: process images, video and audio. That version will only be released next year, probably in January. What has been released so far is the text-processing version, which is now integrated into Google Bard, a competitor to OpenAI’s ChatGPT and Microsoft’s Bing Chat.

Google claims that their model outperforms GPT-4 on a lot of benchmarks. But in the end, those are benchmarks and not real-life experience. Performing well on a benchmark doesn't necessarily mean that the model dramatically improves your user experience. But it's at least a good indication that they think they are on the right track and comparable with GPT-4.

devmio: Can Gemini Ultra process video in real time? That’s what the demo suggested.

Pieter Buteneers: We don't know, it's not publicly released. But my guess is that you will have to upload the recording to get the results. It won't be like a voicebot that you can have fluent conversations with, at least not yet.

But what I noticed when using Google Bard with Gemini underneath is that the response times are super fast, way faster than my experience with GPT-4. This means that Google has been able to dramatically optimize how these models run, how fast they do what we call inference: basically, how fast they can produce a result.

At least for text, it was really fast when I tried it. Maybe I was lucky because I used it in the morning and nobody else was trying it.

devmio: Is something publicly known about the model size of Gemini?

Pieter Buteneers: What I could find on the internet is their technical report, which says nothing. And that brings me back to what I was talking about before: Google tested Gemini on a lot of benchmarks, but they cherry-picked the results, especially for marketing.

But I wouldn't go out and state that Gemini is better than GPT-4, at least not based on the benchmarks. In real life you have to try it for yourself, because a model can be technically more correct but just super annoying to use.

And obviously, all the Gemini demos that were shown were super impressive. But I remember Google showing a demo, I think four years ago, about a voice bot talking about Pluto, which was also super impressive. But if you ended up using their model in real life, it was useless.

devmio: If I remember correctly, someone at Google actually stated that the videos were at least in part staged. Of course they cherry-picked for promotional reasons, but at the same time, it’s a bad look. What do you think of this?

Pieter Buteneers: Yeah, it doesn't look good. But on the other hand, that's what mostly happens in these kinds of demos, and large enterprises often do this: they stage their demo to the maximum extent possible. We still need to see how these models are going to be used by actual humans. Some people are going to continue using GPT-4 because it is just what they're looking for or gives them a way more pleasant user experience.

Google is notoriously not so good at user experience. They have a few very good products like the search engine, Google Maps and Gmail, but if you look at most of their other products they're not so good at creating a really good customer experience.

Sam Altman, on the other hand, comes from Y Combinator, and if Y Combinator companies are excellent at one thing, it is focusing on customer experience. So given that Sam Altman is running OpenAI, my assumption is that they will be much quicker at getting to a top-notch user experience.

devmio: Also, a lot of people have already gotten used to ChatGPT and how to operate it. Do you think there will be differences in how to prompt these models?

Pieter Buteneers: I don't expect huge differences. We will see, but my gut feeling says that if you are experienced in prompt engineering, you get very similar results with both models, and even with the open-source models you can get pretty good results if you do proper prompt engineering. So I don't think that prompt engineering, at least for now, is super model-specific. Maybe there will be quirks that one model has and the other doesn't that require some refinement of your prompt, but those should be small.
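
To make that concrete, here is a minimal sketch, not from the interview, of sending the exact same prompt to both models via their official Python clients (`openai` and `google-generativeai`); the prompt, model names, and environment variables are illustrative assumptions.

```python
# pip install openai google-generativeai
import os

from openai import OpenAI
import google.generativeai as genai

# One prompt, two providers: nothing in the prompt itself is model-specific.
PROMPT = "Explain the difference between fine-tuning and prompt engineering in two sentences."

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
gpt4_answer = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": PROMPT}],
).choices[0].message.content

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini_answer = genai.GenerativeModel("gemini-pro").generate_content(PROMPT).text

print("GPT-4:", gpt4_answer, sep="\n")
print("Gemini Pro:", gemini_answer, sep="\n")
```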

devmio: When ChatGPT came out, a lot of people, including you, weren't expecting something like that to happen. Now, with Gemini: is this what you expected from Google?

Pieter Buteneers: Gemini Pro is definitely what I expected. It is just an answer to GPT-4 for text. I have no clue what the Gemini Nano version that will run on phones will do, whether that's going to be interesting or not. It's just a small version of Gemini.

Gemini Ultra slightly overshot my expectations. I mean, it was just a matter of time until somebody came up with a model that could handle audio and video like that. It was going to happen, and that's what Google did. So it's not a huge surprise, but they beat OpenAI to the punch on this one. With everything else, though, they were late.

devmio: You just mentioned the Gemini Nano version. Does it run locally?

Pieter Buteneers: Yes, the idea is that it can run on the tensor chips in the Google Pixel phones, so only on the phones created by Google. That is cool, but the problem is that you end up working with a very small model. We had small language models ages ago that we didn't end up using because they didn't work very well.

Yes, you can run Gemini Nano on a phone, and probably one of the smaller open-source models from Meta too. But that doesn't mean it will perform that well, or that it will give you the answers you're looking for.
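
As a rough illustration of what running one of those smaller open models locally looks like, here is a minimal sketch using the llama-cpp-python bindings with a quantized Llama 2 checkpoint; the file name, context size, and sampling settings are assumptions for the example, not anything Google or Meta ships for phones.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# A 4-bit quantized 7B model fits in a few GB of RAM: laptop-class hardware,
# which hints at why phone-class models have to be smaller still.
# The file name below is an assumption; any GGUF checkpoint works the same way.
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048, verbose=False)

out = llm(
    "Q: What is the capital of Belgium?\nA:",
    max_tokens=32,
    stop=["\n"],  # stop at the end of the answer line
)
print(out["choices"][0]["text"].strip())
```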

It really depends on which use cases they target. That is another thing that, on average, Google is not that good at. Maybe it has changed, but they used to be very good at hiring people who are extremely good at solving complex and difficult problems. The engineers working at Google are excellent problem-solvers for extremely challenging problems, and quite often user experience is beneath them.

devmio: Do you have an example of that?

Pieter Buteneers: For example, user experience improved a lot with the new Bard. It is much faster now. Before the switch to Gemini, you would input your prompt and then wait half a minute or so until the answer came.

ChatGPT’s model is also very slow, and it takes a long time before you get the full answer. But then they started sending you characters in real time. Suddenly it felt like the model was doing something and answering your question. That's how the user gets the feeling that something is happening, that it's talking.
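
For illustration, here is a minimal sketch of that streaming pattern using OpenAI's Python client, assuming an API key in the environment; instead of waiting for the complete answer, each chunk is printed the moment it arrives.

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True makes the API return partial chunks as the model generates them,
# so the user sees text appear immediately instead of staring at a blank screen.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain what inference latency is."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text, only metadata
        print(delta, end="", flush=True)
print()
```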

It's a different culture in the company, and not everybody likes it. But at least for me, it's an indication that on OpenAI's side, they work towards creating a pleasant experience. This is much less the case at Google. Maybe it's because they use another model that doesn't generate text character by character, I don't know.

devmio: What is actually known about the model Gemini? Is there a “paper” as with ChatGPT?

Pieter Buteneers: I think it's even worse than with ChatGPT. I haven't fully read the technical report yet, but from what I can see, it contains even less information than the technical report for GPT-4. It's more a marketing paper than a technical report.

devmio: You already talked about Sam Altman. What do you think of the drama surrounding him and OpenAI in the last few weeks?

Pieter Buteneers: OpenAI was not started as a regular for-profit company. It started as a sort of nonprofit/for-profit hybrid, aimed at figuring out AGI with the goal of open-sourcing everything and doing things for the public good. But what they have been doing more recently is selling API access to whoever wants to buy it. And they have their product ChatGPT with GPT-4, which they sell for 20 bucks a month.

They have been transforming from a non-profit research organization that shares lots of knowledge, advances the world's understanding of AI, and gets closer to AGI, into a for-profit company that makes money off the models they have created. However, their board was still structured in the original way, as a kind of non-profit organization.

One simply doesn't work well with the other. At some point, it was bound to happen that something had to give in the organization. There was a lot of drama, and I think some people played it very smart in the press. Sam Altman is the big winner here. He could start immediately at Microsoft, and more than half of his team was ready to come with him. So the board had no option but to back down.

It's the type of drama that typically happens in-house and that you never hear about. Now it is playing out openly in the press.

devmio: Do you think it is strange that it happens out in the open? Is it to some extent for publicity?

Pieter Buteneers: If you look at how it played out, it was a very smart move to publicly talk about this leak. Things happened the way Sam Altman wanted them to happen.

Those kinds of mind-bending poker power plays are probably not what the engineers on the board are best at, which is to be expected. Whereas if you have more experience with power plays, you probably know what you can do and how to wiggle your way around. Again, it was a very smart move. But it is also scary to think of someone this manipulative running this kind of technology.

Silicon Valley is doing really cool stuff like building rockets and electric cars, but it also does really crazy stuff, like buying Twitter and calling it X, for example. I hope that we are not going down the same path with OpenAI.

devmio: I read a comment on Hacker News where someone asked why Google didn’t open-source Gemini, so that every future architecture would basically be built on that model. That is kind of what happened with BERT, right? Why do you think Google didn't do it this time?

Pieter Buteneers: That is why, I think, they acquired DeepMind and built up Google Brain: they were looking for an excuse to hire a lot of AI engineers and let them do crazy stuff, so that whenever somebody discovered a big step ahead, they could just stop open-sourcing everything and keep the cool technology in-house. They knew that at some point in time somebody was going to have a major breakthrough in AI, and preferably that would have been themselves.

Google DeepMind achieved a lot of big breakthroughs, for example with AlphaZero playing chess and Go, but also with how they used reinforcement learning to take a major step forward in protein folding, which is used for all kinds of research, like cancer research.

But I think they were just preparing for something like ChatGPT to happen, so that they would have all these people in-house, stuck within Google due to stock options you only get after four years. They're not inclined to leave because there's a big check at the end. So Google can leverage their brains and technology to the maximum extent to make sure they don't lose their lead over the rest of the world.

devmio: What about Meta open-sourcing a lot of technology?

Pieter Buteneers: Because all the cool stuff at the moment is happening at either OpenAI or Google, the only thing Meta can do to attract engineers is open-source their models. Meta diverted a lot of their money into the metaverse and a lot less into AI. I think the best way for them to attract engineers and researchers is to open-source their models, so that as a researcher, if you go to Meta, you at least have the feeling you are building something for the world and not just making someone else richer.

devmio: With Gemini coming out now at the end of the year and the whole drama surrounding Sam Altman, OpenAI and Microsoft, do you think that the developments of the last few weeks are going to change the course of AI development in 2024?

Pieter Buteneers: I think Sam Altman's grip on OpenAI has massively increased, and therefore roadblocks to commercializing the technology, if there were any, have been removed. The commercialization of GPT-4 and everything that comes next will run even more smoothly than before. Obviously, Google will try to catch up and make sure they have APIs that everybody can use for their own tasks.

devmio: Thanks, Pieter, for taking the time for this interview!

Dr. Pieter Buteneers

Pieter Buteneers is a serial entrepreneur with a PhD in machine learning & AI. As founder and CTO of his latest company, Emma Legal, he works to fully automate the legal due diligence for mergers and acquisitions.

