![Google I/O 2024 key announcements: AI ‘teammate,’ queries by video, Workspace improvements](https://www.rappler.com/tachyon/2024/05/deepmind-reuters-1-scaled.jpg)
At its annual I/O developer conference held on Wednesday, May 15, Manila time, Google showcased the advancements of its AI model, Gemini.
“We are in the Gemini era,” declared Sundar Pichai, CEO of Google, highlighting improvements in its Gemini AI model, particularly in planning, organizing, and personalizing the user experience.
Currently, the Gemini 1.5 Pro model has a 1-million token long context window. According to Google, a long context window is a measure of how many tokens – the smallest blocks of data, each making up part of a word, image, or video – the model can process at once.
Later this year, Google plans to double the long context window to 2 million tokens, pushing Gemini’s capabilities toward multimodality, agency, and intelligence, the company said.
Why is the number of tokens important? CNET writes, “The more tokens a context window can accept, the more data you can input into a model. The more data you can input, the more information the AI model can use to deliver responses. The better the responses, the more valuable the experience of using an AI model.”
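The relationship CNET describes can be sketched in a few lines of code. This is a toy illustration, not a real tokenizer: it assumes the common rule of thumb of roughly four characters per token for English text, and the function names are hypothetical.

```python
# Toy illustration of how a context window limits input size.
# Assumes the common ~4 characters/token heuristic for English text;
# real tokenizers (and multimodal tokens) vary widely.

def estimate_tokens(text: str) -> int:
    """Estimate token count using the ~4 characters/token heuristic."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, window: int = 1_000_000) -> bool:
    """Check whether the text fits a model's context window (in tokens)."""
    return estimate_tokens(text) <= window

document = "word " * 100_000  # ~500,000 characters of sample text
print(estimate_tokens(document))           # ~125,000 tokens
print(fits_in_context(document))           # fits a 1M-token window
print(fits_in_context(document, 100_000))  # too large for a 100K window
```

Doubling the window to 2 million tokens, as Google plans, would simply raise the ceiling on how much material a single prompt can carry.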
Here are the other top announcements from the event:
Google Workspace enhancements
Users can leverage Gemini’s capabilities to navigate their email more effectively. They can ask Gemini to summarize emails, and it can process the content, including attachments like PDFs, to provide a concise, time-saving overview.
Additionally, Gemini 1.5 Pro can analyze up to an hour-long video and extract the main points.
Personalization improvements
![Electronics, Mobile Phone, Phone](https://www.rappler.com/tachyon/2024/05/gemini-2.png?fit=1024%2C730)
NotebookLM, a research and writing tool integrated with Gemini, promises personalization improvements.
With 1.5 Pro, NotebookLM can instantly generate comprehensive guides, complete with summaries, study guides, FAQs, and quizzes, all from users’ own materials – a feature Google says is particularly useful for teachers and students.
A prototype feature called Audio Overviews can take users’ source materials and generate a spoken, discussion-style overview of them. Users can participate in shaping the conversation, guiding its direction as needed, for what Google hopes will be an immersive and engaging experience.
Google also boasted about Gemini’s potential and capabilities in reasoning, planning, memory, and thinking multiple steps ahead, to help with tasks like online shopping, finding restaurants and attractions, and analyzing their reviews.
In addition, contextual Smart Reply email suggestions are expanding to include more customization options.
DeepMind’s universal AI agent
At the forefront of artificial intelligence research is the Google DeepMind team.
The team touted AlphaFold 3, a protein structure prediction tool, which can analyze how proteins interact with DNA and RNA strands. It’s a tool that the team said may revolutionize biological and medical research, leading to faster drug discovery and a deeper understanding of diseases.
The team is also working on Project Astra, its vision for a universal AI agent able to respond through various channels in a multimodal manner. Google defines multimodality as “giving AI the ability to process and understand different sensory modes,” whether that is text, audio, video, or other forms of content.
“Practically this means users are not limited to one input and one output type and can prompt a model with virtually any input to generate virtually any content type.”
Demis Hassabis, DeepMind’s CEO, stressed the need for such an agent to be “proactive, teachable, and personal,” so that users can interact with it naturally, without lag or delays.
Updates in generative media
Imagen 3, Google’s latest image generation model, took center stage, showing off photorealistic images boasting rich details, and fewer visual artifacts or distortions.
Google also showcased its generative music technology, Music AI Sandbox, and Veo, its newest and most powerful generative video model, which can produce high-definition 1080p videos from text, image, or video prompts, translating them into diverse visual and cinematic styles.
Google also showed its new watermarking technology, SynthID.
DeepMind described the technology:
“SynthID for text is designed to complement most widely-available AI text generation models and for deploying at scale, while SynthID for video builds upon our image and audio watermarking method to include all frames in generated videos.”
But the company also cautioned that it’s not a “silver bullet” that will solve everything.
“SynthID isn’t a silver bullet for identifying AI generated content, but is an important building block for developing more reliable AI identification tools and can help millions of people make informed decisions about how they interact with AI-generated content.”
Asking questions via videos
Google said that users will eventually be able to ask questions through a video. Google will use its speech models and deep visual understanding to analyze a user’s video frame by frame, feed it into Gemini’s long context window, and form an appropriate response.
AI Search overviews
![Google I/O 2024 key announcements: AI ‘teammate,’ queries by video, Workspace improvements](https://img.youtube.com/vi/s4InWsd-J6g/sddefault.jpg)
Google Search rolled out AI Overviews in the US on the day of the event. Instead of traditional searching methods, the feature provides a “quick overview of a topic and links to learn more.”
AI Overviews lets users ask more complex questions in a single search. Google illustrated:
“For example, maybe you’re looking for a new yoga or pilates studio, and you want one that’s popular with locals, conveniently located for your commute, and also offers a discount for new members. Soon, with just one search, you’ll be able to ask something like ‘find the best yoga or pilates studios in Boston and show me details on their intro offers, and walking time from Beacon Hill.'”
AI-powered ‘teammate’
One interesting prototype is a virtual, Gemini-powered AI teammate. It integrates into a team environment with its own identity, workspace account, designated role, and specific objectives, and can be customized to each team’s unique needs.
Gemma-vision
Google announced developments in Gemma, its family of open AI models. The newest addition to the family is PaliGemma, the first publicly available vision-language model in the lineup.
PaliGemma is optimized for tasks such as image captioning, visual Q&A, and image labeling. Meanwhile, Gemma 2, launching in June, boasts an impressive 27 billion parameters, making it significantly larger than its predecessors. – with reports from Anj Paller/Rappler.com