Google Introduces PaliGemma 2 Family of Open Source AI Vision-Language Models

Google launched the successor to its PaliGemma artificial intelligence (AI) vision-language model on Thursday. Dubbed PaliGemma 2, the family of AI models improves upon the capabilities of the older generation. The Mountain View-based tech giant said the vision-language model can see, understand, and interact with visual input such as images and other visual assets. It is built using the Gemma 2 small language models (SLM), which were released in August. Interestingly, the tech giant claimed that the model can analyse emotions in the uploaded images.

Google PaliGemma AI Model

In a blog post, the tech giant detailed the new PaliGemma 2 AI model. While Google has several vision-language models, PaliGemma was the first such model in the Gemma family. Vision models differ from typical large language models (LLMs) in that they have additional encoders that can analyse visual content and convert it into a familiar data form. This way, vision models can technically “see” and understand the external world.

One benefit of a smaller vision model is that it can be used for a wide range of applications, as smaller models are optimised for speed and accuracy. With PaliGemma 2 being open-sourced, developers can build its capabilities into their apps.

PaliGemma 2 comes in three parameter sizes: 3 billion, 10 billion, and 28 billion. It is also available in three input resolutions: 224x224, 448x448, and 896x896 pixels. Due to this, the tech giant claims that it is easy to optimise the AI model's performance for a wide range of tasks. Google says it generates detailed, contextually relevant captions for images. It can not only identify objects but also describe actions, emotions, and the overall narrative of the scene.
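As a rough illustration of how those sizes and resolutions combine, the sketch below enumerates the nine size and resolution pairs. It assumes every size is offered at every resolution, and the identifier pattern it builds (`google/paligemma2-<size>b-pt-<res>`) is an assumption made for illustration rather than something stated in the article; the helper names `variant_id` and `all_variants` are hypothetical. Check the model listings on Hugging Face or Kaggle before relying on any of these names.

```python
from itertools import product

# Values taken from the article.
PARAM_SIZES_B = (3, 10, 28)        # parameter counts, in billions
RESOLUTIONS_PX = (224, 448, 896)   # square input resolution, in pixels


def variant_id(size_b: int, resolution: int) -> str:
    """Build an identifier string for one size/resolution combination.

    ASSUMPTION: the "google/paligemma2-<size>b-pt-<res>" naming pattern is
    illustrative only and should be verified against the model hub.
    """
    if size_b not in PARAM_SIZES_B:
        raise ValueError(f"unsupported parameter size: {size_b}B")
    if resolution not in RESOLUTIONS_PX:
        raise ValueError(f"unsupported resolution: {resolution}px")
    return f"google/paligemma2-{size_b}b-pt-{resolution}"


def all_variants() -> list[str]:
    """Return identifiers for every size/resolution pair (3 x 3 = 9)."""
    return [variant_id(s, r) for s, r in product(PARAM_SIZES_B, RESOLUTIONS_PX)]


if __name__ == "__main__":
    for name in all_variants():
        print(name)
```

A smaller size at a lower resolution would typically be the cheaper choice for quick captioning, while tasks that depend on fine detail, such as reading a music score, may benefit from a higher resolution.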

Google highlighted that the tool can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper in the online pre-print journal arXiv.

Developers and AI enthusiasts can download the PaliGemma 2 model and its code on Hugging Face and Kaggle. The AI model supports frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
