Hugging Face Launches New 'State-Of-The-Art' Visual Language Model – IDEFICS

Hugging Face has launched a new visual language model called IDEFICS. The multimodal model takes sequences of images and text as input and generates conversational text output.

Andy Hoo

Updated August 23, 2023

Photo by John Schnobrich on Unsplash

Hugging Face has introduced an open-access visual language model called IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), which works somewhat like a visual ChatGPT.

The multimodal model accepts arbitrary sequences of images and text and generates coherent, conversational text.

It can also describe visual content, write stories based on images, and answer questions about photos.

In a recent tweet, a Hugging Face scientist officially introduced IDEFICS as the first open visual language model at the 80B scale.

According to Hugging Face, its goal with this model is to reproduce systems that match the capabilities of large proprietary models and make them available to the AI community.

“We are hopeful that IDEFICS will serve as a solid foundation for more open research in multimodal AI systems,” they added.

In its release announcement, Hugging Face clarified that the model is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and that it comes in two variants.

The two variants, a base version and an instructed version, are each available in 9-billion- and 80-billion-parameter sizes.

IDEFICS is described as a reproduction of Flamingo, a model developed by Google DeepMind that has not been released publicly.

Hugging Face also emphasized the steps it took to make the system transparent. Before the official release, the team:

  • Used only publicly available data
  • Provided tooling to explore the training dataset
  • Shared technical lessons and mistakes, and assessed the model's harmfulness

Hugging Face also showcased the model's abilities by sharing a photo preview of how it works.

Hugging Face's newest release is an improved visual language tool that could generate useful conversational output about images and other visual media. With IDEFICS, the science team has delivered another capable model that is openly accessible to users.

Written by Andy Hoo

Andy is an investigative tech journalist at Gold Penguin. Besides being a journalist with the heart and mind for truth and credibility, he is also a passionate content creator who loves making informative and recreational videos. He writes all types of news in the technology & AI industry.
