Hugging Face Launches New 'State-Of-The-Art' Visual Language Model – IDEFICS

Hugging Face has launched a new visual language model called IDEFICS. The multimodal tool takes sequences of image and text inputs and generates conversational text outputs.
August 23, 2023 10:15 am

Hugging Face has recently introduced an open-access visual language model called ‘Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS’ (IDEFICS), which works somewhat like a visual ChatGPT.

The multimodal model accepts arbitrary sequences of image and text inputs and generates coherent, conversational text outputs.

It can also describe visual content, create stories based on images alone, and answer questions about photos.
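To make the idea of an "arbitrary sequence of images and text" concrete, here is a minimal sketch of what an interleaved prompt could look like as a data structure. It is not the IDEFICS API: the names `ImageRef`, `describe_prompt`, and `count_images`, as well as the file names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ImageRef:
    """Hypothetical placeholder for an image, identified by a path or URL."""
    source: str


# An interleaved prompt is an ordered mix of text segments and images.
Segment = Union[str, ImageRef]


def describe_prompt(segments: List[Segment]) -> str:
    """Render the prompt as one line, marking where each image sits."""
    parts = []
    for seg in segments:
        if isinstance(seg, ImageRef):
            parts.append(f"<image:{seg.source}>")
        else:
            parts.append(seg)
    return " ".join(parts)


def count_images(segments: List[Segment]) -> int:
    """Count how many images appear in the prompt."""
    return sum(isinstance(seg, ImageRef) for seg in segments)


# Illustrative prompt: text and images may alternate in any order.
prompt = [
    "User: What is shown in this photo?",
    ImageRef("photo_1.jpg"),
    "And how does it differ from this one?",
    ImageRef("photo_2.jpg"),
    "Assistant:",
]

print(describe_prompt(prompt))
print(count_images(prompt))
```

A real model would replace each image placeholder with visual features before generating text; the sketch only shows the ordering of inputs.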

In a recent tweet, a scientist at Hugging Face officially introduced IDEFICS as the first open visual language model at the 80B scale.

According to Hugging Face, the goal with this model is to reproduce the capabilities of large proprietary models and make such systems available to the AI community.

“We are hopeful that IDEFICS will serve as a solid foundation for more open research in multimodal AI systems,” they added.

In its release, Hugging Face clarified that the model is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and comes in two variants.

The two variants are a base version and an instructed version, both available in 9 billion and 80 billion parameter sizes.

Moreover, IDEFICS was described as a reproduction of Flamingo, a model initially developed by Google DeepMind that has not been released publicly.

Hugging Face also emphasized the steps it took to bring transparency to its AI systems. Ahead of the official release:

  • They only used publicly available data
  • They provided tooling to explore the training dataset
  • They shared technical lessons and mistakes, and assessed the model’s harmfulness.

Hugging Face also showcased the model's abilities by sharing a preview image of how it works.

With IDEFICS, Hugging Face offers an openly accessible visual language tool that can generate conversational text about visual media.

Written by Andy Hoo
Andy is an investigative tech journalist at Gold Penguin. Besides being a journalist with the heart and mind for truth and credibility, he is also a passionate content creator who loves making informative and recreational videos. He writes all types of news in the technology & AI industry.
