Stability Launches A Japanese AI Image-to-Text Generator

Stability has released its first Japanese vision-language model, which generates text descriptions and answers from input images.
Photo by Andy Kelly on Unsplash
September 12, 2023 9:00 am

Stability has released its first Japanese vision-language model, which can generate text descriptions of input images and answer questions about them.

The model, Japanese InstructBLIP Alpha, generates Japanese text and can accurately recognize Japan-specific objects in the input image.

In the preview image Stability shared, the model identified “Sakura and Tokyo Skytree.”

Users can optionally enter a prompt asking about the uploaded image, and the model answers based on what the image shows.

The preview also showcased the model’s ability to answer image-related questions. Asked “What color is the yukata of the person on the right?”, the model answered “purple.”
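
The optional prompt can be pictured as one extra field in an instruction-style template. The sketch below is illustrative only: the `build_prompt` helper, the English section labels, and the default instruction are assumptions made for this example, not the prompt format documented for Japanese InstructBLIP Alpha.

```python
def build_prompt(question: str = "") -> str:
    """Assemble an instruction-style prompt; the question is optional."""
    # Hypothetical default instruction, used whether or not a question is given.
    instruction = "Describe the given image in detail."
    sections = [("Instruction", instruction)]
    if question.strip():
        # Add an input section only when the user actually asked something.
        sections.append(("Input", question.strip()))
    # The response section is left empty for the model to complete.
    sections.append(("Response", ""))
    parts = [f"### {name}:\n{body}".rstrip() for name, body in sections]
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Without a question, the model would simply be asked to describe the image.
    print(build_prompt())
    # With a question, the same template carries it as an extra section.
    print(build_prompt("What color is the yukata of the person on the right?"))
```

In a real pipeline, the resulting string would be tokenized and passed to the model together with the processed image.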

To build a high-performance model from a limited Japanese dataset, Stability initialized part of the model with an InstructBLIP model pre-trained on large English datasets.

The model performs conditional image-to-text generation and is built on Japanese StableLM Instruct Alpha 7B, a Japanese large language model (LLM) created for Japanese speakers.
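
The idea of initializing only part of a model from a pre-trained checkpoint can be sketched with plain dictionaries standing in for weights. Everything here is a toy: the parameter names, the `init_from_pretrained` helper, and the choice of which components are reused are assumptions for illustration, since the announcement only says that "a part" of the model was initialized this way.

```python
import random


def random_weights(names, seed=0):
    """Stand-in for freshly initialized parameters (small random floats)."""
    rng = random.Random(seed)
    return {name: rng.uniform(-0.1, 0.1) for name in names}


def init_from_pretrained(model_weights, pretrained_weights, prefixes):
    """Copy pre-trained values for parameters whose name starts with a prefix."""
    merged = dict(model_weights)
    reused = []
    for name, value in pretrained_weights.items():
        if name in merged and name.startswith(tuple(prefixes)):
            merged[name] = value
            reused.append(name)
    return merged, sorted(reused)


# Hypothetical parameter names for an InstructBLIP-style model:
# an image encoder, a query transformer, and a language model.
names = ["vision_encoder.w", "qformer.w", "language_model.w"]
fresh = random_weights(names)

# Toy "checkpoint" standing in for weights pre-trained on English data.
english_ckpt = {"vision_encoder.w": 1.0, "qformer.w": 2.0, "language_model.w": 3.0}

# Assumption: reuse the vision-side components and leave the language model
# untouched (in the real system it would come from the Japanese LLM instead).
merged, reused = init_from_pretrained(
    fresh, english_ckpt, ["vision_encoder.", "qformer."]
)

if __name__ == "__main__":
    print("reused from checkpoint:", reused)
    print("merged weights:", merged)
```

Reusing weights this way lets the limited Japanese data be spent on the parts that actually need adapting, rather than on relearning general visual features.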

According to Stability, the model was developed for research purposes and is available for research use only.

The model is intended for use by the open-source community in adherence with the research license.

Japanese InstructBLIP Alpha is currently available on the Hugging Face Hub, where users can use it freely for testing, inference and additional training.

Stability is gradually moving toward AI models that serve more languages. It is encouraging to see its newest model advance that effort, given its ability to identify Japan-specific objects.

Written by Andy Hoo
Andy is an investigative tech journalist at Gold Penguin. Besides being a journalist with the heart and mind for truth and credibility, he is also a passionate content creator who loves making informative and recreational videos. He writes all types of news in the technology & AI industry.
