Introducing Vietcuna: A Vietnamese-First Open-Source Large Language Model

Aug 10th, 2023

Today, we are extremely proud to announce Vietcuna, our first Large Language Model for the Vietnamese market. This is our first step toward democratizing AI for underrepresented communities.

Vietcuna is based on BLOOMZ, the well-known LLM from BigScience. We continued its pre-training on a large corpus of public Vietnamese web data. The resulting models were then fine-tuned on more than 200K instructional Q&A pairs and more than 400K conversations in Vietnamese. Both models have a context length of 2,048 tokens.

We are releasing two variants of Vietcuna, with 3B and 7B parameters. The models are published in the Hugging Face format, so they can be easily integrated into the Hugging Face ecosystem (ChatUI, Text Generation Inference, bloomz.cpp, etc.).
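To illustrate what the Hugging Face format makes possible, here is a minimal sketch of loading a Vietcuna checkpoint with the `transformers` library. It assumes the checkpoints load through the standard `AutoTokenizer` and `AutoModelForCausalLM` classes, as BLOOM-based models normally do. `MODEL_ID` is a hypothetical placeholder, not the published repository name, and the helper `fit_prompt_ids` is our own illustration of keeping the prompt plus the generated tokens within the 2,048-token context; the left-truncation strategy it uses is an assumption, not something prescribed for Vietcuna.

```python
from typing import List

# Context length stated for both Vietcuna variants.
CONTEXT_LENGTH = 2048

# Hypothetical placeholder repository ID; check the model card for the real name.
MODEL_ID = "your-org/vietcuna-7b"


def fit_prompt_ids(prompt_ids: List[int], max_new_tokens: int,
                   context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Keep only the most recent prompt tokens so that the prompt plus the
    tokens to be generated fit inside the context window (hypothetical helper)."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    # Left truncation: drop the oldest tokens and keep the latest conversation turns.
    return list(prompt_ids[-budget:]) if len(prompt_ids) > budget else list(prompt_ids)


def generate(prompt: str, max_new_tokens: int = 256, model_id: str = MODEL_ID) -> str:
    """Load the model with Hugging Face transformers and generate a reply."""
    # Imported lazily so fit_prompt_ids can be used without these dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    ids = fit_prompt_ids(tokenizer(prompt)["input_ids"], max_new_tokens)
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        output = model.generate(input_ids=input_ids,
                                attention_mask=torch.ones_like(input_ids),
                                max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)
```

With a real repository ID, a call such as `generate("Xin chào!", max_new_tokens=128, model_id=...)` would download the weights and return the model's reply. Keeping the most recent tokens favors the latest turns of a long conversation; other applications might prefer to truncate differently.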

With the debut of Vietcuna, we're taking a pioneering step toward advancing Vietnamese technology and narrowing the AI representation gap. This is more than a new language model: it gives Vietnamese companies a specialized tool to innovate in previously uncharted ways. We believe that by reflecting our specific linguistic and cultural nuances, Vietcuna will be a catalyst for technology solutions tailored to our community and beyond. This is our first leap toward democratizing AI for all. Let's redefine the tech narrative together, one step at a time.
