Insight
Wednesday, December 13, 2023

Mistral AI And The Wow Factor With A Mixture of Experts

With the hype around Google's Gemini and, most recently, Mistral AI, I try to wade through the technical speak so non-techies can understand what MoE is and why Mistral is making such crazy buzz.

AI developments get way too techie sometimes, and it can be really tough to wrap my head around yet more new languages, new concepts, mathematical theories, and YouTube videos that are tough to follow unless you follow ...

I will not despair. I will soldier on and find a way to understand. That's why I'm enthusiastic about the Rundown, because I'm certain you feel it too. When you dip a toe in, you want to jump right back out because it's making no sense!

The way I can make sense of things is to either draw it out for myself or infuse a little conflict and drama! Occupational hazard. Let's try the latter for this.

Mistral AI, a French company, has emerged as an exciting challenger to OpenAI and its Large Language Model (LLM).

The French are challenging the Americans, and European tech is back in the game ...

It was a casual drop. Here's a link. No words. Let the code do the talking and trigger a quiet revolution. Sit back and, oui, watch.

Mistral AI's product is called Mixtral (just to make it easier!) and appears to outperform GPT-3.5 on several key benchmarks. It's apparently based on the same kind of transformer architecture OpenAI has used.

Most interestingly, I'm seeing one phrase emerge all over the place: "Mixture of Experts". Seems like the MoE, the better...

Told you. I am visual! This is how I remember.

OK. Well, typically, you give an LLM an input, and one single model gives you an output. OK, I got that. Clear.

With MoE, there are multiple expert models, and each one ends up specializing in certain kinds of input. When you prompt it with a question or a problem, the model decides who it should go to, selects a small group of experts to answer your question, and gives you an output. So multiple networks live inside the same model, and there's a ... conductor (the "router", in AI speak) that operates as a gateway to the experts.

"Hm. Let's see ... You. And ..... You! Go !"

In TV terms, think of a large CNN election-day panel. Yes! Those very big ones stretching across the studio, sometimes with another panel opposite for good expert measure! That mixture of experts gets called upon by Wolf or Anderson or Dana. Each expert is best at the set of information he, she, or they specialize in.

So in the world of AI and Mixture of Experts (MoE), each expert is trained on a specific set of tasks. And each gets called upon to solve a problem and give the most accurate answer possible based on that specialization.

Here's more of an explainer:

"MoE can improve model capacity, efficiency, and accuracy for deep learning models—the secret sauce that sets Mixtral apart from the rest.."

There are eight experts in Mixtral, with roughly 7 billion parameters each, hence the "8x7B" I keep seeing all over the place.

For each token the model processes, the router picks two of those eight experts to do the work, so at least two experts get called on with every input.
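For the curious (or the brave), here's a tiny toy sketch in Python of what that conductor is doing. This is purely illustrative, with made-up numbers and nothing like Mistral's actual code: a router scores all eight experts for each piece of input, and only the top two get to do any work.

```python
# A toy "mixture of experts" router -- illustration only, not Mistral's code.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # Mixtral has 8 experts per layer
TOP_K = 2         # only 2 of them are used for each token
DIM = 16          # toy size; the real model is vastly bigger

# Each "expert" here is just a small matrix of weights.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The "conductor"/router is another small matrix that turns an input into 8 scores.
router = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_layer(token_vector):
    scores = token_vector @ router                 # one score per expert
    top2 = np.argsort(scores)[-TOP_K:]             # indices of the 2 best-scoring experts
    weights = np.exp(scores[top2])
    weights /= weights.sum()                       # softmax over just the winners
    # Only the chosen experts do any work; their answers get blended together.
    return sum(w * (token_vector @ experts[i]) for i, w in zip(top2, weights))

token = rng.normal(size=DIM)   # stand-in for one token's internal representation
print(moe_layer(token).shape)  # (16,) -- same shape out as in
```

That's the whole trick: all eight experts live inside the model, but only two of them run for any given token, so you carry the knowledge of a much bigger model while paying the compute cost of a much smaller one.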

The context size is 32K tokens. This basically means how much a model can look back on when you interact with it. A longer context length lets you, the user, give more info to an LLM, giving it a bigger working memory. Roughly speaking, 32K tokens is on the order of 24,000 words, about 50 pages of text you can hand it in one go.

I called my co-founder Thomas to ask more about all this.

He indicated that what's also cool is that it's open source. Maybe not 100% open source in spirit, but the weights are out there and developers can freely run the model and build custom solutions on top of it. Free. Easy to access. (Well, not for me, because I would need GPUs and compute, or go rent them somewhere like runpod.io at a cost that's at least more accessible.)
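For the developers among you (and anyone with more GPU budget than me), here's roughly what "running it yourself" looks like. A minimal sketch using the Hugging Face transformers library and the publicly released Mixtral-8x7B-Instruct checkpoint; assume you need a serious amount of GPU memory, or a rented machine, to actually run it:

```python
# Minimal sketch: running Mixtral with Hugging Face transformers.
# Needs the accelerate package and a LOT of VRAM (or a rented GPU, e.g. on runpod.io).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # the instruction-tuned release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain 'mixture of experts' to a non-techie in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=120)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In practice a lot of people seem to run quantized versions so it fits on smaller hardware, but that's a rabbit hole for another post.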

This release is being compared to Stable Diffusion, the AI image model, which helped developers all over the world train new AIs, build new workflows, and generate some really amazing stuff.

It's also multi-lingual: French, German, Spanish, Italian, and English. Not bad. No Swahili or Arabic but pretty good.

Here's more info for a deeper dive

