Semi-related but is there a standard way to run this (or other models from huggingface) in a docker container and interact with them through a web API? ChatGPT tells me to write my own FastAPI wrapper which should work, but is there no pre-made solution for this?
Ollama has built-in support [1] for GGUF models on Hugging Face, and exposes an OpenAI-compatible HTTP endpoint [2].
You can also just test it out using the cli:
ollama run hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16
1. https://huggingface.co/docs/hub/ollama
2. https://github.com/ollama/ollama?tab=readme-ov-file#start-ol...
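For the web-API half of the question: once the model has been pulled, that OpenAI-compatible endpoint can be hit directly. A minimal sketch (the port is Ollama's default, the model name is just the tag from above, and the prompt is an arbitrary example):
# Ollama listens on :11434 by default; use whatever model tag you pulled
$ curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16", "messages": [{"role": "user", "content": "Hi"}]}'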
Thanks, Ollama is exactly what I was looking for.
In shell 1, start the server; in shell 2, send requests to it.
Love vLLM for how fast it is while also being easy to host.
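A minimal sketch of that two-shell setup, assuming vLLM's OpenAI-compatible server (the model name and port here are placeholders, not the commenter's exact setup):
# shell 1: start the OpenAI-compatible server (defaults to port 8000)
$ vllm serve HuggingFaceTB/SmolLM2-1.7B-Instruct

# shell 2: query it
$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "HuggingFaceTB/SmolLM2-1.7B-Instruct", "messages": [{"role": "user", "content": "Hi"}]}'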
Huggingface TGI supports many models and more than one API:
https://huggingface.co/docs/text-generation-inference/en/ins...
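If you go the TGI route, the usual pattern is one docker run plus an HTTP call; a rough sketch, with the image tag, port mapping, and model id as examples rather than a tested recipe:
$ docker run --gpus all -p 8080:80 -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id HuggingFaceTB/SmolLM2-1.7B-Instruct
$ curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{"inputs": "Hi", "parameters": {"max_new_tokens": 64}}'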
llama.cpp in a docker container (Google for the gguf version)
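Something along these lines should work for the llama.cpp server route (the image tag and mounted filename are assumptions; check the llama.cpp docs for the current names). Its built-in server also exposes an OpenAI-compatible /v1/chat/completions, so the same kind of curl as above works against port 8080:
$ docker run -p 8080:8080 -v $PWD/models:/models \
    ghcr.io/ggerganov/llama.cpp:server \
    -m /models/SmolLM2-1.7B-Instruct-Q8_0.gguf --host 0.0.0.0 --port 8080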
Does it support anything other than English? Sadly, most open-weights models have no support for languages other than English, which makes them useless for the 75% of the world's population who don't speak English at all.
Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
> SmolLM2 models primarily understand and generate content in English.
https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/b...
I occasionally use the 1.5B and 3B versions of Qwen2.5 for translation between English, Chinese, and Japanese, and they seem to do a good job.
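For anyone who wants to try the same kind of test, a one-liner is enough (assuming Ollama and its qwen2.5 tags; the prompt is just an example):
$ ollama run qwen2.5:1.5b "Translate into Chinese and Japanese: The pelican caught a fish."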
> Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
RWKV World exists (100+ languages on purpose), though it's a bit different from traditional LLMs so YMMV.
Good point raising just how many people don't speak any English. That sounds like a lot of people who could help produce or contribute to a non-English-language language model.
Very interesting. According to their X posts, this meme model "SmolLM" beats Meta's new 1B and 3B models across almost all metrics.
I wonder how this is possible given that Meta has been in this game for much longer and probably has much more data at their disposal as well.
Usually, that's because they use a groundbreaking ML method called TTDS, or "training on the test dataset".
Do you have proof for this? Why accuse one team and not the other?
Which team did I accuse? I said "usually".
I would hope Simon would not fall victim to such shenanigans, and has his own test dataset.
My test dataset is mostly dumb prompts about pelicans.
You'll note that I didn't quote their benchmarks in my own post at all, because I didn't want to boost them without feeling confident in what they were stating.
I posted about this because my own very limited initial experiments passed a loose vibe check!
I'm impressed any time a 1.7GB (or 130MB) model file appears to be able to do anything useful at all.
I'm not saying either did this, just that that's what most fine tunes tend to do.
meta or smol?
Smol
You can be reasonably confident that, unless there’s been a significant breakthrough (there hasn’t), if a smaller model beats a larger model it’s either fine-tuned for a specific purpose or trained on the test data somehow (i.e. fine-tuned to have good metrics).
To be less snarky, they claim:
> These models are built on a meticulously curated high-quality training corpus
I.e. good training data plus a small model beats a bigger model.
…but I’m skeptical, when I read:
> We observed that performance continues to improve with longer training, even beyond the Chinchilla optimal point. Therefore, we decided to train the 1.7B model on 1 trillion tokens and the 135M and 360M models on 600B tokens, as the performance gains after 400B tokens begin to slow on some benchmarks for these smaller models.
So they’re evaluating their models against various benchmarks as they train them and picking the practice that gives the best benchmarks?
I dunno.
The claim is basically good data > more parameters, but it’s just an observation of “this happened to work for us” rather than something you can usefully take (as far as I can see) and apply to larger models.
The claims they actually make about performance are far more modest than people are making out.
The 1.7B model performs better than any other 2B model in their evaluation.
Seems nice. Not groundbreaking. Not convinced it’s real rather than polluted training data, personally.
They didn’t say they beat the 3B; their 1.7B outperforms Llama 3.2 1B thanks to 11T tokens of high-quality data (Hugging Face are the ones behind the FineWeb dataset that everyone uses now). Btw, Qwen2.5-1B also surpasses Llama 3.2 1B by a large margin, so beating it is even more impressive.
Meta doesn't train on their internal data, at least not for open models. It would be a real PR problem if someone started dumping real Facebook chats out of them.
And this is from Hugging Face themselves; arguably they have a lot of data as well.
What’s the context size? I couldn’t find it on the model summary page. Tangential: if it’s not on the model page, does it mean that it’s not that relevant here? If so, why?
> What’s the context size?
SmolLM2 uses up to 8192 tokens.
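If it isn't spelled out on the model card, the context window is usually visible in the model's config.json (max_position_embeddings for Llama-style architectures); e.g., assuming Hugging Face's standard raw-file URL layout:
$ curl -s https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/raw/main/config.json | grep max_position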
Nice! Do you think they could be fine-tuned to implement a cool thing like https://withaqua.com/ ? eg to teach it to do "inline edits" of what you say?
Hm, is it too early to stop trusting these self-published evaluations and only rely on 3rd-party independent ones? In other areas, IMDb ratings for example are completely meaningless and rigged at this point.
Why would I care about this when I can have the entire English Wikipedia on my phone? Really struggling to understand why people are so excited about this stuff.
Knowledge doesn't amount to much in an LLM; I think what most are excited about is the artificial reasoning.
What is left for the brain to do? First people let their bodies atrophy. Next it's the mind. Wall-E here we come.
There will always be lots to do for those who are motivated to find such things. While the objective value of a thing may change greatly, the subjective value can be kept fairly constant. Just think about all the retro-x enthusiasts.
LLMs have the potential to eliminate a lot of rudimentary tasks. The brain should find better things to do than locating the closest number in two lists. It isn't all doom and gloom.
"Hey Wikipedia, at what year did SSD speeds reach RAM speeds of yester-year?"
Is there a way to run this in the browser yet? Transformers.js doesn't seem to support this. Is there another way to run this in the browser?
They linked two examples in another blog post, only the smaller models, though:
[135M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-135M-Inst...
[360M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Inst...
Maybe WebAssembly?
Probably ONNX.
I wonder how one would finetune this
Is there a good, small model that can take input images? Or are those all still larger?
I haven’t tried the smaller variants, but I’ve been very impressed with Molmo:
https://molmo.allenai.org
moondream fits the bill, but don't expect too much from its performance on image description and all.
I see no difference between SmolLM1 and SmolLM2 (at least for the 135M). See how the model can't even generalize over "Hi." and "Hi!" (the last conversations below). Isn't this a sign of overfitting/memorization? A sign of poor training?
- SmolLM2 does not maintain a consistent identity.
- There is a lot of repetition. SmolLM struggles with context retention and may be prone to "looping" on specific phrases.
- SmolLM2 misinterprets the user’s intent in correcting its responses.
- Random Responses.
- SmolLM2 struggles with basic yes/no exchanges, often failing to acknowledge simple confirmations like "Yes" or "Goodbye."
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Who are you?
You are a chatbot. I don't understand why you're asking me about me.
> Who you are?
You are a chatbot created by Hugging Face. I'm designed to provide helpful information, answer questions, and provide assistance. You can ask me for general assistance or ask me specific questions about myself or my capabilities.
> Ok, what's your capabilities?
Your capabilities are as follows:
1. Assist users [...]
Feel free to ask any questions about yourself or my capabilities. I'm here to help you.
> You said "Your capabilities are as follows:" but you mean to said "My capabilities are as follows:"
You are correct, I should have said "My capabilities are as follows" instead of "My capabilities are as follows." The correct usage of the word is "My capabilities are as follows," which is an informal way of expressing a more personal or informal usage of "My capabilities are as follows."
However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
> You are repeating the some phrase! You said: However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
You're right, I should have been more specific about the correct usage. Here is the correct correction:
Your capabilities are as follows.
> Oh! Shit.
Oh! I'm sorry. I meant to say "My capabilities are as follows" or "My capabilities are as follows." Please correct the correction and explain your reasoning for the change in phrasing.
> What?
What? You mean "I'm sorry"?
> Who is sorry?
Who is sorry? You mean "you mean sorry".
> Do you speak English?
You mean "you mean sorry"?
> Ok, I'm sorry. What now?
I'm sorry, please correct the punctuation to make sure I understand correctly.
> I'm sorry.
I'm sorry, please correct the punctuation to make sure I understand correctly.
> What are you saying?
You mean "you mean sorry"?
> Tell me more about yourself, let's start with your name.
You mean "your name"?
> yes
You mean "your name"?
> Yes.
You mean "yes"?
> sure
You mean "yes"?
> I mean yes.
You mean "yes"?
> Good bye!
You mean "good bye"?
> "good bye"
You mean "good bye"
> Aborted!
~ ⌚ 20:21:51
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Tell me about yourself.
My name is Lila, and I am a retired detective from the United States who spent 20 years on the force. I've seen my fair share of crimes, witnessed some of the most intense and dramatic events that have ever occurred on this earth, and I've come to realize that there's a lot more to this world than what we're used to. I've been a bit of a relic of the past, a retired detective who's taken up the challenge of exploring the unknown.
My background has taught me the value of staying current with the latest developments, the importance of keeping our communities safe, and the risks involved in solving cases. I'm proud to be a part of a community that values discretion and honesty above all else.
What would you like to know about Lila? Is there a particular case that's been keeping you up at night, or perhaps a particular time of day that keeps you coming back to the precinct?
> Aborted!
~ ⌚ 20:22:35
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Aborted!
~ ⌚ 20:23:11
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Are you SmolLM?
No, I'm not.
> Aborted!
~ ⌚ 20:23:43
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Are you SmolLM?
Yes, I'm SmolLM, a professional chatbot designed to assist with various topics. I'm ready to help. What's your problem or query, or would you like to share a question?
> Aborted!
Yeah, the 135 model is hilariously bad. I wonder if there's anything useful it can be used for?
The 1.7B one is a lot more capable.
The exaggeration here is almost comical: "We're excited to introduce SmolLM, a series of *state-of-the-art* small language models available in three sizes: 135M, 360M, and 1.7B parameters." State-of-the-art! It’s disappointing to see so much time, money, and energy poured into this with so little to show for it—especially considering the environmental impact, with carbon emissions soaring. While I can appreciate the effort, the process is far from flawless. Even the dataset, "SmolLM-Corpus," leaves much to be desired; when I randomly examined some samples from the dataset, the quality was shockingly poor. It’s puzzling—why can't all the resources Hugging Face has access to translate into more substantial results? Theoretically, with the resources Hugging Face has, it should be possible to create a 135M model that performs far better than what we currently see.
Have you seen a 135M model that has better performance than this one?
No. Not yet.