Semi-related but is there a standard way to run this (or other models from huggingface) in a docker container and interact with them through a web API? ChatGPT tells me to write my own FastAPI wrapper which should work, but is there no pre-made solution for this?
Ollama has built-in support [1] for GGUF models on Hugging Face, and exposes an OpenAI-compatible HTTP endpoint [2].
You can also just test it out using the cli:
ollama run hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16
1. https://huggingface.co/docs/hub/ollama
2. https://github.com/ollama/ollama?tab=readme-ov-file#start-ol...
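For the web-API half of the question: once the model has been pulled, that OpenAI-compatible endpoint can be hit directly. A minimal sketch (the port is Ollama's default, the model name is just the tag from above, and the prompt is an arbitrary example):
# Ollama listens on :11434 by default; use whatever model tag you pulled
$ curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16", "messages": [{"role": "user", "content": "Hi"}]}'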
Thanks, Ollama is exactly what I was looking for.
In shell 1, start the server; in shell 2, send requests to it.
Love vLLM for how fast it is while also being easy to host.
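A minimal sketch of that two-shell setup, assuming vLLM's OpenAI-compatible server (the model name and port here are placeholders, not the commenter's exact setup):
# shell 1: start the OpenAI-compatible server (defaults to port 8000)
$ vllm serve HuggingFaceTB/SmolLM2-1.7B-Instruct

# shell 2: query it
$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "HuggingFaceTB/SmolLM2-1.7B-Instruct", "messages": [{"role": "user", "content": "Hi"}]}'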
Huggingface TGI supports many models and more than one API:
https://huggingface.co/docs/text-generation-inference/en/ins...
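If you go the TGI route, the usual pattern is one docker run plus an HTTP call; a rough sketch, with the image tag, port mapping, and model id as examples rather than a tested recipe:
$ docker run --gpus all -p 8080:80 -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id HuggingFaceTB/SmolLM2-1.7B-Instruct
$ curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{"inputs": "Hi", "parameters": {"max_new_tokens": 64}}'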
llama.cpp in a docker container (Google for the gguf version)
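Something along these lines should work for the llama.cpp server route (the image tag and mounted filename are assumptions; check the llama.cpp docs for the current names). Its built-in server also exposes an OpenAI-compatible /v1/chat/completions, so the same kind of curl as above works against port 8080:
$ docker run -p 8080:8080 -v $PWD/models:/models \
    ghcr.io/ggerganov/llama.cpp:server \
    -m /models/SmolLM2-1.7B-Instruct-Q8_0.gguf --host 0.0.0.0 --port 8080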
Does it support anything other than English? Sadly, most open-weights models have no support for languages other than English, which makes them useless for the 75% of the world's population who don't speak English at all.
Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
> SmolLM2 models primarily understand and generate content in English.
https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/b...
I occasionally use the 1.5B and 3B versions of Qwen2.5 for translation between English, Chinese, and Japanese, and they seem to do a good job.
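For anyone who wants to try the same kind of test, a one-liner is enough (assuming Ollama and its qwen2.5 tags; the prompt is just an example):
$ ollama run qwen2.5:1.5b "Translate into Chinese and Japanese: The pelican caught a fish."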
> Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
RWKV World exists (100+ languages on purpose), though it's a bit different from traditional LLMs so YMMV.
Good point raising just how many people don't speak any English. That sounds like a lot of people who could help produce or contribute to a non-English-language language model.
Very interesting. According to their X posts, this meme model "SmolLM" beats Meta's new 1B and 3B models across almost all metrics.
I wonder how this is possible given that Meta has been in this game for much longer and probably has much more data at their disposal as well.
Usually, that's because they use a groundbreaking ML method called TTDS, or "training on the test dataset".
Do you have proof for this? Why accuse one team and not the other?
Which team did I accuse? I said "usually".
I would hope Simon would not fall victim to such shenanigans, and has his own test dataset.
My test dataset is mostly dumb prompts about pelicans.
You'll note that I didn't quote their benchmarks in my own post at all, because I didn't want to boost them without feeling confident in what they were stating.
I posted about this because my own very limited initial experiments passed a loose vibe check!
I'm impressed any time a 1.7GB (or 130MB) model file appears to be able to do anything useful at all.
I'm not saying either did this, just that that's what most fine tunes tend to do.
meta or smol?
Smol
You can be reasonably confident that, unless there’s been a significant breakthrough (there hasn’t), if a smaller model beats a larger model it’s either fine-tuned for a specific purpose or trained on the test data somehow (i.e. fine-tuned to have good metrics).
To be less snarky, they claim:
> These models are built on a meticulously curated high-quality training corpus
I.e. good training data plus a small model beats a bigger model.
…but I’m skeptical, when I read:
> We observed that performance continues to improve with longer training, even beyond the Chinchilla optimal point. Therefore, we decided to train the 1.7B model on 1 trillion tokens and the 135M and 360M models on 600B tokens, as the performance gains after 400B tokens begin to slow on some benchmarks for these smaller models.
So they’re evaluating their models against various benchmarks as they train them and picking the practice that gives the best benchmarks?
I dunno.
The claim is basically good data > more parameters, but it’s just an observation of “this happened to work for us” rather than something you can usefully take (as far as I can see) and apply to larger models.
The claims they actually make about performance are far more modest than people are making out.
The 1.7B model performs better than any other 2B model in their evaluation.
Seems nice. Not groundbreaking. Not convinced it’s real rather than polluted training data, personally.
They didn’t say they beat the 3B; their 1.7B outperforms Llama 3.2 1B thanks to 11T tokens of high-quality data (Hugging Face are the ones behind the FineWeb dataset that everyone uses now). Btw, Qwen2.5-1B also surpasses Llama 3.2 1B by a large margin, so beating it is even more impressive.
Meta doesn't train on their internal data, at least not for open models. It would be a real PR problem if someone started dumping real Facebook chats out of them.
And this is from Hugging Face themselves; arguably they have a lot of data as well.
What’s the context size? I couldn’t find it on the model summary page. Tangential: if it’s not on the model page, does it mean that it’s not that relevant here? If so, why?
> What’s the context size?
SmolLM2 uses up to 8192 tokens.
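If it isn't spelled out on the model card, the context window is usually visible in the model's config.json (max_position_embeddings for Llama-style architectures); e.g., assuming Hugging Face's standard raw-file URL layout:
$ curl -s https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/raw/main/config.json | grep max_position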
Nice! Do you think they could be fine-tuned to implement a cool thing like https://withaqua.com/ ? eg to teach it to do "inline edits" of what you say?
Hm, is it too early to stop trusting these self-published evaluations and only rely on 3rd-party independent ones? In other areas, IMDb ratings for example are completely meaningless and rigged at this point.
Why would I care about this when I can have the entire English Wikipedia on my phone? Really struggling to understand why people are so excited about this stuff.
Knowledge doesn't amount to much in an LLM; I think what most are excited about is the artificial reasoning.
What is left for the brain to do? First people let their bodies atrophy. Next it's the mind. Wall-E here we come.
There will always be lots to do for those who are motivated to find such things. While the objective value of a thing may change greatly, the subjective value can be kept fairly constant. Just think about all the retro-x enthusiasts.
LLMs have the potential to eliminate a lot of rudimentary tasks. The brain should find better things to do than locating the closest number in two lists. It isn't all doom and gloom.
"Hey Wikipedia, at what year did SSD speeds reach RAM speeds of yester-year?"
Is there a way to run this in the browser yet? Transformers.js doesn't seem to support this. Is there another way to run this in the browser?
They linked two examples in another blog post, only the smaller models, though:
[135M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-135M-Inst...
[360M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Inst...
Maybe WebAssembly?
Probably ONNX.
I wonder how one would finetune this
Is there a good, small model that can take input images? Or are those all still larger?
I haven’t tried the smaller variants, but I’ve been very impressed with Molmo:
https://molmo.allenai.org
moondream fits the bill, but don't expect too much from its performance on image description and all.
I see no difference between SmolLM1 and SmolLM2 (at least for the 135M). See how the model can't even generalize over "Hi." and "Hi!" (the last conversations below). Isn't this a sign of overfitting/memorization? A sign of poor training?
- SmolLM2 does not maintain a consistent identity.
- There is a lot of repetition. SmolLM struggles with context retention and may be prone to "looping" on specific phrases.
- SmolLM2 misinterprets the user’s intent in correcting its responses.
- Random Responses.
- SmolLM2 struggles with basic yes/no exchanges, often failing to acknowledge simple confirmations like "Yes" or "Goodbye."
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Who are you?
You are a chatbot. I don't understand why you're asking me about me.
> Who you are?
You are a chatbot created by Hugging Face. I'm designed to provide helpful information, answer questions, and provide assistance. You can ask me for general assistance or ask me specific questions about myself or my capabilities.
> Ok, what's your capabilities?
Your capabilities are as follows:
1. Assist users [...]
Feel free to ask any questions about yourself or my capabilities. I'm here to help you.
> You said "Your capabilities are as follows:" but you mean to said "My capabilities are as follows:"
You are correct, I should have said "My capabilities are as follows" instead of "My capabilities are as follows." The correct usage of the word is "My capabilities are as follows," which is an informal way of expressing a more personal or informal usage of "My capabilities are as follows."
However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
> You are repeating the some phrase! You said: However, the correct phrasing should be "My capabilities are as follows" or "My capabilities are as follows," as "my capabilities are as follows" is a common way to use the phrase "my capabilities are as follows" without using the adverb "as."
You're right, I should have been more specific about the correct usage. Here is the correct correction:
Your capabilities are as follows.
> Oh! Shit.
Oh! I'm sorry. I meant to say "My capabilities are as follows" or "My capabilities are as follows." Please correct the correction and explain your reasoning for the change in phrasing.
> What?
What? You mean "I'm sorry"?
> Who is sorry?
Who is sorry? You mean "you mean sorry".
> Do you speak English?
You mean "you mean sorry"?
> Ok, I'm sorry. What now?
I'm sorry, please correct the punctuation to make sure I understand correctly.
> I'm sorry.
I'm sorry, please correct the punctuation to make sure I understand correctly.
> What are you saying?
You mean "you mean sorry"?
> Tell me more about yourself, let's start with your name.
You mean "your name"?
> yes
You mean "your name"?
> Yes.
You mean "yes"?
> sure
You mean "yes"?
> I mean yes.
You mean "yes"?
> Good bye!
You mean "good bye"?
> "good bye"
You mean "good bye"
> Aborted!
~ ⌚ 20:21:51
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Tell me about yourself.
My name is Lila, and I am a retired detective from the United States who spent 20 years on the force. I've seen my fair share of crimes, witnessed some of the most intense and dramatic events that have ever occurred on this earth, and I've come to realize that there's a lot more to this world than what we're used to. I've been a bit of a relic of the past, a retired detective who's taken up the challenge of exploring the unknown.
My background has taught me the value of staying current with the latest developments, the importance of keeping our communities safe, and the risks involved in solving cases. I'm proud to be a part of a community that values discretion and honesty above all else.
What would you like to know about Lila? Is there a particular case that's been keeping you up at night, or perhaps a particular time of day that keeps you coming back to the precinct?
> Aborted!
~ ⌚ 20:22:35
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Aborted!
~ ⌚ 20:23:11
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi.
Hi!
> Are you SmolLM?
No, I'm not.
> Aborted!
~ ⌚ 20:23:43
$ llm chat -m smol135
Chatting with gguf/SmolLM2-135M-Instruct-Q8_0
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hi!
Hi! I'm a helpful AI assistant named SmolLM, trained by Hugging Face. I'm here to help you with any questions or issues you may have. What can I assist you with today?
> Are you SmolLM?
Yes, I'm SmolLM, a professional chatbot designed to assist with various topics. I'm ready to help. What's your problem or query, or would you like to share a question?
> Aborted!
Yeah, the 135 model is hilariously bad. I wonder if there's anything useful it can be used for?
The 1.7B one is a lot more capable.
The exaggeration here is almost comical: "We're excited to introduce SmolLM, a series of *state-of-the-art* small language models available in three sizes: 135M, 360M, and 1.7B parameters." State-of-the-art! It’s disappointing to see so much time, money, and energy poured into this with so little to show for it—especially considering the environmental impact, with carbon emissions soaring. While I can appreciate the effort, the process is far from flawless. Even the dataset, "SmolLM-Corpus," leaves much to be desired; when I randomly examined some samples from the dataset, the quality was shockingly poor. It’s puzzling—why can't all the resources Hugging Face has access to translate into more substantial results? Theoretically, with the resources Hugging Face has, it should be possible to create a 135M model that performs far better than what we currently see.
Have you seen a 135M model that has better performance than this one?
No. Not yet.