Semi-related but is there a standard way to run this (or other models from huggingface) in a docker container and interact with them through a web API? ChatGPT tells me to write my own FastAPI wrapper which should work, but is there no pre-made solution for this?
Ollama has built-in support [1] for GGUF models on Hugging Face, and exposes an OpenAI-compatible HTTP endpoint [2].
You can also just test it out using the cli:
ollama run hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16
1. https://huggingface.co/docs/hub/ollama
2. https://github.com/ollama/ollama?tab=readme-ov-file#start-ol...
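Since the endpoint speaks the standard OpenAI chat-completions shape, you can hit it with nothing but the Python stdlib. A minimal sketch, assuming Ollama's default local address (localhost:11434); the model reference matches the CLI example above:

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint (default local address/port).
URL = "http://localhost:11434/v1/chat/completions"

payload = {
    # Same model reference as the `ollama run` example.
    "model": "hf.co/unsloth/SmolLM2-1.7B-Instruct-GGUF:F16",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once `ollama serve` is running locally:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The same request works with any OpenAI client library by pointing its base URL at http://localhost:11434/v1.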
llama.cpp in a Docker container (Google for the GGUF version)
Very interesting. According to their X posts, this meme model "SmolLM" beats Meta's new 1B and 3B models across almost all metrics.
I wonder how this is possible given that Meta has been in this game for much longer and probably has much more data at their disposal as well.
Usually, that's because they use a groundbreaking ML method called TTDS, or "training on the test dataset".
meta or smol?
Smol
What’s the context size? I couldn’t find it on the model summary page. Tangential: if it’s not on the model page, does it mean that it’s not that relevant here? If so, why?
> What’s the context size?
SmolLM2 uses up to 8192 tokens.
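For a quick sanity check against that 8192-token window, a common rule of thumb is roughly 4 characters per token for English text. This is only a heuristic; the exact count depends on SmolLM2's tokenizer:

```python
# Rough rule of thumb: ~4 characters per token for English prose.
# Exact counts depend on the model's tokenizer; treat this as an estimate.
CONTEXT_TOKENS = 8192
CHARS_PER_TOKEN = 4  # heuristic, not exact

def estimated_tokens(text: str) -> int:
    """Crude token estimate for English prose."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_output: int = 512) -> bool:
    """True if the prompt likely leaves `reserve_for_output` tokens free."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_TOKENS

print(fits_in_context("hello " * 100))  # → True
```

For an exact count you would run the model's actual tokenizer over the text instead.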
Does it support anything other than English? Sadly, most open-weights models have no support for languages other than English, which makes them useless for the ~75% of the world's population who don't speak English at all.
Does anyone know of a good lightweight open-weights LLM which supports at least a few major languages (let's say, the official UN languages at least)?
> SmolLM2 models primarily understand and generate content in English.
https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct/b...
Nice! Do you think they could be fine-tuned to implement a cool thing like https://withaqua.com/ ? E.g., to teach it to do "inline edits" of what you say?
Is there a way to run this in the browser yet? Transformers.js doesn't seem to support it, so is there another way?
They linked two examples in another blog post, though only for the smaller models:
[135M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-135M-Inst...
[360M] https://huggingface.co/spaces/HuggingFaceTB/SmolLM-360M-Inst...
Probably ONNX.
Maybe WebAssembly?
I wonder how one would finetune this
Is there a good, small model that can take input images? Or are those all still larger?
I haven’t tried the smaller variants, but I’ve been very impressed with Molmo:
https://molmo.allenai.org