Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

(github.com)

68 points | by MediaSquirrel 2 hours ago

7 comments

craze3 2 hours ago

Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.

LuxBennu an hour ago

I run Whisper large-v3 on an M2 Max with 96GB, and even with just inference the memory gets tight on longer audio, so I can only imagine what fine-tuning looks like. Does 64GB vs. 96GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.

MediaSquirrel an hour ago

Memory usage increases quadratically with sequence length. Therefore, using shorter sequences during fine-tuning can prevent memory explosions. On my 64GB RAM machine, I'm limited to input sequences of about 2,000 tokens, considering my average output for the fine-tuning task is around 1,000 tokens (~3k tokens total).
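To illustrate the quadratic growth mentioned above, here is a rough back-of-the-envelope sketch. It assumes a naive attention implementation that materializes the full score matrix for every head; the head count (8) and element size (2 bytes, i.e. 16-bit floats) are placeholder values chosen for illustration, not Gemma 4's actual configuration, and the function name is hypothetical. Memory-efficient attention kernels avoid storing this matrix, so treat the numbers as an upper-bound intuition rather than a measurement of this tool.

```python
def attention_score_bytes(seq_len, num_heads=8, bytes_per_elem=2, batch_size=1):
    """Bytes needed to hold the attention score matrices for ONE layer.

    Assumes naive attention: one (seq_len x seq_len) matrix per head.
    num_heads and bytes_per_elem are illustrative assumptions only.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


# Doubling the sequence length quadruples the score-matrix memory.
for n in (1000, 2000, 3000, 6000):
    mib = attention_score_bytes(n) / 2**20
    print(f"{n:>5} tokens: {mib:8.1f} MiB per layer")
```

Under these assumptions, going from 3k to 6k total tokens multiplies this term by four, which is why trimming the input length is usually the first lever to pull when fine-tuning runs out of memory.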

yousifa an hour ago

This is super cool, will definitely try it out! Nice work

dsabanin 2 hours ago

Thanks for doing this. Looks interesting, I'm going to check it out soon.

MediaSquirrel an hour ago

You are welcome! It was a fun side quest.

pivoshenko an hour ago

nice!