I'm still kind of surprised that people are targeting edge deployment of MoE models. By design they trade memory efficiency for lower compute cost: only a few experts run per token, but every expert's weights still have to be stored. We generally need the opposite on the edge.
I'm hoping to see more work in the other direction, with cyclic/looped transformers and other memory-dense approaches.
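To put rough numbers on that trade-off, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `MoEConfig` class, the parameter counts, and the 2 bytes per weight (fp16) are made up and don't describe any real model. It only counts weights, ignoring the KV cache and activations.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE model description (illustrative numbers only)."""
    shared_params: float      # attention, embeddings, etc.; used for every token
    params_per_expert: float  # weights of one expert, summed over all layers
    num_experts: int          # experts that must be stored
    experts_per_token: int    # experts actually run for each token (top-k)

    def total_params(self) -> float:
        # Every expert has to be resident, so memory scales with num_experts.
        return self.shared_params + self.params_per_expert * self.num_experts

    def active_params(self) -> float:
        # Per-token compute scales only with the experts that are routed to.
        return self.shared_params + self.params_per_expert * self.experts_per_token


def weight_memory_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Weight storage in GB, assuming a fixed number of bytes per parameter."""
    return params * bytes_per_param / 1e9


# Made-up example: 1B shared parameters, 8 experts of 1B each, top-2 routing.
example = MoEConfig(
    shared_params=1e9,
    params_per_expert=1e9,
    num_experts=8,
    experts_per_token=2,
)

stored_gb = weight_memory_gb(example.total_params())
dense_equivalent_gb = weight_memory_gb(example.active_params())

print(f"stored weights:            {stored_gb:.1f} GB")
print(f"dense model, same compute: {dense_equivalent_gb:.1f} GB")
print(f"memory overhead:           {stored_gb / dense_equivalent_gb:.1f}x")
```

With these made-up numbers the MoE model does the per-token work of a model a third its size while needing the full weight footprint in memory, which is the wrong direction when RAM is the scarce resource.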
You've likely heard about this - he'd probably like to talk to you and might potentially give you some good PR.
https://www.youtube.com/watch?v=rAzT5lcezPs&t=467s
For those too lazy to watch someone talk on video for ages to make a point:
The link is to a video by PewDiePie, a famous YouTuber, who uses a local LLM to parse his email and save time. He has an autoreply system and gets notified about urgent matters.
Thanks for sharing! I'd love to chat with him. Would you be open to introducing us? :)