1 point | by imalomder 12 hours ago
1 comment
Hi HN, this is my research project that allows people to deploy MoE diffusion LLMs locally more efficiently. With this method, you can fit the 100B LLaDA2.0-flash model on a PC with an RTX 5090 and run it faster than other methods.