xAI Is Reportedly Using Just 11% of Its 550k Nvidia GPUs

(wccftech.com)

25 points | by lossolo a day ago

15 comments

  • thatguysaguy 4 hours ago

    It looks like people in this thread are confusing fleet utilization and MFU. If they're doing a lot of RL, it's really not surprising to see such low numbers.
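The distinction the comment draws can be sketched with a toy calculation. All numbers below are hypothetical illustrations, not xAI's actual figures; the peak-FLOPS value is the published H100 SXM BF16 dense figure, used here only as an example.

```python
# Fleet utilization vs. MFU, with made-up illustrative numbers.

total_gpus = 550_000
gpus_running_jobs = 60_500                 # GPUs allocated to any job at all
fleet_utilization = gpus_running_jobs / total_gpus
print(f"fleet utilization: {fleet_utilization:.0%}")   # 11%

# MFU (Model FLOPs Utilization) is a different metric, measured on the GPUs
# that ARE busy: what fraction of their peak FLOP/s goes into model math.
achieved_tflops = 400                      # hypothetical sustained throughput
peak_tflops = 989                          # H100 SXM BF16 dense peak (Nvidia spec)
mfu = achieved_tflops / peak_tflops
print(f"MFU on busy GPUs: {mfu:.0%}")

# RL workloads interleave environment rollouts, reward computation, and
# policy updates, so both numbers can legitimately sit well below 100%.
```

A fleet can have low fleet utilization (most GPUs idle) while the busy GPUs run at high MFU, or the reverse; conflating the two makes the headline number look worse than it may be.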

  • maxrev17 13 hours ago

    Why is everyone stating conjecture-based answers like my 10-year-old kid, with absolutely no evidence to back them?

    • brazukadev 9 hours ago

      The most childish reply in this thread is yours, max.

  • aggakake 21 hours ago

    Aren't xAI's datacenters powered by [currently very expensive] diesel?

  • dlcarrier 17 hours ago

    That's a problem that any general purpose design has. It's something Dojo would have fixed, but it went too far in the other direction and only supported training. Rumor has it the new version will support inference too.

  • Frannky 16 hours ago

    Grok is pretty bad. No wonder usage is low. I think they messed up when they removed the human annotation team and went in the direction of automation.

    The bet could eventually pay off if they figure out how to train without human help while still producing useful models. Imagine is terrible too.

    More competition is great for us users. I hope they recover. In the meantime, why not host OSS models like Google does?

    • nwah1 15 hours ago

      My understanding is that inference (running existing models) accounts for around a quarter of the average compute budget for AI companies, while training new models takes up about three quarters.

      As such, using only 11% of their GPUs indicates that they've elected not to do as much training as they are capable of.
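The comment's back-of-envelope reasoning can be made explicit. The 1/4 : 3/4 split is the commenter's rough estimate, not measured data, and the sketch below only illustrates the arithmetic:

```python
# Commenter's rough estimates of a typical AI lab's compute split.
inference_share = 0.25   # fraction of compute spent serving existing models
training_share = 0.75    # fraction spent training new models
fleet_utilization = 0.11 # figure reported for xAI

# A fleet running flat-out would sit near 1.0. At 0.11, even the nominal
# inference share (0.25) is not filled, so the idle capacity is overwhelmingly
# training that is simply not being run.
idle = 1 - fleet_utilization
assert fleet_utilization < inference_share
print(f"idle fraction: {idle:.0%}")   # 89%
```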

      • brazukadev 8 hours ago

        If they "elected" to do that with such a terrible model, they are the most incompetent AI lab ever.

  • londons_explore 16 hours ago

    Part of this is a human problem. The company wants better utilisation, so it hires resourcing experts tasked with allocating resources between projects and teams.

    These experts set up quota systems, priority allocation, month-ahead plans, burst and idle quotas, etc., all with the goal of getting the resource better used.

    However it ends up having the reverse effect - teams now waste the resource deliberately to make it appear they have better utilisation, and run pointless jobs because "use it or lose it" quota systems discourage being thrifty.

    These problems are compounded by there being hundreds of resource types - "I've got plenty of CPU and GPU TFlops for my project, but I've run out of disk spindle hours so can't run the training job".

    The end result is that the company as a whole doesn't even know its real utilisation, and makes exceptionally poor use of resources.

  • grosswait an hour ago

    “Elon Musk's xAI, the software firm behind Gorq” - this is not an autocorrect error.

  • alexdumny a day ago

    This is the exact information I am looking for

    • downrightmike a day ago

      Soon the market will be flooded with liquidations of everything from these

      • blourvim a day ago

        The article says this is a software issue, where GPUs can't be fully utilized due to scaling problems. I don't know how hardware at that scale works, but it could very well be that they still need all of their hardware to get their current compute

        • brazukadev 21 hours ago

          If they had the demand, this problem would be fixed. Even giving away free credits, xAI would not get the users; nobody wants to use Elon's LLM.

          That's why he bought Cursor: to get customers, an audience he can give free credits to.

      • kelseyfrog a day ago

        Where does one go (virtually or physically) to participate as a buyer in these markets?