2 comments

  • bobbyzhu2008 13 hours ago

    67% less kernel code is the more interesting number here — Hopper's async capabilities have been underutilized largely because the programming model is painful. Curious how it handles cases where compute and memory phases aren't cleanly separable.

  • jhap 11 hours ago

    This seems like a better version of CUDA for Hopper GPUs?