12 comments

  • 10 hours ago
    [deleted]
  • throwaway2027 10 hours ago

    Holy AI Slop

  • ArchitectAI 11 hours ago

    I built a lightweight (93KB) CUDA→AMD translation layer using LD_PRELOAD.

    It intercepts CUDA API calls at runtime and translates them to HIP/rocBLAS/MIOpen.

    No source code needed. No recompilation. Just:

      LD_PRELOAD=./libapex_hip_bridge.so ./your_cuda_app

    Currently supports:

    - 38 CUDA Runtime functions

    - 15+ cuBLAS operations (matrix multiply, etc.)

    - 8+ cuDNN operations (convolutions, pooling, batch norm)

    - PyTorch training and inference

    Built in ~10 hours using dlopen/dlsym for dynamic loading. 100% test pass rate.

    The goal: break NVIDIA's CUDA vendor lock-in and make AMD GPUs viable for existing CUDA workloads without months of porting effort.

    • bigyabai 11 hours ago

      > ## First Comment (Expand on technical details)

      > Post this as your first comment after submitting:

      lmfao

      • 11 hours ago
        [deleted]
      • 11 hours ago
        [deleted]
  • ArchitectAI 10 hours ago

    [flagged]

  • throwaway2027 10 hours ago

    [flagged]

    • tomhow 10 hours ago

      Please don't give oxygen to trolls. We detached and banned the account. Any time you see this kind of thing, flag the comment, and if you want to be extra-helpful, email us – hn@ycombinator.com.

    • 10 hours ago
      [deleted]
    • 10 hours ago
      [deleted]