• Naz@sh.itjust.works
      3 months ago

      I’m an AI Developer.

      TLDR: CUDA.

      Getting ROCm to work properly is like herding cats.

      You need a custom implementation for your specific operating system, the driver version must be locked and compatible, especially with a Workstation / WRX card, where the Pro drivers are especially prone to breaking, you need the specific dependencies compiled for your variant of hipBLAS, or ZLUDA, and if that doesn’t work, you need ONNX transition graphs, but then you find out PyTorch doesn’t support ONNX unless it’s 1.2.0, which breaks another dependency of X-Transformers, which then breaks because the version of hipBLAS is incompatible with that older version of Python and …

      Inhales

      And THEN MAYBE it’ll work at 85% of the speed of CUDA, if it doesn’t crash first with an arbitrary error such as CUDA_UNIMPEMENTED_FUNCTION_HALF.

      You get the picture. On Nvidia, it’s click, open, CUDA working? Yes? Done. You don’t spend 120 hours fucking around and recompiling for your specific use case.
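
For reference, the "CUDA working?" check can be sketched in a few lines of Python, assuming PyTorch is the framework in question. `describe_gpu_backend` is a hypothetical helper name used only for illustration. ROCm builds of PyTorch reuse the `torch.cuda` API, so the same check covers both vendors:

```python
try:
    import torch
except ImportError:  # PyTorch is an assumption here; it may not be installed
    torch = None


def describe_gpu_backend() -> str:
    """Return which GPU backend the installed PyTorch build can use, if any."""
    if torch is None:
        return "no-torch"
    if not torch.cuda.is_available():
        return "cpu-only"
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


print(describe_gpu_backend())
```

This only confirms that a device is visible to the build. It says nothing about whether a given kernel or data type is actually implemented on that backend.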

    • ShrimpCurler@lemmy.dbzer0.com
      3 months ago

      I think it’s in the pipeline. AMD bought Xilinx, which builds FPGAs and already had some AI-specific cores in its processors. I believe AMD is developing those further and integrating them into its GPUs now.