Introduce `cuda::ptx::prefetch()` for use in CUB BlockLoad to avoid raw assembly, since no equivalent facility currently exists