
[libcu++] Add tests for cross device APIs and APIs related to a device with a different device set current#9617

Open: pciolkosz wants to merge 2 commits into NVIDIA:main from pciolkosz:cccl_rt_multigpu_test_audit

@pciolkosz

We test our APIs with an empty driver context stack, but that sometimes seems to behave differently from having a non-matching device set current. This PR adds tests for APIs that take a device or a stream, confirming that they continue to work when a different device is current.

It also adds tests for using events/streams across devices.
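
The gap between an empty context stack and a mismatched current device can be illustrated with a toy model. The sketch below is plain C++ and does not call CUDA: `fake_driver`, `scoped_current_device`, `run_unguarded`, `run_guarded`, and `exercise` are hypothetical names invented for illustration, and the "unguarded" function is a deliberately contrived bug rather than anything taken from libcu++. It only shows why a test that starts from an empty stack can pass while the same call misbehaves once another device is current.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for a per-thread context stack. Hypothetical; this is not
// the CUDA driver API, only a model of "which device is current".
struct fake_driver {
  std::vector<int> stack;  // device ids of pushed contexts, top is back()

  std::optional<int> current() const {
    if (stack.empty()) return std::nullopt;
    return stack.back();
  }
  void push(int device) { stack.push_back(device); }
  void pop() { stack.pop_back(); }
};

// RAII guard: makes `device` current for its lifetime, then restores
// whatever was current before.
class scoped_current_device {
public:
  scoped_current_device(fake_driver& drv, int device) : drv_(drv) { drv_.push(device); }
  ~scoped_current_device() { drv_.pop(); }
  scoped_current_device(const scoped_current_device&) = delete;
  scoped_current_device& operator=(const scoped_current_device&) = delete;

private:
  fake_driver& drv_;
};

// Contrived bug: uses the requested device only when nothing is current,
// otherwise silently runs on whatever device happens to be current.
int run_unguarded(fake_driver& drv, int device) {
  return drv.current().value_or(device);
}

// Guarded variant: always enters the requested device before doing work.
int run_guarded(fake_driver& drv, int device) {
  scoped_current_device guard(drv, device);
  return *drv.current();
}

struct outcome {
  int unguarded_device;              // device the unguarded call ran on
  int guarded_device;                // device the guarded call ran on
  std::optional<int> current_after;  // what is current once both returned
};

// Runs both variants starting from the given initial state.
outcome exercise(std::optional<int> initially_current, int target) {
  fake_driver drv;
  if (initially_current) drv.push(*initially_current);
  outcome out{};
  out.unguarded_device = run_unguarded(drv, target);
  out.guarded_device = run_guarded(drv, target);
  out.current_after = drv.current();
  return out;
}
```

Calling `exercise(std::nullopt, 1)` models the empty-stack setup, where both variants run on device 1, while `exercise(0, 1)` models the mismatched setup, where only the guarded variant does. In both cases the guard leaves the previously current device untouched on exit.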

@pciolkosz pciolkosz requested a review from a team as a code owner June 27, 2026 01:06
@pciolkosz pciolkosz requested a review from ericniebler June 27, 2026 01:06
@github-project-automation github-project-automation Bot moved this to Todo in CCCL Jun 27, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 27, 2026
@coderabbitai

coderabbitai Bot commented Jun 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4c263dc5-385e-48aa-92b4-30d0d86863d3

📥 Commits

Reviewing files that changed from the base of the PR and between c35e017 and 3453af6.

📒 Files selected for processing (10)
  • libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/device/device_smoke.c2h.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/event/event_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/launch/host_launch.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/launch/launch_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/stream/stream_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/constructor.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/helper.h
  • libcudacxx/test/libcudacxx/cuda/memory/get_device_address.pass.cpp
🚧 Files skipped from review as they are similar to previous changes (10)
  • libcudacxx/test/libcudacxx/cuda/ccclrt/launch/host_launch.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/launch/launch_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/stream/stream_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/memory/get_device_address.pass.cpp
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/helper.h
  • libcudacxx/test/libcudacxx/cuda/containers/buffer/constructor.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/device/device_smoke.c2h.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/event/event_smoke.cu
  • libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Expanded multi-GPU test coverage across libcu++/CCCLRT to verify device/stream-bound APIs continue to operate correctly when the current CUDA device/context differs from the device used by the API call.

  • Updated cuda::__container::buffer cross-copy helpers to establish/ensure the correct current context (via __ensure_current_context on the relevant stream/device) before launching async cross-device memory operations and before issuing cross-copy waits/copies.
  • Added/extended multi-GPU tests for:
    • cuda::copy_bytes stream-device selection under mismatched current device.
    • Stream behavior and dependency semantics across explicitly selected devices.
    • cuda::event / cuda::timed_event construction on explicit devices and cross-device event waiting.
    • cuda::host_launch and device launch behavior when current device differs from the stream’s device.
    • Device attribute access via device_ref under mismatched current device.
    • cuda::get_device_address with explicit device when the current device differs.
    • Buffer creation and copying (make_device_buffer / make_buffer, plus buffer constructor/copy tests) ensuring allocations use the explicit device even under context mismatch.
  • Enhanced buffer test helpers to use the associated stream for context initialization and added a check_allocation_device helper that validates the allocated buffer resides on the expected device using pointer attributes.

Walkthrough

Adds current-context guards for internal buffer copies and expands multi-GPU tests so explicit-device streams, launches, events, memory, and buffer construction are exercised when current and explicit devices differ.

Changes

Current-context and explicit-device behavior

| Layer / File(s) | Summary |
|---|---|
| Buffer helper context: libcudacxx/include/cuda/__container/buffer.h, libcudacxx/test/libcudacxx/cuda/containers/buffer/helper.h | __copy_cross and __copy_cross_buffers now enter the stream’s CUDA context before async copies, and buffer test helpers use stream-based context setup plus allocation-device checks. |
| Device and stream smoke: libcudacxx/test/libcudacxx/cuda/ccclrt/device/device_smoke.c2h.cu, libcudacxx/test/libcudacxx/cuda/ccclrt/stream/stream_smoke.cu, libcudacxx/test/libcudacxx/cuda/ccclrt/launch/host_launch.cu, libcudacxx/test/libcudacxx/cuda/ccclrt/launch/launch_smoke.cu | Multi-GPU tests now verify device attributes, stream construction, wait semantics, host launch, and kernel launch behavior when the current device differs from the explicit device. |
| Event and address checks: libcudacxx/test/libcudacxx/cuda/ccclrt/event/event_smoke.cu, libcudacxx/test/libcudacxx/cuda/memory/get_device_address.pass.cpp | Event construction, cross-device waiting, and explicit-device get_device_address behavior are validated under differing current-device contexts. |
| Buffer copy and construction: libcudacxx/test/libcudacxx/cuda/ccclrt/algorithm/copy.cu, libcudacxx/test/libcudacxx/cuda/containers/buffer/copy.cu, libcudacxx/test/libcudacxx/cuda/containers/buffer/constructor.cu | cuda::copy_bytes, cuda::make_buffer, and cuda::make_device_buffer tests now use explicit-device streams and verify allocation targets and copied contents. |

Possibly related PRs

  • suggestion: NVIDIA/cccl#9615 — same buffer copy helper context change around __ensure_current_context before async memcpy.

Suggested reviewers

  • suggestion: wmaxey
  • suggestion: davebayer
  • suggestion: Jacobfaib

@pciolkosz pciolkosz force-pushed the cccl_rt_multigpu_test_audit branch from c35e017 to 3453af6 June 27, 2026 21:21
@github-actions

😬 CI Workflow Results

🟥 Finished in 1h 18m: Pass: 99%/120 | Total: 1d 14h | Max: 1h 02m | Hits: 99%/353935

See results here.
