Allow public tuning of cub::DeviceRadixSort#9491
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
1dfb476 to
12cdc11
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
OverviewThis PR exposes the radix sort tuning API publicly, allowing users to customize the sorting algorithm behavior via Key ChangesPublic API ExposureThe core change migrates radix sort policies from internal
Internal RefactoringAgent policy implementations were reorganized under the
All original agent policy names are preserved as deprecated type aliases with API Deprecation
DeviceRadixSort DocumentationAdded a new "Tuning" subsection to Policy Selector UpdatesUpdated Test Additions
Scope
Backward CompatibilityAll public API changes preserve backward compatibility through deprecated type aliases. Existing code using agent policy types will continue to compile with deprecation warnings, guiding migration to the new public tuning API. WalkthroughThe PR promotes internal ChangesRadixSort public policy API promotion
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
cub/cub/device/dispatch/dispatch_radix_sort.cuh (1)
839-845:⚠️ Potential issue | 🟠 Majorimportant: Line 841 passes
alt_downsweep_kernelas the second argument to__invoke_passes, but that position expectsalt_upsweep_kernel(parameter typeUpsweepKernelT). This causes a type contract violation that breaks template instantiation. Passalt_upsweep_kernelinstead. Line 1117-1122 shows the correct argument order.cub/benchmarks/bench/radix_sort/policy_selector.h (1)
81-90:⚠️ Potential issue | 🔴 Criticalimportant: Line 90 uses
onesweep.block_threads, butRadixSortOnesweepPolicyis constructed withthreads_per_blockas the first field (line 18). Changeonesweep.block_threadstoonesweep.threads_per_block.
🧹 Nitpick comments (2)
cub/cub/device/device_radix_sort.cuh (1)
192-193: ⚡ Quick winsuggestion: Line 192 introduces direct use of
RadixSortPolicy, but this header does not include its defining header directly. Add a direct include forcub/device/dispatch/tuning/tuning_radix_sort.cuhto avoid transitive-include fragility.As per coding guidelines, "Files must include all headers related to the symbols that they are using and relying on transitive header inclusion is not allowed."
Source: Coding guidelines
cub/cub/device/dispatch/tuning/tuning_radix_sort.cuh (1)
40-51: ⚡ Quick winsuggestion: Add CCCL host API annotations to the new stream overloads.
These new hosted overloads are plain functions in a CUDA header; annotate them with the project’s host API macro to match CCCL header conventions.
As per coding guidelines, “Functions must be marked with one of:
_CCCL_HOST_API,_CCCL_DEVICE_API,_CCCL_HOST_DEVICE_API,_CCCL_TILE_API, or_CCCL_API.”Also applies to: 79-85, 108-112, 148-156, 187-194, 220-225, 259-266
Source: Coding guidelines
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a97e1853-7d32-4e48-b473-78f0569926ae
📒 Files selected for processing (15)
cub/benchmarks/bench/radix_sort/policy_selector.hcub/cub/agent/agent_radix_sort_downsweep.cuhcub/cub/agent/agent_radix_sort_histogram.cuhcub/cub/agent/agent_radix_sort_onesweep.cuhcub/cub/agent/agent_radix_sort_upsweep.cuhcub/cub/device/device_radix_sort.cuhcub/cub/device/dispatch/dispatch_radix_sort.cuhcub/cub/device/dispatch/kernels/kernel_radix_sort.cuhcub/cub/device/dispatch/kernels/kernel_segmented_radix_sort.cuhcub/cub/device/dispatch/kernels/kernel_segmented_sort.cuhcub/cub/device/dispatch/tuning/tuning_radix_sort.cuhcub/cub/device/dispatch/tuning/tuning_segmented_radix_sort.cuhcub/cub/device/dispatch/tuning/tuning_segmented_sort.cuhcub/test/catch2_test_device_radix_sort_env.cucub/test/catch2_test_device_radix_sort_env_api.cu
🥳 CI Workflow Results🟩 Finished in 3h 01m: Pass: 100%/287 | Total: 7d 07h | Max: 2h 12m | Hits: 36%/616062See results here. |
| int num_private_partitions; | ||
| int radix_bits; | ||
| int radix_bits; //!< Number of bits per radix digit |
There was a problem hiding this comment.
Hm, why do we have the num_ prefix in one case, but not in the other? Seems inconsistent to me
There was a problem hiding this comment.
I'll make a note and thnk about it!
DispatchRadixSortinto a newdispatchfunction: Duplicate radix sort dispatch to deprecateDispatchRadixSort#9530Fixes: #8576