Add cuda::bit_ffs to <cuda/bit>#9609
Conversation
Implements a type-safe find-first-set that returns 1 + the index of the least significant set bit, or 0 for a zero input, matching __builtin_ffs and CUDA's __ffs (unlike countr_zero). Adds tests and docs. Closes NVIDIA#6108 Signed-off-by: temujinkz <ttalkenov@gmail.com>
| constexpr T all_ones = static_cast<T>(~T{0}); | ||
|
|
||
| // a zero input is well defined and returns 0 | ||
| static_assert(cuda::bit_ffs(T{0}) == 0); |
There was a problem hiding this comment.
Important: All those should only be
| static_assert(cuda::bit_ffs(T{0}) == 0); | |
| assert(cuda::bit_ffs(T{0}) == 0); |
We are invoking the function in a constexpr context, so otherwise we would never test the runtime path
| TEST_FUNC constexpr bool test() | ||
| { | ||
| using nl = cuda::std::numeric_limits<T>; | ||
| constexpr T all_ones = static_cast<T>(~T{0}); |
There was a problem hiding this comment.
| constexpr T all_ones = static_cast<T>(~T{0}); | |
| [[maybe_unused]] constexpr T all_ones = static_cast<T>(~T{0}); |
|
|
||
| unused(all_ones); |
There was a problem hiding this comment.
| unused(all_ones); |
| static_assert(cuda::bit_ffs(static_cast<T>(0b10101000)) == 4); | ||
|
|
||
| unused(all_ones); | ||
| return true; |
There was a problem hiding this comment.
Nitpick: This function is only called from the test() function, so it does not need to return anything
| #if _CCCL_HAS_INT128() | ||
| test<__uint128_t>(); | ||
| #endif // _CCCL_HAS_INT128() | ||
| return true; |
There was a problem hiding this comment.
Please add a test or two that we cannot invoke the function with bool, a signed integer or float and an enumeration
something like:
template<typename T>
_CCCL_CONCEPT can_bit_ffs = _CCCL_REQUIRES_EXPR((T), T val)((cuda::bit_ffs(val));|
Thanks a lot for the PR, This already looks great, I have some nits |
| __global__ void bit_ffs_kernel() { | ||
| assert(cuda::bit_ffs(uint32_t{0}) == 0); | ||
| assert(cuda::bit_ffs(uint32_t{1}) == 1); | ||
| assert(cuda::bit_ffs(uint32_t{0b10101000}) == 4); |
There was a problem hiding this comment.
I would also add an example for all bit set
| _CCCL_REQUIRES(::cuda::std::__cccl_is_unsigned_integer_v<_Tp>) | ||
| [[nodiscard]] _CCCL_API constexpr int bit_ffs(_Tp __value) noexcept | ||
| { | ||
| return (__value == _Tp{0}) ? 0 : ::cuda::std::countr_zero(__value) + 1; |
There was a problem hiding this comment.
the implementation can be optimized. please make sure that the implementation produces the same code of __ffs and __ffsll https://godbolt.org/z/z4717fqGx
Review feedback from @miscco and @fbusato on NVIDIA#9609: - route through __builtin_ffs / __builtin_ffsll on host and __ffs / __ffsll on device (constexpr fallback) so codegen matches __ffs / __ffsll - test: use assert instead of static_assert so the runtime path is exercised, mark the per-type helper void, [[maybe_unused]] the constant - test: add a concept check that bit_ffs rejects bool, signed, float, and enums - docs: add an all-bits-set example Signed-off-by: temujinkz <ttalkenov@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Added Also:
Walkthroughsuggestion: Adds ChangesBit FFS API
Suggested labels
Suggested reviewers
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8617fc92-34c6-45a3-b206-feac28bfbaf5
📒 Files selected for processing (5)
docs/libcudacxx/extended_api/bit.rstdocs/libcudacxx/extended_api/bit/bit_ffs.rstlibcudacxx/include/cuda/__bit/bit_ffs.hlibcudacxx/include/cuda/bitlibcudacxx/test/libcudacxx/cuda/bit/bit_ffs.pass.cpp
|
Thanks for the fast review, really appreciate it. Pushed an update that should cover everything:
Let me know if the codegen looks right on your side, happy to tweak anything. |
Pre-empt the same review feedback as bit_ffs (NVIDIA#9609): - implement bit_msb by forwarding to cuda::std::bit_width, which already lowers to the optimal clz based code - test: use assert instead of static_assert, per-type helper returns void, [[maybe_unused]] the constant, and add a concept check rejecting bool, signed, float, and enums - docs: add an all-bits-set example Signed-off-by: temujinkz <ttalkenov@gmail.com>
Per the IWYU coding guideline (and CodeRabbit), the test used numeric_limits and size_t through transitive includes. Add the direct headers. Signed-off-by: temujinkz <ttalkenov@gmail.com>
|
Good catch, fixed. Added |
cuda::bit_ffs(x) returns the 1-based position of the lowest set bit, or 0 if x is zero. The body is x == 0 ? 0 : countr_zero(x) + 1. countr_zero counts trailing zeros, so adding 1 turns it into a 1-based index. The x == 0 guard is the one case countr_zero doesn't define cleanly, and handling it is the whole reason this wrapper exists. It matches __builtin_ffs and CUDA's __ffs. It's constrained to unsigned integers, works on all widths including 128-bit, and is constexpr plus host/device because it just forwards to countr_zero.