Investigate potential speedup by a compile-time alignment guarantee for DeviceTransform #5067

Description

@bernhardmgruber

cub::DeviceTransform jumps through hoops to handle unaligned inputs, especially in the memcpy_async (LDGSTS) and ublkcp kernels. If the user gave us a compile-time guarantee that all buffers are aligned, these kernels could be simplified.

Let's investigate the performance impact of such a guarantee.
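
To make the idea concrete, here is a rough host-side sketch of what such a guarantee could look like. It is not the CUB implementation: the tag types `assume_aligned_t` and `no_alignment_guarantee_t`, the `copy_tile` and `bulk_copy` functions, and the 16-byte granularity are all hypothetical names and values picked for illustration. `bulk_copy` is a plain `memcpy` standing in for a bulk copy instruction that needs an aligned source. The point is only that the guaranteed overload can drop the runtime head computation and the head loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical tags (not part of CUB): the caller either promises that the
// source buffer is aligned to `Bytes`, or makes no promise at all.
template <std::size_t Bytes>
struct assume_aligned_t {};
struct no_alignment_guarantee_t {};

// Assumed granularity of the bulk copy path, chosen for illustration.
constexpr std::size_t bulk_bytes = 16;

// Stand-in for a bulk copy that requires an aligned source and a size that is
// a multiple of bulk_bytes. Here it is just memcpy.
inline void bulk_copy(unsigned char* dst, const unsigned char* src, std::size_t bytes)
{
  std::memcpy(dst, src, bytes);
}

// No guarantee: compute the unaligned head at runtime, copy it byte by byte,
// bulk-copy the aligned body, then copy the tail byte by byte.
// Returns the number of bytes that went through the slow byte-wise path.
inline std::size_t
copy_tile(unsigned char* dst, const unsigned char* src, std::size_t bytes, no_alignment_guarantee_t)
{
  const auto addr  = reinterpret_cast<std::uintptr_t>(src);
  std::size_t head = (bulk_bytes - addr % bulk_bytes) % bulk_bytes;
  if (head > bytes)
  {
    head = bytes;
  }
  for (std::size_t i = 0; i < head; ++i)
  {
    dst[i] = src[i];
  }
  const std::size_t body = (bytes - head) / bulk_bytes * bulk_bytes;
  bulk_copy(dst + head, src + head, body);
  for (std::size_t i = head + body; i < bytes; ++i)
  {
    dst[i] = src[i];
  }
  return bytes - body;
}

// Compile-time guarantee: no head handling at all. The size may still not be a
// multiple of bulk_bytes, so a tail loop remains in this sketch.
template <std::size_t Bytes>
inline std::size_t
copy_tile(unsigned char* dst, const unsigned char* src, std::size_t bytes, assume_aligned_t<Bytes>)
{
  static_assert(Bytes % bulk_bytes == 0, "guarantee must cover the bulk copy granularity");
  // Debug-only check that the caller kept the promise.
  assert(reinterpret_cast<std::uintptr_t>(src) % Bytes == 0);
  const std::size_t body = bytes / bulk_bytes * bulk_bytes;
  bulk_copy(dst, src, body);
  for (std::size_t i = body; i < bytes; ++i)
  {
    dst[i] = src[i];
  }
  return bytes - body;
}
```

A caller would pick the overload by passing the tag, e.g. `copy_tile(dst, src, n, assume_aligned_t<16>{})`. Whether removing the head handling gives a measurable speedup in the real kernels is exactly what this issue proposes to measure.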
