Add quant+sparse attention for vLLM serving #1832
Codecov Report
❌ Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1832      +/-   ##
==========================================
- Coverage   77.40%   77.04%   -0.36%
==========================================
  Files         515      517       +2
  Lines       57118    57373     +255
==========================================
- Hits        44214    44205       -9
- Misses      12904    13168     +264
d48b1df to 6020692
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
… (V) Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
… and in-kernel-V gate Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
Signed-off-by: Kai Xu <kaix@nvidia.com>
6020692 to 5942c96
Signed-off-by: Kai Xu <kaix@nvidia.com>
…M2 quantizers Signed-off-by: Kai Xu <kaix@nvidia.com>
d107a21 to 1910e45
Signed-off-by: Kai Xu <kaix@nvidia.com>
What does this PR do?
Type of change: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Kernel vs PyTorch native SDPA — A6000, fp16, batch=2, head_dim=128:
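A comparison of this kind might be set up roughly as in the sketch below. It is a dependency-free illustration, not the PR's kernel or test harness: a plain-Python dense attention stands in for PyTorch's native SDPA, and `fake_quant`, `attention`, `keep_top_k`, and the tiny shapes are all hypothetical names and values chosen for the example. It reproduces no measurements from this PR.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def fake_quant(vec, num_bits=8):
    """Simulated symmetric quantization: quantize one vector, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    amax = max(abs(x) for x in vec)
    if amax == 0.0:
        return list(vec)
    scale = amax / qmax
    return [round(x / scale) * scale for x in vec]


def attention(q, k, v, quantize=False, keep_top_k=None):
    """Single-head attention over lists of row vectors.

    quantize:   fake-quantize each Q and K row before computing scores.
    keep_top_k: if set (>= 1), mask all but the k largest scores per query
                (ties at the threshold are kept).
    """
    head_dim = len(q[0])
    if quantize:
        q = [fake_quant(row) for row in q]
        k = [fake_quant(row) for row in k]
    out = []
    for qi in q:
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(head_dim) for kj in k
        ]
        if keep_top_k is not None and keep_top_k < len(scores):
            threshold = sorted(scores, reverse=True)[keep_top_k - 1]
            scores = [s if s >= threshold else float("-inf") for s in scores]
        probs = softmax(scores)
        out.append(
            [sum(p * vj[d] for p, vj in zip(probs, v)) for d in range(len(v[0]))]
        )
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two outputs."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy shapes; the PR's benchmark uses head_dim=128 on a GPU.
random.seed(0)
seq_len, head_dim = 16, 8
q = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
k = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
v = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]

reference = attention(q, k, v)                             # dense baseline
approx = attention(q, k, v, quantize=True, keep_top_k=8)   # quant + sparse
err = max_abs_error(reference, approx)
print(f"max abs error vs dense reference: {err:.4f}")
```

In a real check the baseline would be `torch.nn.functional.scaled_dot_product_attention` on fp16 CUDA tensors, and the error would be compared against a tolerance appropriate for the quantization format in use.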
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).
CONTRIBUTING.md: ✅ / ❌ / N/A
Additional Information