Subquadratic Ops Torch Documentation

subquadratic_ops_torch provides high-performance CUDA kernels for the core operations of popular subquadratic alternatives to attention, such as Hyena and Mamba. These operations include long and short depthwise convolutions in multiple dimensions. The library ships PyTorch bindings to the optimized kernels, which can be used to accelerate models that rely on these operations.
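For readers unfamiliar with the term, a depthwise convolution filters each channel with its own kernel and never mixes channels. The pure-Python sketch below illustrates that idea only; it is not the library's API. The function name and the causal (left-padded) convention are assumptions made for the illustration.

```python
def depthwise_causal_conv1d(x, filters):
    """Reference depthwise 1D convolution (illustrative, not the library API).

    x:       list of C channels, each a list of L samples
    filters: list of C filters, each a list of K taps
    Assumes a causal convention: output[t] depends only on input[0..t].
    """
    if len(x) != len(filters):
        raise ValueError("need exactly one filter per channel")
    out = []
    for signal, taps in zip(x, filters):  # channels are processed independently
        channel_out = []
        for t in range(len(signal)):
            acc = 0.0
            for k, w in enumerate(taps):
                if t - k >= 0:  # samples before the start count as zero
                    acc += w * signal[t - k]
            channel_out.append(acc)
        out.append(channel_out)
    return out


x = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 0.0, 0.0]]
filters = [[1.0, 1.0], [0.5, 0.25, 0.125]]
y = depthwise_causal_conv1d(x, filters)
print(y)  # [[1.0, 3.0, 5.0, 7.0], [0.5, 0.25, 0.125, 0.0]]
```

The second channel is a unit impulse, so its output reproduces its own filter taps, which is a quick way to confirm that no cross-channel mixing takes place.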

Kernels are exposed primarily as Python functions backed by operators registered under torch.ops. These torch.library operators also serve as a lower-level interface.

Installation

Install with pip, choosing the package that matches your CUDA major version: pip install subquadratic-ops-torch-cu12 or pip install subquadratic-ops-torch-cu13.
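As a small sketch of how the choice between the two packages could be automated, the hypothetical helper below maps a CUDA version string to a package name. It assumes the cu12 and cu13 suffixes correspond to CUDA major versions 12 and 13, which the naming suggests but this page does not state explicitly.

```python
def pip_package_for_cuda(cuda_version: str) -> str:
    """Return the package name for a CUDA version string such as "12.4".

    Hypothetical helper; assumes the cuNN suffix tracks the CUDA major version.
    """
    major = int(cuda_version.split(".")[0])
    packages = {
        12: "subquadratic-ops-torch-cu12",
        13: "subquadratic-ops-torch-cu13",
    }
    if major not in packages:
        raise ValueError(f"no package listed for CUDA {cuda_version}")
    return packages[major]


print("pip install " + pip_package_for_cuda("12.4"))
```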

Usage

Import the library from Python:

import subquadratic_ops_torch as subq
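If the package is an optional dependency in your project, one way to guard the import is sketched below. It uses only the standard library's importlib; the load_subq helper is a hypothetical name, not part of the library.

```python
import importlib.util


def load_subq():
    """Return the subquadratic_ops_torch module, or None if it is not installed."""
    if importlib.util.find_spec("subquadratic_ops_torch") is None:
        return None
    import subquadratic_ops_torch as subq
    return subq


subq = load_subq()
print("subquadratic_ops_torch available" if subq is not None else "not installed")
```

Callers can then fall back to a plain PyTorch code path when load_subq() returns None.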

Requirements

CUDA-compatible NVIDIA GPU (Ampere+)
CUDA Toolkit 12.0 or higher
Python 3.11–3.14
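The requirements above can be checked programmatically. The sketch below is a hypothetical helper rather than anything shipped with the library, and it takes versions as plain tuples so that it runs without a GPU. It assumes "Ampere+" means CUDA compute capability 8.0 or higher. In a PyTorch environment the inputs could come from torch.cuda.get_device_capability() and torch.version.cuda, though the latter reports the CUDA version PyTorch was built against rather than the installed toolkit.

```python
import sys

MIN_COMPUTE_CAPABILITY = (8, 0)  # assumption: "Ampere+" means compute capability >= 8.0
MIN_CUDA = (12, 0)
MIN_PYTHON, MAX_PYTHON = (3, 11), (3, 14)


def check_requirements(python_version, cuda_version, compute_capability):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if not (MIN_PYTHON <= tuple(python_version) <= MAX_PYTHON):
        problems.append(f"Python {python_version} is outside 3.11 to 3.14")
    if tuple(cuda_version) < MIN_CUDA:
        problems.append(f"CUDA {cuda_version} is older than 12.0")
    if tuple(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} is pre-Ampere")
    return problems


# Example values for the CUDA version and GPU; replace with values from your system.
issues = check_requirements(sys.version_info[:2], (12, 4), (8, 0))
print(issues or "all listed requirements satisfied")
```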

Support and Feedback

Please contact the developers about any issues you encounter.

Alireza Moradzadeh, amoradzadeh at nvidia.com
Saee Paliwal, saeep at nvidia.com