butina
nvmolkit.clustering.butina(
    distance_matrix: AsyncGpuResult | Tensor,
    cutoff: float,
    neighborlist_max_size: int = 64,
    return_centroids: bool = False,
    stream: Stream | None = None,
)
Perform Butina clustering on a distance matrix.
The Butina algorithm is a deterministic clustering method that groups items based on distance thresholds. It iteratively:
1. Finds the item with the most neighbors within the cutoff distance
2. Forms a cluster with that item and all its neighbors
3. Removes clustered items from consideration
4. Repeats until all items are clustered
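The four steps above can be sketched on the CPU in plain Python. This is a minimal reference sketch for illustration only, not the GPU implementation behind nvmolkit.clustering.butina. The function name butina_reference is hypothetical, and the sketch makes two assumptions not stated here: neighbor counts are recomputed over the remaining items on every pass, and ties are broken by the lowest index.

```python
def butina_reference(dist, cutoff):
    """Greedy Butina-style clustering on a square distance matrix (list of lists).

    Returns a list of (centroid_index, sorted_member_indices) tuples in the
    order the clusters were formed.
    """
    n = len(dist)
    # j is a neighbor of i when dist[i][j] < cutoff (strict); each item
    # counts itself as a neighbor so singletons still form a cluster.
    neighbors = [
        {j for j in range(n) if j == i or dist[i][j] < cutoff}
        for i in range(n)
    ]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Step 1: item with the most still-unassigned neighbors.
        # Assumption: ties go to the lowest index (max keeps the first hit).
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbors[i] & unassigned))
        # Step 2: the cluster is that item plus its unassigned neighbors.
        members = neighbors[centroid] & unassigned
        clusters.append((centroid, sorted(members)))
        # Step 3: remove clustered items; step 4: loop until none remain.
        unassigned -= members
    return clusters


# Five points on a line at positions 0, 1, 2, 10, 11.
positions = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in positions] for a in positions]
result = butina_reference(dist, cutoff=1.5)
print(result)
```

Under these assumptions the clusters come out in non-increasing size order, which is consistent with the convention that cluster 0 is the largest.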
- Parameters:
distance_matrix – Square distance matrix of shape (N, N) where N is the number of items. Can be an AsyncGpuResult or torch.Tensor on GPU.
cutoff – Distance threshold for clustering. Items are neighbors if their distance is less than this cutoff.
neighborlist_max_size – Maximum size of the neighborlist used for small cluster optimization. Must be 8, 16, 24, 32, 64, or 128. Larger values allow parallel processing of larger clusters but use more shared memory.
return_centroids – Whether to return centroid indices for each cluster.
stream – CUDA stream to use. If None, uses the current stream.
- Returns:
AsyncGpuResult of shape (N,) with cluster IDs (cluster 0 is the largest) when return_centroids is False. When return_centroids is True, returns a tuple (clusters, centroids) where centroids is an AsyncGpuResult of shape (num_clusters,) containing the centroid index for each cluster ID.
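To illustrate how the two outputs relate, the sketch below groups item indices by cluster ID and pairs each group with its centroid. It works on plain Python lists; the values of cluster_ids and centroids are hypothetical stand-ins for results already copied to the host, and the helper name group_by_cluster is not part of nvmolkit. How to move an AsyncGpuResult to the host is not covered in this section and is left out here.

```python
from collections import defaultdict


def group_by_cluster(cluster_ids):
    """Map each cluster ID to the list of item indices assigned to it."""
    groups = defaultdict(list)
    for item, cid in enumerate(cluster_ids):
        groups[cid].append(item)
    return dict(groups)


# Hypothetical host-side copies of the two outputs:
# cluster_ids has shape (N,), centroids has shape (num_clusters,).
cluster_ids = [0, 0, 0, 1, 1]
centroids = [1, 3]

members = group_by_cluster(cluster_ids)
for cid, centroid in enumerate(centroids):
    print(f"cluster {cid}: centroid={centroid}, members={members[cid]}")
```

Because centroids is indexed by cluster ID, centroids[cid] is the representative item of cluster cid.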
Note
The distance matrix should be symmetric and have zeros on the diagonal.
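A quick host-side sanity check for these two conditions might look like the following. This is a sketch for small matrices held as nested lists; check_distance_matrix and its tol parameter are hypothetical names, and for a GPU tensor the same checks would normally be done with tensor operations instead.

```python
import math


def check_distance_matrix(dist, tol=1e-6):
    """Return True if dist is square, symmetric, and zero on the diagonal."""
    n = len(dist)
    if any(len(row) != n for row in dist):
        return False  # not square
    for i in range(n):
        # Diagonal entries must be (numerically) zero.
        if not math.isclose(dist[i][i], 0.0, abs_tol=tol):
            return False
        # Only the upper triangle needs comparing against the lower one.
        for j in range(i + 1, n):
            if not math.isclose(dist[i][j], dist[j][i], abs_tol=tol):
                return False
    return True


good = [[0.0, 0.3], [0.3, 0.0]]
asymmetric = [[0.0, 0.3], [0.4, 0.0]]
bad_diagonal = [[0.1, 0.3], [0.3, 0.0]]
print(check_distance_matrix(good),
      check_distance_matrix(asymmetric),
      check_distance_matrix(bad_diagonal))
```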