butina

nvmolkit.clustering.butina(
    distance_matrix: AsyncGpuResult | Tensor,
    cutoff: float,
    neighborlist_max_size: int = 64,
) -> AsyncGpuResult

Perform Butina clustering on a distance matrix.

The Butina algorithm is a deterministic clustering method that groups items based on distance thresholds. It iteratively:

  1. Finds the item with the most neighbors within the cutoff distance
  2. Forms a cluster with that item and all its neighbors
  3. Removes clustered items from consideration
  4. Repeats until all items are clustered
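The iterative procedure described above can be sketched on the CPU with NumPy. This is an illustrative reference only, not nvmolkit's GPU implementation. The function name `butina_reference` is hypothetical. Two choices are assumptions: neighbor counts are recomputed over the remaining items each round, and ties are broken by lowest index. The actual kernel may differ on either.

```python
import numpy as np


def butina_reference(dist, cutoff):
    """Greedy Butina-style clustering on a dense (N, N) distance matrix.

    Hypothetical CPU sketch; returns an integer label per item.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # neighbor[i, j] is True when j lies strictly within the cutoff of i.
    neighbor = dist < cutoff
    np.fill_diagonal(neighbor, True)  # every item is its own neighbor

    unassigned = np.ones(n, dtype=bool)
    labels = np.full(n, -1, dtype=int)
    next_id = 0

    while unassigned.any():
        # Step 1: count still-unassigned neighbors of each unassigned item.
        counts = (neighbor & unassigned).sum(axis=1)
        counts[~unassigned] = -1  # exclude already clustered items
        centroid = int(np.argmax(counts))  # assumption: lowest index wins ties

        # Step 2: the centroid and its remaining neighbors form a cluster.
        members = neighbor[centroid] & unassigned
        labels[members] = next_id

        # Step 3: remove the clustered items from consideration.
        unassigned &= ~members
        next_id += 1  # Step 4: repeat until nothing is left

    return labels


# Six points on a line: two tight groups and one outlier.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 9.0])
dist = np.abs(points[:, None] - points[None, :])
labels = butina_reference(dist, cutoff=0.15)
print(labels.tolist())  # -> [0, 0, 0, 1, 1, 2]
```

Because the remaining neighbor counts can only shrink from one round to the next, cluster sizes in this sketch are non-increasing, so cluster 0 is the largest.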

Parameters:
  • distance_matrix – Square distance matrix of shape (N, N) where N is the number of items. Can be an AsyncGpuResult or torch.Tensor on GPU.

  • cutoff – Distance threshold for clustering. Items are neighbors if their distance is less than this cutoff.

  • neighborlist_max_size – Maximum size of the neighborlist used for small cluster optimization. Must be 8, 16, 24, 32, 64, or 128. Larger values allow parallel processing of larger clusters but use more shared memory.

Returns:

AsyncGpuResult containing cluster assignments as integers. Each element i contains the cluster ID for item i. Cluster IDs are sequential integers starting from 0, with cluster 0 being the largest.
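Since the result is a flat array of per-item cluster IDs, it is often convenient to regroup it into lists of member indices. The sketch below assumes the labels have already been copied to the host as a plain sequence of integers; how that is done from an AsyncGpuResult is not shown here. The helper name `group_by_cluster` is hypothetical.

```python
from collections import defaultdict


def group_by_cluster(labels):
    """Turn per-item cluster IDs into a list of index lists, ordered by ID."""
    clusters = defaultdict(list)
    for item_index, cluster_id in enumerate(labels):
        clusters[int(cluster_id)].append(item_index)
    # IDs are sequential from 0, so sorting the keys restores cluster order.
    return [clusters[cluster_id] for cluster_id in sorted(clusters)]


labels = [0, 0, 0, 1, 1, 2]  # example assignment for six items
clusters = group_by_cluster(labels)
print(clusters)  # -> [[0, 1, 2], [3, 4], [5]]
```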

Note

The distance matrix should be symmetric and have zeros on the diagonal.
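A quick host-side sanity check of these properties might look like the following. It is a minimal sketch using NumPy on a CPU copy of the matrix; the helper name `check_distance_matrix` and the tolerance `atol` are assumptions for illustration and are not part of nvmolkit.

```python
import numpy as np


def check_distance_matrix(dist, atol=1e-6):
    """Raise ValueError unless dist is square, symmetric, and zero on the diagonal."""
    dist = np.asarray(dist, dtype=float)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError(f"expected a square (N, N) matrix, got shape {dist.shape}")
    if not np.allclose(dist, dist.T, atol=atol):
        raise ValueError("distance matrix is not symmetric")
    if not np.allclose(np.diag(dist), 0.0, atol=atol):
        raise ValueError("distance matrix has non-zero diagonal entries")
    return dist


valid = check_distance_matrix([[0.0, 0.3], [0.3, 0.0]])
```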