Hardware autotuning
The HardwareOptions and SubstructSearchConfig knobs that control
batching, threading, and GPU dispatch can have a substantial effect on
throughput. The best values depend on the GPU model, the size of the
input, and the chemistry of the molecules. The nvmolkit.autotune subpackage
runs a short Optuna study on a representative slice of the workload and
returns a config object that you can persist and reuse on the full data set.
Installing optuna
The autotuner depends on the optional optuna package. Optuna is not
required to use the rest of nvMolKit, and the conda-forge nvMolKit
package does not install it by default.
Install optuna explicitly:
Pip users:
pip install nvMolKit[autotune] # or pip install optuna
Conda-forge users:
conda install -c conda-forge optuna
Use nvmolkit.autotune.is_available() to detect whether optuna is
importable in the current environment without triggering the import:
from nvmolkit import autotune

if not autotune.is_available():
    raise SystemExit("optuna is not installed; see the autotune docs")
Calling any tune_* function without optuna raises an
ImportError whose message contains the same install hints as above.
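A common way to test for an optional dependency without importing it is importlib.util.find_spec. This function locates a module without executing it. The sketch below shows that general technique under the assumption that it resembles what is_available does. It is not nvMolKit's implementation, and module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` could be imported, without importing it.

    Hypothetical helper. For a top-level name, find_spec only locates the
    module; it does not run the module's code.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package in a dotted name, or for a
        # module whose __spec__ is None.
        return False
```

For example, module_available("optuna") returns False in an environment where optuna is not installed. In that case, importing optuna would fail.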
Quick start: tune ETKDG embedding
Each tune_* function returns a TuneResult whose best_config field holds
the tuned options object. Pass it directly to the matching nvMolKit API:
from rdkit import Chem
from rdkit.Chem.AllChem import ETKDGv3

from nvmolkit import autotune
from nvmolkit.embedMolecules import EmbedMolecules

# Hypothetical input file with one SMILES string per line
with open("smiles.txt") as fh:
    many_smiles = [line.strip() for line in fh if line.strip()]
mols = [Chem.AddHs(Chem.MolFromSmiles(s)) for s in many_smiles]

params = ETKDGv3()
params.useRandomCoords = True

result = autotune.tune_embed_molecules(
    mols,
    params,
    confsPerMolecule=10,
    n_trials=30,
    target_seconds_per_trial=10.0,
)

print(f"Tuned options: {result.best_config.to_dict()}")
print(
    f"Throughput: {result.best_throughput:.1f} conformers/s "
    f"on {result.calibration_size} molecules over {result.n_trials_run} trials"
)

# Apply the tuned options to the full workload
EmbedMolecules(mols, params, confsPerMolecule=10, hardwareOptions=result.best_config)
Saving and reloading a tuned config
Tuning is expensive, so the result is worth caching. The persistence helpers
work even without optuna installed. A config tuned on one machine can
therefore be loaded on another, such as a conda-forge install that lacks optuna:
autotune.save(result.best_config, "etkdg_options.json")
# Later, possibly on a machine without optuna:
options = autotune.load("etkdg_options.json")
EmbedMolecules(mols, params, confsPerMolecule=10, hardwareOptions=options)
The same save/load pair handles SubstructSearchConfig instances. The
config type is recorded in the JSON payload, and load uses it to rebuild
an object of the original type.
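A type tag in the payload plus a registry of known classes is enough to make one load function return the right type. The sketch below illustrates that pattern under stated assumptions. The dataclasses are hypothetical stand-ins with made-up fields. The payload layout ({"type": ..., "data": ...}) is an assumption, not nvMolKit's actual file format.

```python
import json
from dataclasses import asdict, dataclass, fields
from pathlib import Path
from typing import Dict, Type, Union


# Hypothetical stand-ins for nvMolKit's option classes; the field names are
# illustrative and are not the real HardwareOptions/SubstructSearchConfig fields.
@dataclass
class ExampleHardwareOptions:
    batch_size: int = 0
    threads: int = 0


@dataclass
class ExampleSearchConfig:
    batch_size: int = 0
    max_matches: int = 0


Config = Union[ExampleHardwareOptions, ExampleSearchConfig]

# Maps the type tag stored in each payload back to the class that reads it.
_REGISTRY: Dict[str, Type] = {
    cls.__name__: cls for cls in (ExampleHardwareOptions, ExampleSearchConfig)
}


def to_json(config: Config) -> str:
    """Serialize a config together with a type tag."""
    return json.dumps({"type": type(config).__name__, "data": asdict(config)}, indent=2)


def from_json(text: str) -> Config:
    """Rebuild a config, dispatching on the stored type tag."""
    payload = json.loads(text)
    cls = _REGISTRY[payload["type"]]  # KeyError for an unknown tag
    known = {f.name for f in fields(cls)}
    # Ignore keys this class does not define (e.g. written by a newer version).
    return cls(**{k: v for k, v in payload["data"].items() if k in known})


def save_config(config: Config, path: Union[str, Path]) -> None:
    Path(path).write_text(to_json(config))


def load_config(path: Union[str, Path]) -> Config:
    return from_json(Path(path).read_text())
```

Keeping to_json and from_json separate from the file helpers keeps the dispatch logic testable without touching the filesystem.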
Calibration set sizing and the time budget
By default the tuner runs each trial on a 10% subsample of the workload,
capped at 2000 molecules. Before the Optuna study starts, a single warm-up
trial runs with the default configuration. If that warm-up takes longer than
twice target_seconds_per_trial, the calibration slice is halved and the
warm-up is retried, up to three times. This keeps tuning cheap on very large inputs.
Override these knobs explicitly when needed:
result = autotune.tune_embed_molecules(
    mols,
    params,
    calibration_set=list(range(500)),  # explicit molecule indices
    target_seconds_per_trial=5.0,      # tighter budget per trial
    n_trials=20,
    verbose=True,
)
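The default sizing rule can be written out to see how large a slice a given workload ends up with. The sketch below follows the rule as described: 10% capped at 2000, halved while the warm-up exceeds twice the budget, at most three halvings. Rounding up and the one-molecule floor are assumptions. initial_calibration_size, calibrate, and the run_warmup callable are hypothetical names, not nvMolKit functions.

```python
import math
from typing import Callable


def initial_calibration_size(n_molecules: int, fraction: float = 0.1, cap: int = 2000) -> int:
    """Default slice size: a fraction of the workload, capped."""
    # Rounding up and the floor of one molecule are assumptions of this sketch.
    return max(1, min(cap, math.ceil(n_molecules * fraction)))


def calibrate(
    n_molecules: int,
    run_warmup: Callable[[int], float],
    target_seconds_per_trial: float,
    max_shrinks: int = 3,
) -> int:
    """Halve the slice while the warm-up exceeds twice the per-trial budget.

    `run_warmup(size)` is assumed to run the default configuration on `size`
    molecules and return the elapsed wall-clock seconds.
    """
    size = initial_calibration_size(n_molecules)
    elapsed = run_warmup(size)
    shrinks = 0
    while elapsed > 2.0 * target_seconds_per_trial and shrinks < max_shrinks and size > 1:
        size = max(1, size // 2)
        shrinks += 1
        elapsed = run_warmup(size)  # retry the warm-up on the smaller slice
    return size
```

Take 100,000 molecules at 0.01 s each with a 2.5 s budget. The slice starts at 2000 molecules (20 s), shrinks to 1000 (10 s), then to 500 (5 s). It stops at 500 because 5 s does not exceed the 5 s threshold.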
Tuning per GPU configuration
The tuner does not search over GPU subsets. Instead, fix gpuIds to the
hardware configuration you want to evaluate and call tune_* for each:
single_gpu = autotune.tune_embed_molecules(mols, params, gpuIds=[0])
multi_gpu = autotune.tune_embed_molecules(mols, params, gpuIds=[0, 1])
Compare best_throughput across runs to pick the deployment configuration.
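The comparison is easy to script. The sketch below uses a hypothetical Result stand-in that carries only the best_throughput field of TuneResult, so it runs without a GPU. Scaling efficiency is an extra metric added here, not something the tuner reports. It is the multi-GPU throughput divided by n_gpus times the single-GPU throughput, where 1.0 means perfectly linear scaling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Result:
    """Hypothetical stand-in holding only the TuneResult field used here."""
    best_throughput: float


def pick_configuration(results: Dict[str, Result]) -> str:
    """Return the label of the run with the highest best_throughput."""
    return max(results, key=lambda label: results[label].best_throughput)


def scaling_efficiency(single: Result, multi: Result, n_gpus: int) -> float:
    """Multi-GPU throughput relative to perfect linear scaling (1.0 = linear)."""
    return multi.best_throughput / (n_gpus * single.best_throughput)
```

With real results, pass the tuner's own objects instead, for example pick_configuration({"1 GPU": single_gpu, "2 GPUs": multi_gpu}). A low efficiency suggests the extra GPU adds little for this workload.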
Other supported APIs
The following wrappers are available; each returns the appropriate config type:
Tuning a batched forcefield
Because MMFFBatchedForcefield and UFFBatchedForcefield accept per-element
constraints and properties, tune_batched_forcefield takes a factory
callable rather than a prebuilt forcefield. The factory rebuilds a fresh
forcefield with the trial-specific HardwareOptions:
from nvmolkit import autotune
from nvmolkit.batchedForcefield import MMFFBatchedForcefield


def factory(mols, hw_options):
    # Build a fresh forcefield for this trial's HardwareOptions
    ff = MMFFBatchedForcefield(mols, hardwareOptions=hw_options)
    # Re-apply the per-molecule setup: a position constraint on atom 0
    for i in range(len(ff)):
        ff[i].add_position_constraint(0, 0.1, 50.0)
    return ff


result = autotune.tune_batched_forcefield(
    mols,
    factory,
    maxIters=100,
    n_trials=20,
)
options = result.best_config
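The point of the factory is that every trial gets its own object, built with that trial's options and with its constraints applied again. The sketch below illustrates that contract with a plain exhaustive search over a short candidate list. This search stands in for the Optuna sampler the real tuner uses. search_with_factory and the measure callable are hypothetical names, and nothing here calls nvMolKit.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple, TypeVar

Options = TypeVar("Options")


def search_with_factory(
    mols: Sequence,
    factory: Callable[[Sequence, Options], object],
    candidates: Iterable[Options],
    measure: Callable[[object], float],
) -> Tuple[Optional[Options], float]:
    """Exhaustive search: build a fresh object per candidate, keep the best.

    `measure(obj)` is assumed to run the workload on `obj` and return a
    throughput (higher is better).
    """
    best_options, best_score = None, float("-inf")
    for options in candidates:
        obj = factory(mols, options)  # fresh object; per-molecule setup redone
        score = measure(obj)
        if score > best_score:
            best_options, best_score = options, score
    return best_options, best_score
```

Because the factory is called once per candidate, no state left over from an earlier trial leaks into a later one. Reusing a single forcefield across trials could not guarantee that.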