Memory-efficient DoRA (Weight-Decomposed Low-Rank Adaptation) for PEFT, featuring factored column norms, fused Triton kernels with custom autograd, and automatic dispatch across eager PyTorch and Triton backends.

From the paper: Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels (arXiv TBD).

Modules

| Module | Description | Reference |
|---|---|---|
| `peft.tuners.lora.dora` | Layer classes (`DoraLinearLayer`, `DoraEmbeddingLayer`, conv variants), configuration functions, FSDP/ZeRO-3 integration, and eager composition helpers | Layer Classes, Configuration |
| `peft.tuners.lora.dora_fused` | Fused Triton kernels for DoRA compose, norm assembly, and forward + inner products; custom autograd function; PyTorch fallbacks; autotune configs | Fused Kernels |
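The eager composition that `peft.tuners.lora.dora` provides follows the standard DoRA decomposition: W' = m · (W + s·BA) / ||W + s·BA||, with one norm per output row (the "column norm" of the weight matrix). A minimal sketch, with an illustrative function name and signature that are not the library's API:

```python
import torch

def dora_compose_eager(weight, lora_A, lora_B, magnitude, scaling=1.0):
    # Illustrative sketch (not the library API):
    # W' = m * (W + s * B @ A) / ||W + s * B @ A||
    combined = weight + scaling * (lora_B @ lora_A)
    # One L2 norm per output row (the per-column norm of the linear map)
    col_norm = combined.norm(p=2, dim=1, keepdim=True)
    return magnitude.view(-1, 1) * combined / col_norm

torch.manual_seed(0)
W = torch.randn(4, 3)          # base weight, out_features x in_features
A = torch.randn(2, 3)          # LoRA A, rank x in_features
B = torch.randn(4, 2)          # LoRA B, out_features x rank
m = torch.rand(4) + 0.5        # learned magnitude vector
W_prime = dora_compose_eager(W, A, B, m)
```

By construction, each row of `W_prime` has norm equal to the corresponding entry of the magnitude vector `m`, which is what makes the magnitude and direction trainable independently.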

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PEFT_DORA_FUSED` | unset (auto) | Enable fused Triton kernels: `"1"`, `"0"`, or unset (auto: use if Triton is available) |
| `PEFT_DORA_FUSED_BACKWARD` | unset (on) | Fused backward pass: `"1"` (force on, bypassing the shape heuristic), `"0"` (off), or unset (on, with shape-based filtering for linear layers) |
| `PEFT_DORA_NORM_CHUNK_MB` | 256 | Column-norm chunking threshold in MB; matrices exceeding it are processed in chunks (minimum 16) |
| `PEFT_DORA_FWD_CHUNK_MB` | 256 | Forward-pass chunking threshold in MB (minimum 16) |
| `DORA_AUTOTUNE_COMPREHENSIVE` | `"0"` | Enable comprehensive Triton autotuning (`"1"` for a full search) |
| `PEFT_DORA_ALLOW_PARTIAL_GATHER` | `"0"` | Allow partial parameter gathering under ZeRO-3 (`"1"` to enable) |
| `PEFT_FORCE_GATHER` | unset (auto) | Force full parameter gathering: `"1"`, `"0"`, or unset (auto-detect) |
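The three-way flag semantics and the 16 MB chunking floor described above can be sketched as follows. The helper names here are illustrative, not the library's internals:

```python
import os

def fused_enabled():
    # Illustrative sketch of the PEFT_DORA_FUSED dispatch:
    # "1" forces fused kernels on, "0" forces eager, unset auto-detects Triton.
    flag = os.environ.get("PEFT_DORA_FUSED")
    if flag == "1":
        return True
    if flag == "0":
        return False
    try:
        import triton  # noqa: F401
        return True
    except ImportError:
        return False

def chunk_mb(var, default=256):
    # PEFT_DORA_NORM_CHUNK_MB / PEFT_DORA_FWD_CHUNK_MB are clamped to a
    # 16 MB floor, so pathological settings cannot produce tiny chunks.
    return max(16, int(os.environ.get(var, default)))
```

For example, `PEFT_DORA_NORM_CHUNK_MB=4` would still yield a 16 MB chunk threshold, and an unset `PEFT_DORA_FWD_CHUNK_MB` falls back to 256.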

Module Relationships

dora.py is the primary module: it defines all layer classes and configuration functions. It lazy-imports dora_fused.py on first use (guarded by _get_dora_fused()) so that Triton is not required at import time.
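The lazy-import guard can be sketched generically as a cached `importlib` lookup; the real `_get_dora_fused()` targets `peft.tuners.lora.dora_fused`, but the pattern is the same:

```python
import importlib

_module_cache = {}

def lazy_import(name):
    # Sketch of the pattern behind _get_dora_fused(): the fused module
    # (and its Triton dependency) is imported only on first call, so a
    # plain `import` of dora.py never pulls in Triton.
    if name not in _module_cache:
        _module_cache[name] = importlib.import_module(name)
    return _module_cache[name]
```

Repeated calls return the same cached module object, so the import cost is paid once.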

Citation

@article{zelenin2026dorafactors,
  title   = {Scaling DoRA: High-Rank Adaptation via Factored Norms and Fused Kernels},
  author  = {Zelenin, Alexandra and Zhuravlyova, Alexandra},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026}
}