FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Highlights

What FlowLet does

A single-stage generative framework that brings Flow Matching into a fixed, invertible wavelet domain: fast to sample, controllable, and anatomically faithful.

Wavelet flow matching

Generative modeling directly in the 3D Haar wavelet domain — multi-scale, with no learned latent compression.

Age conditioning

Complementary FiLM modulation and spatially adaptive cross-attention give explicit control over localized, age-related morphology.

Fast deterministic sampling

Deterministic ODE sampling generates high-quality volumes in as few as 10 steps.

Region- & task-aware eval

Evaluation across 95 cortical/subcortical regions and a downstream brain-age prediction study, beyond global metrics.

Open & reproducible

Methodology, code, and evaluation protocols are released free and open-source.

Accessible to train

A ~1B-parameter 3D U-Net that trains within 24 GB of VRAM.

Abstract

Generative modeling for 3D brain MRI is challenged by a trade-off between anatomical fidelity, sample diversity, and computational efficiency. Diffusion-based approaches achieve strong visual quality but typically require hundreds to thousands of sampling steps, while latent-space compression can introduce reconstruction artifacts and degrade fine-grained anatomy. We introduce FlowLet, a conditional generative framework that performs Flow Matching in an invertible 3D wavelet domain. This representation enables multi-scale generation without learned latent compression, while deterministic ODE sampling allows fast inference. Age conditioning is modeled through complementary feature-wise modulation and spatially adaptive cross-attention, enabling explicit control over age-related morphological variation. Across multi-site neuroimaging datasets, FlowLet achieves competitive and, in several settings, superior global fidelity compared to diffusion-based baselines using as few as 10 sampling steps. Region-based evaluation across 95 cortical and subcortical brain regions demonstrates improved local anatomical plausibility beyond what is captured by global similarity metrics alone. In a downstream brain age prediction study, models augmented with FlowLet-generated data consistently reduce prediction error relative to real-only training and other generative baselines. The proposed framework is released as open-source to support reproducibility.

The challenge

Why 3D brain MRI synthesis is hard

Brain-age prediction needs large, diverse, age-balanced cohorts — yet public 3D MRI datasets are demographically skewed, and existing generators force a hard trade-off.

The generative trilemma

Sample quality, diversity, and sampling efficiency pull against each other: improving one usually degrades another.

Fidelity vs. compression

Latent compression speeds things up but can blur fine-grained anatomy that age-related analysis depends on.

Age imbalance

Young and middle-aged adults dominate; pediatric and elderly groups are under-sampled, biasing downstream models.

Overlaid age histograms for the OpenBHB, OASIS-3 and ADNI datasets showing a strong young-adult peak from OpenBHB and older-adult coverage from OASIS-3 and ADNI. — **Age distribution across the integrated cohort.** OpenBHB concentrates younger adults, while OASIS-3 and ADNI enrich the 60–95 range, together spanning the lifespan but remaining imbalanced.

The method

Flow matching in an invertible wavelet domain

A single-stage pipeline: decompose the volume with an invertible 3D Haar transform, learn a velocity field that transports Gaussian noise to data in wavelet space, then reconstruct with the inverse transform.

Step 1 · Decompose

3D Haar DWT

Each volume is split into one low-frequency subband (coarse anatomy) and seven high-frequency subbands (fine detail). The transform is lossless and has no learned parameters.

In: real 3D MRI → Out: 8 subbands
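To make the decomposition concrete, here is a minimal NumPy sketch of a one-level orthonormal 3D Haar transform and its inverse. It assumes even-sized volumes and a single decomposition level. The function names (`haar_dwt3d`, `haar_idwt3d`) and the subband naming (`LLL` through `HHH`) are illustrative choices, not taken from the FlowLet code.

```python
import numpy as np

def _split(a, axis):
    """Haar analysis along one axis: pairwise sums (low) and differences (high)."""
    a = np.moveaxis(a, axis, 0)
    lo = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    hi = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    return np.moveaxis(lo, 0, axis), np.moveaxis(hi, 0, axis)

def _merge(lo, hi, axis):
    """Haar synthesis along one axis: exact inverse of _split."""
    lo = np.moveaxis(lo, axis, 0)
    hi = np.moveaxis(hi, axis, 0)
    out = np.empty((2 * lo.shape[0],) + lo.shape[1:], dtype=lo.dtype)
    out[0::2] = (lo + hi) / np.sqrt(2.0)
    out[1::2] = (lo - hi) / np.sqrt(2.0)
    return np.moveaxis(out, 0, axis)

def haar_dwt3d(vol):
    """Split a (D, H, W) volume into 8 half-resolution subbands keyed 'LLL'..'HHH'."""
    bands = {"": vol}
    for axis in range(3):
        nxt = {}
        for name, a in bands.items():
            lo, hi = _split(a, axis)
            nxt[name + "L"] = lo
            nxt[name + "H"] = hi
        bands = nxt
    return bands

def haar_idwt3d(bands):
    """Rebuild the full-resolution volume from the 8 subbands."""
    for axis in (2, 1, 0):
        prefixes = sorted({name[:-1] for name in bands})
        bands = {p: _merge(bands[p + "L"], bands[p + "H"], axis) for p in prefixes}
    return bands[""]
```

In this sketch `LLL` plays the role of the low-frequency subband and the other seven keys hold the high-frequency detail. Because the transform is orthonormal, the round trip reproduces the input up to floating-point error.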

Step 2 · Transport

Velocity U-Net v_θ

A conditional 3D U-Net predicts the flow-matching velocity field. Age is injected via FiLM and spatial cross-attention, and samples are drawn with a deterministic ODE solver.

In: noise + t + age → Out: velocity
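The training signal for the transport step can be sketched as follows, assuming the linear noise-to-data path commonly used for rectified flow matching, with t = 0 at noise and t = 1 at data. The CFM, VP and trigonometric variants use other paths that are not shown. `velocity_fn` is a hypothetical stand-in for the velocity U-Net v_θ, and all function names are illustrative.

```python
import numpy as np

def rectified_flow_pair(x1, rng):
    """Build one training example on the straight path from noise x0 to data x1.

    x1 has shape (B, ...), e.g. a batch of stacked wavelet subbands.
    """
    x0 = rng.standard_normal(x1.shape)                # Gaussian noise endpoint
    t_shape = (x1.shape[0],) + (1,) * (x1.ndim - 1)   # one time per sample
    t = rng.uniform(0.0, 1.0, size=t_shape)
    x_t = (1.0 - t) * x0 + t * x1                     # point on the straight path
    target_v = x1 - x0                                # constant velocity of that path
    return x_t, t, target_v

def flow_matching_loss(velocity_fn, x1, age, rng):
    """Mean squared error between predicted and target velocity."""
    x_t, t, target_v = rectified_flow_pair(x1, rng)
    pred_v = velocity_fn(x_t, t, age)
    return float(np.mean((pred_v - target_v) ** 2))
```

Under this assumed path, following the target velocity from `x_t` for the remaining time `1 - t` lands exactly on the data sample, which is what makes few-step integration plausible.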

Step 3 · Reconstruct

3D Haar IDWT

The generated wavelet coefficients are mapped back to a full-resolution volume by the inverse transform, with no learned decoder and no compression artifacts.

In: 8 subbands → Out: synthetic MRI
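At sampling time, the wavelet coefficients that feed the inverse transform come from integrating the learned velocity field. The sketch below assumes a fixed-step Euler solver running from t = 0 (noise) to t = 1 (data); the source states only that the solver is deterministic, so the solver choice, the `euler_sample` name, and the `velocity_fn` stand-in for v_θ are assumptions.

```python
import numpy as np

def euler_sample(velocity_fn, x0, age, num_steps=10):
    """Integrate dx/dt = v(x, t, age) from t=0 to t=1 with fixed-step Euler.

    x0 is the initial Gaussian noise in wavelet space. The result depends only
    on x0 and age, so a fixed seed gives a reproducible sample.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, age)
    return x
```

The returned array would then be unstacked into the eight subbands and passed through the inverse Haar transform. Holding `x0` fixed while changing `age` is the setup behind a fixed-seed age sweep.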

3D Haar DWT / IDWT · Rectified Flow Matching · CFM · VP · Trigonometric · FiLM + cross-attention · deterministic ODE · few-step sampling

Axial, coronal and sagittal views of a single synthetic brain generated by FlowLet as the age condition is swept from 6 to 95 years with a fixed noise seed. **Age conditioning, one seed.** Holding the initial noise fixed and varying only the age condition from 6 to 95 years produces coherent, age-dependent morphological change across the axial, coronal and sagittal views. This behavior results from combining FiLM with spatial cross-attention.
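The FiLM half of the age conditioning can be illustrated with a small NumPy sketch: a per-channel scale and shift computed from an age embedding and applied to a 3D feature map. The sinusoidal embedding, the `max_age` normalization, and the projection matrices `w_gamma` and `w_beta` are hypothetical choices made for illustration; the spatial cross-attention branch is not shown.

```python
import numpy as np

def age_embedding(age_years, dim=8, max_age=100.0):
    """Sinusoidal embedding of age (assumed form). Returns shape (B, dim), dim even."""
    a = np.asarray(age_years, dtype=np.float64)[:, None] / max_age
    freqs = 2.0 ** np.arange(dim // 2)
    angles = np.pi * a * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def film(features, age_years, w_gamma, w_beta):
    """Feature-wise linear modulation of a (B, C, D, H, W) feature map.

    w_gamma and w_beta have shape (dim, C) and map the age embedding to a
    per-channel scale and shift.
    """
    emb = age_embedding(age_years, dim=w_gamma.shape[0])
    gamma = 1.0 + emb @ w_gamma          # (B, C), identity scale when w_gamma is zero
    beta = emb @ w_beta                  # (B, C)
    expand = (slice(None), slice(None), None, None, None)
    return gamma[expand] * features + beta[expand]
```

Because the same scale and shift apply to every voxel of a channel, FiLM alone acts globally; that is one reading of why the page pairs it with a spatially adaptive mechanism for localized morphology.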

Results

Fast, controllable, and anatomically faithful

FlowLet is competitive on global metrics and stronger where it matters anatomically: region-level fidelity and downstream clinical utility.

Line plot of FID versus number of sampling steps for the RFM, CFM, VP and Trigonometric flow variants. — FID as a function of sampling steps for the four flow formulations.

Fast Sampling: deterministic ODE · no latent compression

ROI Dice ↑: 0.420 (mean over 95 brain regions)

FID ↓: 0.298 (competitive global fidelity)

Brain-age MAE ↓: 4.01 years (underrepresented ages)

Axial, coronal and sagittal slices comparing a real scan, FlowLet (Ours), and seven other 3D brain MRI synthesis methods. — **Qualitative comparison.** Real reference vs. FlowLet (*Ours*) and other methods, shown in three standard planes.

FastSurfer ROI parcellations (colored region maps) for a real sample, FlowLet variants and other methods across three planes. **Region-based anatomical fidelity.** Automated parcellation into 95 ROIs for the real reference, FlowLet variants, and other methods, exposing region-level structure that global metrics alone can miss.

Region-level fidelity

FlowLet reaches a mean ROI Dice of 0.420 across all 95 cortical and subcortical structures, preserving anatomy that global metrics can overlook.
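For readers unfamiliar with the metric, a mean ROI Dice over integer label maps can be computed as in the sketch below. It assumes two parcellations on the same voxel grid; how generated and reference parcellations are paired or aligned is not described on this page, and `mean_roi_dice` is an illustrative name.

```python
import numpy as np

def mean_roi_dice(labels_a, labels_b, region_ids):
    """Average per-region Dice overlap between two integer label maps.

    Regions absent from both maps are skipped rather than counted as 0 or 1.
    """
    scores = []
    for region in region_ids:
        a = labels_a == region
        b = labels_b == region
        denom = a.sum() + b.sum()
        if denom == 0:
            continue
        scores.append(2.0 * np.logical_and(a, b).sum() / denom)
    return float(np.mean(scores)) if scores else float("nan")
```

Averaging per region rather than per voxel gives small structures the same weight as large ones, which is why such a score can diverge from global similarity metrics.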

Competitive global fidelity

FlowLet attains strong FID, MMD and MS-SSIM scores while sampling deterministically in as few as 10 steps.

Better clinical utility

Augmenting training with FlowLet samples lowers brain-age prediction error, with a 4.01-year MAE on underrepresented ages.

Perspective

Reading the metrics, not just reporting them

In volumetric brain MRI a single global score can quietly mislead. Most voxels are background or non-informative, so distribution-level metrics such as FID and MMD can look favorable even when clinically relevant anatomy is wrong, and a generator can be rewarded simply for drifting toward an "average" brain.

This is why we treat intra-set MS-SSIM not as a quality target but as a relative diversity signal: interpreted under consistent conditions and read alongside global fidelity and region-level anatomy, it exposes the mode collapse that aggregate numbers hide. Only when these measures are read together do they give an honest, anatomy- and task-aware picture of generative quality, which is exactly why FlowLet is evaluated across 95 regions and a downstream clinical task, not a single headline number.
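The "intra-set" part of this reading can be illustrated with a short sketch that averages a similarity score over all pairs within one set of generated volumes. To stay short it swaps in a single-window, whole-volume SSIM as a stand-in for MS-SSIM, so its values are not comparable to MS-SSIM numbers reported for FlowLet; the function names are illustrative.

```python
import itertools
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole volume (simplified stand-in for MS-SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

def intra_set_similarity(volumes, similarity=global_ssim):
    """Mean pairwise similarity within one set (needs at least two volumes).

    Values near 1 mean the samples are nearly identical, a sign of collapse.
    """
    pairs = itertools.combinations(range(len(volumes)), 2)
    scores = [similarity(volumes[i], volumes[j]) for i, j in pairs]
    return float(np.mean(scores))
```

Used this way the number is only meaningful relative to a reference, for example the same statistic computed on real scans under the same preprocessing.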

The paper details how we frame and interpret each metric, and where each one breaks down.

Code & data

Open-source & reproducible

The complete PyTorch implementation, training/generation scripts, and evaluation protocols are released openly.

📦 Official implementation

The reference release of FlowLet: training, generation, and the dataset catalog.

github.com/sisinflab/FlowLet

⎇ Development & enhancements

Ongoing development, experiments, and future enhancements.

github.com/Danesed/FlowLet

🤗 Pretrained models

Rectified Flow Matching checkpoints (two resolutions, in a base and a large U-Net configuration). The checkpoints are currently in training and will be added to the Hugging Face repository as they become available.

huggingface.co/danesed/FlowLet

Datasets

Built on OpenBHB, ADNI and OASIS-3: 5,794 cognitively normal T1w scans across 12+ sites.

Evaluation

Global metrics (FID, MMD, MS-SSIM), region-based ROI analysis over 95 structures, and a downstream brain-age prediction study.

License

Released under the MIT License for research and reuse.

Citation

BibTeX

If you find FlowLet useful, please cite the paper.

FlowLet — Medical Image Analysis (version of record)

@article{DANESE2026104161,
  title   = {FlowLet: Conditional 3D brain MRI synthesis using wavelet flow matching},
  author  = {Danese, Danilo and Lombardi, Angela and Attimonelli, Matteo and Fasano, Giuseppe and Di Noia, Tommaso},
  journal = {Medical Image Analysis},
  pages   = {104161},
  year    = {2026},
  issn    = {1361-8415},
  doi     = {10.1016/j.media.2026.104161},
  url     = {https://www.sciencedirect.com/science/article/pii/S1361841526002306},
  publisher = {Elsevier}
}

% arXiv preprint (earlier version, may differ from the published paper)
@misc{danese2026flowletconditional3dbrain,
      title={FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching},
      author={Danilo Danese and Angela Lombardi and Matteo Attimonelli and Giuseppe Fasano and Tommaso Di Noia},
      year={2026},
      eprint={2601.05212},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.05212},
}