Accepted at Medical Image Analysis (Elsevier)

FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching

Danilo Danese1,*, Angela Lombardi1,*, Matteo Attimonelli1,2, Giuseppe Fasano1, Tommaso Di Noia1

1 Politecnico di Bari, Italy  ·  2 Sapienza University of Rome, Italy

* Corresponding authors

The Medical Image Analysis article is the peer-reviewed version of record. The arXiv preprint is an earlier version and may differ from the published paper.

FlowLet architecture: a real 3D MRI is transformed by a 3D Haar DWT into eight wavelet subbands; a velocity-predicting U-Net with ResBlock+FiLM and spatial conditioning is trained with an MSE flow-matching loss; at inference an ODE integration loop followed by the inverse DWT produces a synthetic sample.
FlowLet pipeline. A 3D Haar transform maps the volume to wavelet space, a conditional velocity U-Net is trained with a flow-matching objective, and the inverse transform reconstructs the synthetic MRI. Age enters via FiLM and spatial cross-attention.

Highlights

What FlowLet does

A single-stage generative framework that brings Flow Matching into a fixed, invertible wavelet domain: fast to sample, controllable, and anatomically faithful.

1

Wavelet flow matching

Generative modeling directly in the 3D Haar wavelet domain: multi-scale, with no learned latent compression.

2

Age conditioning

Complementary FiLM modulation and spatially adaptive cross-attention give explicit control over localized, age-related morphology.

3

Fast deterministic sampling

Deterministic ODE sampling generates high-quality volumes in as few as 10 steps.

4

Region- & task-aware eval

Evaluation across 95 cortical/subcortical regions and a downstream brain-age prediction study, beyond global metrics.

5

Open & reproducible

Methodology, code, and evaluation protocols are released free and open-source.

+

Accessible to train

A ~1B-parameter 3D U-Net that trains within 24 GB of VRAM.

Abstract

Generative modeling for 3D brain MRI is challenged by a trade-off between anatomical fidelity, sample diversity, and computational efficiency. Diffusion-based approaches achieve strong visual quality but typically require hundreds to thousands of sampling steps, while latent-space compression can introduce reconstruction artifacts and degrade fine-grained anatomy. We introduce FlowLet, a conditional generative framework that performs Flow Matching in an invertible 3D wavelet domain. This representation enables multi-scale generation without learned latent compression, while deterministic ODE sampling allows fast inference. Age conditioning is modeled through complementary feature-wise modulation and spatially adaptive cross-attention, enabling explicit control over age-related morphological variation. Across multi-site neuroimaging datasets, FlowLet achieves competitive and, in several settings, superior global fidelity compared to diffusion-based baselines using as few as 10 sampling steps. Region-based evaluation across 95 cortical and subcortical brain regions demonstrates improved local anatomical plausibility beyond what is captured by global similarity metrics alone. In a downstream brain age prediction study, models augmented with FlowLet-generated data consistently reduce prediction error relative to real-only training and other generative baselines. The proposed framework is released as open-source to support reproducibility.

The challenge

Why 3D brain MRI synthesis is hard

Brain-age prediction needs large, diverse, age-balanced cohorts, yet public 3D MRI datasets are demographically skewed and existing generators force a hard trade-off.

The generative trilemma

Sample quality, diversity, and sampling efficiency pull against each other: improving one usually degrades another.

Fidelity vs. compression

Latent compression speeds things up but can blur fine-grained anatomy that age-related analysis depends on.

Age imbalance

Young and middle-aged adults dominate; pediatric and elderly groups are under-sampled, biasing downstream models.

Overlaid age histograms for the OpenBHB, OASIS-3 and ADNI datasets showing a strong young-adult peak from OpenBHB and older-adult coverage from OASIS-3 and ADNI.
Age distribution across the integrated cohort. OpenBHB concentrates younger adults, while OASIS-3 and ADNI enrich the 60–95 range; together they span the lifespan but remain imbalanced.

The method

Flow matching in an invertible wavelet domain

A single-stage pipeline: decompose the volume with an invertible 3D Haar transform, learn a velocity field that transports Gaussian noise to data in wavelet space, then reconstruct with the inverse transform.

Step 1 · Decompose

3D Haar DWT

Each volume is split into one low-frequency subband (coarse anatomy) and seven high-frequency subbands (fine detail). The transform is lossless and requires no learning.

in: real 3D MRI  →  out: 8 subbands
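The decomposition step can be illustrated with a minimal NumPy sketch of a one-level orthonormal 3D Haar transform and its inverse. This is not the released FlowLet code: the function names `haar_dwt3d` and `haar_idwt3d`, the single decomposition level, and the subband ordering are assumptions made for illustration.

```python
import numpy as np

# 2x2 orthonormal Haar matrix: row 0 averages a pair, row 1 differences it.
HAAR = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def haar_dwt3d(volume):
    """One-level 3D Haar DWT (illustrative sketch).

    volume: array of shape (D, H, W) with even D, H and W.
    Returns coefficients of shape (8, D/2, H/2, W/2). Index 0 is the
    low-frequency subband; indices 1..7 are the high-frequency subbands.
    """
    d, h, w = volume.shape
    # View the volume as a grid of 2x2x2 blocks.
    blocks = np.asarray(volume, dtype=np.float64).reshape(
        d // 2, 2, h // 2, 2, w // 2, 2
    )
    # Apply the Haar matrix along each of the three within-block axes.
    bands = np.einsum("ai,bj,ck,xiyjzk->abcxyz", HAAR, HAAR, HAAR, blocks)
    return bands.reshape(8, d // 2, h // 2, w // 2)


def haar_idwt3d(coeffs):
    """Inverse of haar_dwt3d; exact because HAAR is orthogonal."""
    _, d, h, w = coeffs.shape
    bands = coeffs.reshape(2, 2, 2, d, h, w)
    # Multiplying by the transposed Haar matrix undoes the forward step.
    blocks = np.einsum("ai,bj,ck,abcxyz->xiyjzk", HAAR, HAAR, HAAR, bands)
    return blocks.reshape(2 * d, 2 * h, 2 * w)


# Usage: a toy volume round-trips through the eight subbands.
toy_volume = np.random.default_rng(0).standard_normal((4, 6, 8))
toy_coeffs = haar_dwt3d(toy_volume)
toy_recon = haar_idwt3d(toy_coeffs)
```

Because the transform is orthogonal, the round trip reproduces the input up to floating-point error and the total energy of the coefficients equals that of the volume, which is what "lossless" means in this context.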
Step 2 · Transport

Velocity U-Net v_θ

A conditional 3D U-Net predicts the flow-matching velocity field. Age is injected via FiLM and spatial cross-attention. Samples are drawn with a deterministic ODE solver.

in: noise + t + age  →  out: velocity
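The transport step can be sketched for the linear (rectified) path, one of several flow formulations the page lists. The sketch below assumes the common convention x_t = (1 - t) x_0 + t x_1 with x_0 Gaussian noise and x_1 data, so the regression target is x_1 - x_0, and it uses a plain Euler solver. The function names are hypothetical, the velocity network is replaced by an arbitrary callable, and age conditioning is omitted.

```python
import numpy as np


def rectified_flow_pair(x1, rng):
    """Build one training example on the straight noise-to-data path.

    x1: a data sample (for FlowLet this would be stacked wavelet subbands).
    Returns (x_t, t, target_velocity).
    """
    x0 = rng.standard_normal(x1.shape)   # Gaussian noise endpoint
    t = rng.uniform(0.0, 1.0)            # random time in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1        # point on the straight path
    target = x1 - x0                     # constant velocity along that path
    return x_t, t, target


def flow_matching_loss(pred_velocity, target_velocity):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred_velocity - target_velocity) ** 2))


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 with Euler steps."""
    x = np.array(x0, dtype=np.float64)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t)
    return x
```

In training, `velocity_fn` would be the conditional U-Net evaluated at `(x_t, t, age)` and the loss would be minimized over many sampled pairs. At inference, the same network is integrated from a fresh noise draw, and the number of Euler steps sets the cost of sampling.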
Step 3 · Reconstruct

3D Haar IDWT

The generated wavelet coefficients are mapped back to a full-resolution volume by the inverse transform, with no learned decoder and no compression artifacts.

in: 8 subbands  →  out: synthetic MRI
3D Haar DWT / IDWT  |  Rectified Flow Matching  |  CFM · VP · Trigonometric  |  FiLM + cross-attention  |  deterministic ODE  |  few-step sampling
Axial, coronal and sagittal views of a single synthetic brain generated by FlowLet as the age condition is swept from 6 to 95 years with a fixed noise seed.
Age conditioning, one seed. Holding the initial noise fixed and varying only the age condition from 6 to 95 years produces coherent, age-dependent morphological change (axial / coronal / sagittal). This is the effect of combining FiLM with spatial cross-attention.
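FiLM conditioning is usually written as a per-channel scale and shift computed from the condition. The NumPy sketch below shows that mechanism only. The sinusoidal `age_embedding`, the `1 + gamma` parameterization, and the weight shapes are assumptions for illustration rather than details taken from the paper, and the spatial cross-attention branch is not shown.

```python
import numpy as np


def age_embedding(age_years, dim=8, max_age=100.0):
    """Hypothetical sinusoidal embedding of a scalar age (assumption)."""
    a = age_years / max_age
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(np.pi * freqs * a), np.cos(np.pi * freqs * a)])


def film(features, cond, w_gamma, w_beta):
    """Feature-wise linear modulation.

    features: (C, D, H, W) feature map inside a residual block.
    cond:     (E,) condition embedding.
    w_gamma, w_beta: (C, E) linear maps producing per-channel scale and shift.
    """
    gamma = 1.0 + w_gamma @ cond   # scale, centred at 1 so zero weights are a no-op
    beta = w_beta @ cond           # shift
    # Broadcast the per-channel values over the three spatial axes.
    return gamma[:, None, None, None] * features + beta[:, None, None, None]
```

Since the scale and shift are constant over space, FiLM alone changes every location of a channel in the same way. That is consistent with the page's description of cross-attention as the spatially adaptive complement.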

Results

Fast, controllable, and anatomically faithful

FlowLet is competitive on global metrics and stronger where it matters anatomically: region-level fidelity and downstream clinical utility.

Line plot of FID versus number of sampling steps for the RFM, CFM, VP and Trigonometric flow variants.
FID as a function of sampling steps for the four flow formulations.
Fast sampling: deterministic ODE · no latent compression
ROI Dice ↑: 0.420 (mean over 95 brain regions)
FID ↓: 0.298 (competitive global fidelity)
Brain-age MAE ↓: 4.01 years (underrepresented ages)
Axial, coronal and sagittal slices comparing a real scan, FlowLet (Ours), and seven other 3D brain MRI synthesis methods.
Qualitative comparison. Real reference vs. FlowLet (Ours) and other methods, shown in three standard planes.
FastSurfer ROI parcellations (colored region maps) for a real sample, FlowLet variants and other methods across three planes.
Region-based anatomical fidelity. Automated parcellation into 95 ROIs for the real reference, FlowLet variants, and other methods, exposing region-level structure that global metrics alone can miss.

Region-level fidelity

FlowLet reaches a mean ROI Dice of 0.420 across all 95 cortical and subcortical structures, preserving anatomy that global metrics can overlook.
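For reference, a per-region Dice score between two label maps can be computed as in the sketch below. It assumes two integer parcellations on the same voxel grid and a list of region labels. How the paper pairs synthetic and real parcellations is not described on this page, so the pairing is left to the caller, and the function names are hypothetical.

```python
import numpy as np


def roi_dice(seg_a, seg_b, labels):
    """Dice overlap per region label between two integer label maps."""
    scores = {}
    for lab in labels:
        a = seg_a == lab
        b = seg_b == lab
        denom = a.sum() + b.sum()
        if denom == 0:
            continue  # region absent from both maps: Dice is undefined, skip it
        scores[lab] = 2.0 * np.logical_and(a, b).sum() / denom
    return scores


def mean_roi_dice(seg_a, seg_b, labels):
    """Average Dice over the regions present in at least one map."""
    scores = roi_dice(seg_a, seg_b, labels)
    return float(np.mean(list(scores.values())))
```

Averaging over regions rather than voxels gives small structures the same weight as large ones, which is why a region-level mean can diverge from global similarity scores.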

Competitive global fidelity

Strong FID, MMD and MS-SSIM scores while sampling deterministically in only a few steps.
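Of the global metrics named here, MMD is the simplest to write down. The sketch below computes the biased squared-MMD estimate with a Gaussian (RBF) kernel between two sets of feature vectors. The kernel choice, the bandwidth `sigma`, and the feature space are assumptions, since the page does not state how MMD is configured in the paper.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets x and y."""
    k_xx = rbf_kernel(x, x, sigma).mean()
    k_yy = rbf_kernel(y, y, sigma).mean()
    k_xy = rbf_kernel(x, y, sigma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)
```

The estimate is zero when the two sets coincide and grows as the distributions move apart. Being a distribution-level quantity, it is insensitive to which individual samples contain anatomical errors.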

Better clinical utility

Augmenting training with FlowLet samples lowers brain-age prediction error, with a 4.01-year MAE on underrepresented ages.

Perspective

Reading the metrics, not just reporting them

In volumetric brain MRI a single global score can quietly mislead. Most voxels are background or non-informative, so distribution-level metrics such as FID and MMD can look favorable even when clinically relevant anatomy is wrong, and a generator can be rewarded simply for drifting toward an "average" brain.

This is why we treat intra-set MS-SSIM not as a quality target but as a relative diversity signal: interpreted under consistent conditions and read alongside global fidelity and region-level anatomy, it exposes the mode collapse that aggregate numbers hide. Only when these measures are read together do they give an honest, anatomy- and task-aware picture of generative quality, which is exactly why FlowLet is evaluated across 95 regions and a downstream clinical task, not a single headline number.
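The idea of an intra-set similarity score as a diversity signal can be illustrated with a simplified stand-in. The sketch below averages a single-scale SSIM, computed from whole-volume statistics, over all pairs in a generated set. It is not MS-SSIM, which uses local windows and multiple scales, and the constants follow the usual SSIM defaults for intensities in [0, 1].

```python
from itertools import combinations

import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-scale SSIM from whole-volume statistics (stand-in for MS-SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2.0 * mx * my + c1) * (2.0 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)


def intra_set_similarity(volumes):
    """Mean pairwise similarity within one set; higher means less diverse."""
    pairs = combinations(range(len(volumes)), 2)
    return float(np.mean([global_ssim(volumes[i], volumes[j]) for i, j in pairs]))
```

A set of identical volumes scores 1.0, the signature of mode collapse, while a varied set scores lower. The number is only meaningful relative to other sets measured under the same conditions, for example the real cohort.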

The paper details how we frame and interpret each metric, and where each one breaks down.

Code & data

Open-source & reproducible

The complete PyTorch implementation, training/generation scripts, and evaluation protocols are released openly.

Official implementation

The reference release of FlowLet: training, generation, and the dataset catalog.

github.com/sisinflab/FlowLet

Development & enhancements

Ongoing development, experiments, and future enhancements.

github.com/Danesed/FlowLet

Datasets

Built on OpenBHB, ADNI and OASIS-3: 5,794 cognitively normal T1w scans across 12+ sites.

Evaluation

Global metrics (FID, MMD, MS-SSIM), region-based ROI analysis over 95 structures, and a downstream brain-age prediction study.

License

Released under the MIT License for research and reuse.

Citation

BibTeX

If you find FlowLet useful, please cite the paper.

FlowLet (arXiv preprint)
@article{danese2026flowlet,
  title={FlowLet: Conditional 3D Brain MRI Synthesis using Wavelet Flow Matching},
  author={Danese, Danilo and Lombardi, Angela and Attimonelli, Matteo and Fasano, Giuseppe and Di Noia, Tommaso},
  journal={arXiv preprint arXiv:2601.05212},
  year={2026}
}