GSoC Blog

Normalizing Flows for Fast Detector Simulation

Mentors: Prof. Sergei V. Gleyzer, Prof. Harrison B. Prosper & Prof. Michelle Kuchera.
Repository: Normalizing Flows for Fast Detector Simulation
Project Group: DeepFalcon
Organization: ML4SCI

The focus of my project was using Normalizing Flows to simulate detector response of particle jet deposits.

Normalizing Flows [1] are a method for constructing complex distributions by transforming a simpler probability distribution through a series of invertible mappings. Normalizing Flows have been used extensively in generative modelling, as they offer a way of mapping a simple parametric distribution to a more complicated distribution that better approximates real-world data.
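As a minimal illustration of this idea (a toy sketch using torch.distributions, not the project's models), a single invertible affine map already shows how the base log-density and the Jacobian term combine:

```python
import torch
import torch.distributions as D

# Base distribution: a standard Gaussian.
base = D.Normal(torch.tensor(0.0), torch.tensor(1.0))

# One invertible mapping f(x) = 1 + 2x; a flow stacks many such maps.
flow = D.TransformedDistribution(
    base, [D.transforms.AffineTransform(loc=1.0, scale=2.0)]
)

# log p(y) = log p_base(f^{-1}(y)) - log|det df/dx|, handled internally:
# here f^{-1}(1.0) = 0.0 and the Jacobian term is log(2).
log_p = flow.log_prob(torch.tensor(1.0))
```

Real flows replace the affine map with learned invertible networks, but the log-probability always decomposes into these two terms.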

This project aimed to use the generative capabilities of recent Normalizing Flow models like Masked Autoregressive Flows [2] and Block Neural Autoregressive Flows [3] to model complex jet distributions encountered when simulating detector deposits. The project thus extended the previous work by Hariri et al. [4], who used Graph Autoencoders to simulate Falcon deposits.

Work Summary

Two main algorithms were implemented as part of this project: the first half of the coding period focused on implementing Block Neural Autoregressive Flows (BNAF), while Masked Autoregressive Flows (MAF) were implemented during the second half. While both are autoregressive flow models built from deep networks, there are important practical differences in how they can be used for detector simulation.

While in principle the inverse of a BNAF model exists (see Section 3.4 of [3] for details), it is not available in closed form. Practically, this means that while it is possible to map a complex data distribution (of jet instances, in our case) to a simple distribution (say, a Gaussian), it is not possible to map newly sampled points back onto the complex distribution. This prevents us from generating new jet instances and limits BNAF to applications like density estimation.

For MAF, however, we have access to the inverse flow: it is possible to sample points from the simple distribution and invert them into points from the complex distribution, thereby obtaining new samples that belong to the input distribution. It should also be noted that MAF is significantly harder to train and optimise, as can be seen from the respective training plots in the Results section.
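Schematically, the two directions look like this (a toy sketch: ExpTransform stands in for a flow whose inverse is available in closed form; it is not MAF itself):

```python
import torch
import torch.distributions as D

base = D.Normal(torch.zeros(2), torch.ones(2))
t = D.transforms.ExpTransform()  # invertible in closed form: inverse is log

# Density estimation: map data into the base space via the inverse.
x = torch.tensor([1.0, 2.0])
z = t.inv(x)  # equals log(x)

# Generation: sample the base distribution, push through the forward map.
z_new = base.sample()
x_new = t(z_new)  # a new sample from the modelled (here log-normal) density
```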

	>>> tree -I 'model*|temp__*|wandb*|run_'
	├── cache
	│   └── X_dict.pkl
	├── data
	│   ├── Boosted_Jets_Sample-0.snappy.parquet
	│   ├── Boosted_Jets_Sample-1.snappy.parquet
	│   ├── Boosted_Jets_Sample-2.snappy.parquet
	│   ├── Boosted_Jets_Sample-3.snappy.parquet
	│   └── Boosted_Jets_Sample-4.snappy.parquet
	├── logs
	├── nbs
	│   ├── nb3_OLD.ipynb
	│   ├── Starter.ipynb
	│   ├── Week_1.ipynb
	│   ├── Week_2.ipynb
	│   ├── Week_3.ipynb
	│   ├── Week_4.ipynb
	│   ├── Week_5.ipynb
	│   └── Week_6.ipynb
	├── requirements.txt
	└── results

Implementation Details

The main implementation consisted of the following subtasks -
  1. Implementing the Dataset Class that supports filtering raw jet instances based on their energies.
  2. Implementing Deep Normalizing Flow models -
    • Block Neural Autoregressive Flow (BNAF)
    • Masked Autoregressive Flows (MAF)
  3. Designing training scripts to train and evaluate the models.
  4. Adding logging functionality to log results and checkpoint models.

1. The Dataset Class

The dataset class implements three methods - get_next_instance(), get_raw_instance() and pad_instance() - to generate a 'valid' sample from the raw Parquet dataset. A sample is considered 'valid' if it has a certain number of jets or if its energy lies in a certain interval. To avoid preprocessing the dataset to filter out such valid instances, we implement get_next_instance() to automatically scan through raw jet instances until it finds one that matches the specified criteria. The implementation thus resolves instances into 'valid' and 'not valid' categories dynamically during training.

It should also be noted that we need to pad each instance to a specific dimension before we can feed it into our model, as every instance in our training batch must be of the same dimension (so that we can effectively parallelize tensor computation on a GPU). This is handled by the pad_instance() function.
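A minimal sketch of this logic is below. Apart from get_next_instance() and pad_instance(), the class name, the constructor arguments, and the (n_hits, 3) layout of (eta, phi, E) per hit are illustrative assumptions, not the project's exact code:

```python
import numpy as np

class JetDataset:
    """Sketch: scan raw instances until one meets the criteria, then pad."""

    def __init__(self, instances, min_hits=3, energy_range=(20.0, 200.0), max_hits=8):
        self.instances = instances          # list of (n_hits, 3) arrays: eta, phi, E
        self.min_hits = min_hits
        self.energy_range = energy_range
        self.max_hits = max_hits
        self._cursor = 0

    def _is_valid(self, inst):
        lo, hi = self.energy_range
        return len(inst) >= self.min_hits and lo <= inst[:, 2].sum() <= hi

    def get_next_instance(self):
        # Skip raw instances on the fly until a valid one is found, so the
        # Parquet files never need a separate preprocessing pass.
        while self._cursor < len(self.instances):
            inst = self.instances[self._cursor]
            self._cursor += 1
            if self._is_valid(inst):
                return self.pad_instance(inst)
        raise StopIteration

    def pad_instance(self, inst):
        # Zero-pad so every batch element has shape (max_hits, 3).
        padded = np.zeros((self.max_hits, 3), dtype=np.float32)
        padded[: len(inst)] = inst[: self.max_hits]
        return padded

# Toy data: the first instance has too few hits; the second is valid.
toy = [np.ones((2, 3), dtype=np.float32), np.full((4, 3), 10.0, dtype=np.float32)]
sample = JetDataset(toy).get_next_instance()
```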

2. Implementing the Flow Models

To implement BNAF we first define the components that constitute its forward pass. These include the MaskedLinear layer, the Tanh activation function, and the FlowSequential container. Using these components we implement functions for a forward pass through BNAF (forward()) and for sampling from its base distribution (base_dist()). Note that we do not implement an inverse() function for BNAF, as its inverse is not available in closed form.
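The division of labour can be sketched as follows: each layer returns both its output and the log-determinant of its Jacobian, and the container just sums the log-determinants. TanhFlow here is a hypothetical stand-in for the model's actual masked layers:

```python
import torch
import torch.nn as nn

class FlowSequential(nn.Sequential):
    """Compose flow layers, accumulating the log-Jacobian terms."""

    def forward(self, x):
        sum_log_det = torch.zeros(x.shape[0])
        for module in self:
            x, log_det = module(x)
            sum_log_det = sum_log_det + log_det
        return x, sum_log_det

class TanhFlow(nn.Module):
    """Element-wise tanh with its exact log-Jacobian, log(1 - tanh(x)^2)."""

    def forward(self, x):
        y = torch.tanh(x)
        return y, torch.log1p(-y.pow(2)).sum(dim=1)

flow = FlowSequential(TanhFlow())
```

The log-probability of a batch is then the base-distribution log-density of the transformed points plus sum_log_det, which is exactly what the forward-only direction of BNAF needs for density estimation.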

Similarly for MAF, we start by implementing the individual components - the MaskedLinear, LinearMaskedCoupling & BatchNorm layers, as well as the FlowSequential container. We then implement the MADE module; multiple MADE blocks are stacked together with BatchNorm layers to obtain the final MAF network.

        # MADE Network
        self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
        self.net = []
        for m in masks[1:-1]:
            self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
        self.net += [activation_fn, MaskedLinear(hidden_size, 2 * input_size, masks[-1].repeat(2,1))]
        self.net = nn.Sequential(*self.net)

        # MAF Network
        modules = []
        self.input_degrees = None
        for i in range(n_blocks):
            modules += [MADE(input_size, hidden_size, n_hidden, cond_label_size, activation, input_order, self.input_degrees)]
            self.input_degrees = modules[-1].input_degrees.flip(0)
            modules += batch_norm * [BatchNorm(input_size)]
        self.net = FlowSequential(*modules)
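The masks that give these layers their autoregressive structure can be sketched as below: a simplified, single-hidden-layer version of the sequential-degree recipe (zero-indexed degrees; the function name is illustrative):

```python
import torch

def sequential_masks(input_size, hidden_size):
    """Build masks so that output unit i depends only on inputs j < i."""
    deg_in = torch.arange(input_size)                      # input degrees 0..D-1
    deg_h = torch.arange(hidden_size) % (input_size - 1)   # hidden degrees 0..D-2
    deg_out = torch.arange(input_size)                     # output degrees 0..D-1
    mask_in = (deg_h.unsqueeze(1) >= deg_in.unsqueeze(0)).float()   # (hidden, input)
    mask_out = (deg_out.unsqueeze(1) > deg_h.unsqueeze(0)).float()  # (output, hidden)
    return mask_in, mask_out

# Any path from input j to output i must satisfy j <= hidden degree < i,
# so the composite connectivity is strictly lower-triangular.
mask_in, mask_out = sequential_masks(4, 8)
connectivity = mask_out @ mask_in
```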

3. Logging and Checkpointing

We used Weights and Biases to log training and evaluation metrics for different training runs. Each run logs the training loss at every iteration along with other user-defined metrics, such as the current learning rate (useful when using learning-rate schedulers).

In addition to the above, the following files are also stored for each run:
  1. output.log - Captures print statements that are output to the console.
  2. requirements.txt - Logs various packages needed to run the current code.
  3. wandb-metadata.json - Logs training metadata captured by Weights and Biases (sample below).
  4. wandb-summary.json - Summarises training metrics for the run (sample below).
  5. config.yaml - Captures Weights and Biases specific metadata.

An example of wandb-metadata.json for a training run -

    {
        "os": "Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic",
        "python": "3.7.11",
        "heartbeatAt": "2021-08-20T11:20:12.190047",
        "startedAt": "2021-08-20T11:19:55.913546",
        "docker": null,
        "gpu": "Tesla K80",
        "gpu_count": 1,
        "cpu_count": 2,
        "cuda": "11.0.228",
        "args": [],
        "state": "running",
        "program": "Week_6.ipynb",
        "colab": "",
        "git": {
            "remote": "",
            "commit": "3b4d2a4d8d89280c9f8b2887907fef8332226c0c"
        },
        "email": null,
        "root": "/content/drive/My Drive/_GSoC/Normalizing-Flows",
        "host": "b5f464e77d26",
        "username": "root",
        "executable": "/usr/bin/python3"
    }

An example of wandb-summary.json for a training run -

    {
        "Step": 0,
        "Loss": 4298.6650390625,
        "Learning_Rate": 0.0001,
        "LogProb_Mean": -6425.44873046875,
        "LogProb_Std": 570.9646606445312,
        "_runtime": 6605,
        "_timestamp": 1629465010,
        "_step": 501,
        "_wandb": {
            "runtime": 8418
        }
    }

Apart from Weights and Biases, we also cache the trained model (and the optimiser state) at regular intervals. This makes it possible to resume training from the last checkpointed state.
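A minimal sketch of that checkpointing logic (the function and key names here are illustrative, not the project's exact code):

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    # Store both model and optimizer state so a run can resume exactly.
    torch.save({
        "step": step,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path):
    ckpt = torch.load(path)
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]
```

Restoring the optimizer state matters for optimisers like Adam, whose per-parameter moment estimates would otherwise reset on resume.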


Results

It can be seen that training with MAF is initially unstable (note the --- [ Encountered NaN ] line below). However, the loss does decrease as training progresses, going from around 7200 to around 6500 after just a few iterations.

	>>> train_evaluate(model, optimizer, train_dataset, test_dataset, args, toGenerate=False)
	--- [ Encountered NaN ]
	[Iter:    0/5000] Loss: 7335.718  |  LogP(x): 491.520 +/- nan
	[Iter:    1/5000] Loss: 7206.878  |  LogP(x): -20063.887 +/- 1499.554
	[Iter:    2/5000] Loss: 7201.562  |  LogP(x): -14998.738 +/- 3175.619
	[Iter:    3/5000] Loss: 6909.108  |  LogP(x): -12103.184 +/- 1325.232
	[Iter:    4/5000] Loss: 7181.210  |  LogP(x): -10597.162 +/- 545.477
	[Iter:    5/5000] Loss: 7169.951  |  LogP(x): -9496.451 +/- 375.776
	[Iter:    6/5000] Loss: 6763.291  |  LogP(x): -9188.175 +/- 477.719
	[Iter:    7/5000] Loss: 6955.591  |  LogP(x): -8744.181 +/- 386.556
	[Iter:    8/5000] Loss: 7095.173  |  LogP(x): -8299.588 +/- 232.408
	[Iter:    9/5000] Loss: 6568.482  |  LogP(x): -8039.018 +/- 170.622
	[Iter:   41/5000] Loss: 5827.071  |  LogP(x): -6942.954 +/- 221.323
	[Iter:   42/5000] Loss: 6559.073  |  LogP(x): -6858.913 +/- 159.397
	[Iter:   43/5000] Loss: 6696.210  |  LogP(x): -6958.787 +/- 187.290
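A common way to survive these early instabilities (a sketch of the general idea, not the project's exact train_evaluate() internals) is to skip non-finite batches and clip gradients:

```python
import torch
import torch.nn as nn

def training_step(model, loss_fn, optimizer, batch, max_grad_norm=1.0):
    """One guarded optimisation step: skip the update entirely when the
    loss is non-finite, and clip gradients otherwise."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    if not torch.isfinite(loss):
        print("--- [ Encountered NaN ]")
        return None  # leave the parameters untouched for this batch
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```

Skipping keeps one pathological batch from poisoning the parameters, while gradient clipping bounds the update size on the batches that are kept.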


1. Training Curves for BNAF

We can observe that at least a few models (or a few hyperparameter choices) converge well for BNAF.

2. Training Curves for MAF

It is comparatively harder to train MAF. While the loss decreases steadily, the training is noisy and it is difficult to judge how well the model has converged.

3. Plotting Log Probability for real and artificial Jets using BNAF.

It can be seen that real jets (shown with blue points) consistently have a higher Log Probability than artificial jets (shown with red points) generated from random data.

4. New jet instances generated through MAF

As the inverse flow for MAF is available in closed form, we are able to generate new jet instances. A few generated instances are presented below.
(Samples 1-12: images of generated jet instances.)
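Generation amounts to the following sketch, where inverse_flow stands in for the model's inverse pass (the names here are illustrative):

```python
import torch

@torch.no_grad()
def generate_jets(inverse_flow, n_samples, jet_dim):
    """Draw base-distribution samples and map them through the inverse flow."""
    z = torch.randn(n_samples, jet_dim)  # samples from the simple base density
    return inverse_flow(z)               # mapped onto the jet distribution

# Toy inverse (a fixed affine map) just to show the mechanics.
jets = generate_jets(lambda z: 2.0 * z + 1.0, n_samples=5, jet_dim=3)
```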

Further Work

This project can be extended in several different directions. A few of them are -
  1. Further optimising MAF to yield more realistic looking jet deposits.
  2. Combining Normalizing Flows with Graph Neural Networks, possibly using algorithms like Graph Normalizing Flows. This is potentially interesting given recent work on simulating jet events using other graph generative models (for instance, the work by Hariri et al. [4]). It can be argued that graphical models provide better inductive biases when modelling hits in a jet.
  3. It will be interesting to see if one can generate class-conditional jet instances using deep flow models.


References

  1. Danilo Rezende and Shakir Mohamed. Variational Inference with Normalizing Flows. In International Conference on Machine Learning, volume 37, pages 1530–1538, 2015.
  2. George Papamakarios, Theo Pavlakou, and Iain Murray. Masked Autoregressive Flow for Density Estimation. 2018.
  3. Nicola De Cao, Ivan Titov, and Wilker Aziz. Block Neural Autoregressive Flow. 2019.
  4. Ali Hariri, Darya Dyachkova, and Sergei Gleyzer. Graph Generative Models for Fast Detector Simulations in High Energy Physics. 2021.