Mentors: Prof. Sergei V. Gleyzer,
Prof. Harrison B. Prosper &
Prof. Michelle Kuchera.
Repository: Normalizing Flows for Fast Detector Simulation
Project Group: DeepFalcon
Organization: ML4SCI

The focus of my project was using Normalizing Flows to simulate the detector response to particle jet deposits. Normalizing Flows [1] are a method for constructing complex distributions by transforming a simpler probability distribution through a series of invertible mappings. They have been used extensively in generative modelling because they provide a way of mapping a simple parametric distribution to more complicated distributions that better approximate real-world data. This project aimed to use the generative capabilities of recent Normalizing Flow models such as Masked Autoregressive Flows (MAF) [2] and Block Neural Autoregressive Flows (BNAF) [3] to model the complex jet distributions encountered when simulating detector deposits. It thus extends the previous work by Harari et al. [4], who used Graph Autoencoders to simulate Falcon deposits.

## Work Summary

Two main algorithms were implemented as part of this project: the first half of the coding period focused on implementing Block Neural Autoregressive Flows (BNAF), while Masked Autoregressive Flows (MAF) were implemented during the second half. Although both are autoregressive flow models built on deep networks, there are important differences in how they can be used in practice for detector simulation.

While the inverse of a BNAF model exists in principle (see Section 3.4 of [3] for details), it is not available in closed form. Practically, this means that although we can map a complex data distribution (of jet instances, in our case) to a simple distribution (say, a Gaussian), we cannot map newly sampled points back onto the complex distribution. This prevents us from generating new jet instances and limits BNAF to applications such as density estimation. For MAF, however, the inverse flow is available, so we can sample points from the simple distribution and invert them into points from the complex distribution, thus obtaining new samples that follow the input distribution. It should also be noted that MAF is significantly harder to train and optimise, as can be seen from the respective training plots in the Results section.

```
>>> tree -I 'model*|temp__*|wandb*|run_'
├── cache
│   └── X_dict.pkl
├── data
│   ├── Boosted_Jets_Sample-0.snappy.parquet
│   ├── Boosted_Jets_Sample-1.snappy.parquet
│   ├── Boosted_Jets_Sample-2.snappy.parquet
│   ├── Boosted_Jets_Sample-3.snappy.parquet
│   └── Boosted_Jets_Sample-4.snappy.parquet
├── get_dataset.sh
├── logs
├── nbs
│   ├── nb3_OLD.ipynb
│   ├── Starter.ipynb
│   ├── Week_1.ipynb
│   ├── Week_2.ipynb
│   ├── Week_3.ipynb
│   ├── Week_4.ipynb
│   ├── Week_5.ipynb
│   └── Week_6.ipynb
├── README.md
├── requirements.txt
└── results
```

## Implementation Details

The main implementation consisted of the following subtasks.
### 1. The Dataset Class

The dataset class implements three methods - `get_next_instance()`, `get_raw_instance()` and `pad_instance()` - to generate a 'valid' sample from the raw Parquet dataset. A sample is considered 'valid' if it has a certain number of jets or if its energy lies in a certain interval. To avoid preprocessing the dataset to filter out such valid instances, `get_next_instance()` automatically scans through raw jet instances until it finds one that matches the specified criteria. The implementation thus sorts instances into 'valid' and 'not valid' categories dynamically during training.
It should also be noted that we need to pad each instance to a fixed dimension before we can feed it into our model, as every instance in a training batch must have the same dimension (so that we can effectively parallelize tensor computation on a GPU). This is handled by the `pad_instance()` function.
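For illustration, here is a minimal sketch of how these pieces could fit together. The constructor arguments, dictionary keys and default values are assumptions made for the example, not the project's exact implementation; only the three method names come from the description above.

```python
import numpy as np

class JetDataset:
    """Minimal sketch of the dataset logic described above (illustrative only)."""

    def __init__(self, raw_instances, min_jets=2, energy_range=(20.0, 300.0), pad_dim=128):
        self.raw_instances = raw_instances   # per-event records read from the Parquet files
        self.min_jets = min_jets             # validity criterion: minimum number of jets
        self.energy_range = energy_range     # validity criterion: allowed energy interval
        self.pad_dim = pad_dim               # fixed dimension expected by the model
        self._cursor = 0

    def get_raw_instance(self, idx):
        """Return one raw (unfiltered, unpadded) instance."""
        return self.raw_instances[idx]

    def _is_valid(self, instance):
        n_jets, energy = instance["n_jets"], instance["energy"]
        return n_jets >= self.min_jets and self.energy_range[0] <= energy <= self.energy_range[1]

    def get_next_instance(self):
        """Scan forward through raw instances until a 'valid' one is found."""
        while True:
            instance = self.get_raw_instance(self._cursor % len(self.raw_instances))
            self._cursor += 1
            if self._is_valid(instance):
                return self.pad_instance(instance["features"])

    def pad_instance(self, features):
        """Zero-pad (or truncate) a variable-length instance to the fixed dimension."""
        padded = np.zeros(self.pad_dim, dtype=np.float32)
        n = min(len(features), self.pad_dim)
        padded[:n] = features[:n]
        return padded
```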
### 2. Implementing the Flow Models

To implement BNAF we first define the components that constitute its forward pass: the `MaskedLinear` layer, the `Tanh` activation function, and the `FlowSequential` container. Using these components we implement a forward pass through BNAF (`forward()`) and a method to sample from its base distribution (`base_dist()`). Note that we do not implement an `inverse()` function for BNAF, as its inverse is not available in closed form.
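Since BNAF is used here purely for density estimation, the quantity of interest is the log-likelihood given by the change-of-variables formula, log p_X(x) = log p_Z(f(x)) + log|det J_f(x)|. The sketch below shows how this is typically assembled from the pieces above; the interface it assumes (what `model(x)` and `model.base_dist()` return) is an illustration and may differ from the project code.

```python
import torch

def flow_log_prob(model, x):
    """Log-likelihood of x under a flow via the change-of-variables formula.

    Interface assumptions (for illustration only): model(x) returns the
    transformed variable z together with the summed log|det J|, and
    model.base_dist() returns a torch.distributions object for the base density.
    """
    z, sum_log_abs_det_jacobian = model(x)
    log_pz = model.base_dist().log_prob(z).sum(dim=-1)  # log-density under the simple base distribution
    return log_pz + sum_log_abs_det_jacobian

# The training objective is then simply the negative log-likelihood:
# loss = -flow_log_prob(model, batch).mean()
```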
Similarly for MAF, we start by implementing the individual components - the `MaskedLinear`, `LinearMaskedCoupling` and `BatchNorm` layers, as well as the `FlowSequential` container. We then implement the `MADE` module; several MADE blocks are stacked together with `BatchNorm` layers to finally obtain the MAF network.
```python
# MADE Network
self.net_input = MaskedLinear(input_size, hidden_size, masks[0], cond_label_size)
self.net = []
for m in masks[1:-1]:
    self.net += [activation_fn, MaskedLinear(hidden_size, hidden_size, m)]
self.net += [activation_fn, MaskedLinear(hidden_size, 2 * input_size, masks[-1].repeat(2, 1))]
self.net = nn.Sequential(*self.net)

# MAF Network
modules = []
self.input_degrees = None
for i in range(n_blocks):
    modules += [MADE(input_size, hidden_size, n_hidden, cond_label_size,
                     activation, input_order, self.input_degrees)]
    self.input_degrees = modules[-1].input_degrees.flip(0)
    modules += batch_norm * [BatchNorm(input_size)]
self.net = FlowSequential(*modules)
```

### 3. Logging and Checkpointing

We used Weights and Biases to log training and evaluation metrics for the different training runs. Each run logs the training loss at every iteration along with other user-defined metrics such as the current learning rate (useful when using learning rate schedulers). In addition to the above, Weights and Biases also stores a set of metadata files for each run; two of them are shown below.
An example of `wandb-metadata.json` for a training run -
{ "os": "Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic", "python": "3.7.11", "heartbeatAt": "2021-08-20T11:20:12.190047", "startedAt": "2021-08-20T11:19:55.913546", "docker": null, "gpu": "Tesla K80", "gpu_count": 1, "cpu_count": 2, "cuda": "11.0.228", "args": [ "-f", "/root/.local/share/jupyter/runtime/kernel-d9dd4034-86eb-4ae4-9754-e6281b8b5f0a.json" ], "state": "running", "program": "Week_6.ipynb", "colab": "https://colab.research.google.com/drive/1HiJ7sV119GYiTyPcK1ePAWn0d7Gh0Nxa", "git": { "remote": "https://github.com/adiah80/Normalizing-Flows.git", "commit": "3b4d2a4d8d89280c9f8b2887907fef8332226c0c" }, "email": null, "root": "/content/drive/My Drive/_GSoC/Normalizing-Flows", "host": "b5f464e77d26", "username": "root", "executable": "/usr/bin/python3" } An example of wandb-summary.json for a training run -
{ "Step": 0, "Loss": 4298.6650390625, "Learning_Rate": 0.0001, "LogProb_Mean": -6425.44873046875, "LogProb_Std": 570.9646606445312, "_runtime": 6605, "_timestamp": 1629465010, "_step": 501, "_wandb": { "runtime": 8418 } } Apart from Weights and Biases, we also cache the trained model (and the optimser state) at regular intervals. This thus supports training previously trained models from their last checkpointed state. TrainingIt can be seen that training with MAF is initially unstable (as seen by the--- [ Encountered NaN ]
## Training

It can be seen that training with MAF is initially unstable (note the `--- [ Encountered NaN ]` line below). However, the loss does decrease as training progresses, going from around 7200 to around 6500 after just a few iterations.
```
>>> train_evaluate(model, optimizer, train_dataset, test_dataset, args, toGenerate=False)

--- [ Encountered NaN ]
[Iter: 0/5000]   Loss: 7335.718 | LogP(x): 491.520 +/- nan
[Iter: 1/5000]   Loss: 7206.878 | LogP(x): -20063.887 +/- 1499.554
[Iter: 2/5000]   Loss: 7201.562 | LogP(x): -14998.738 +/- 3175.619
[Iter: 3/5000]   Loss: 6909.108 | LogP(x): -12103.184 +/- 1325.232
[Iter: 4/5000]   Loss: 7181.210 | LogP(x): -10597.162 +/- 545.477
[Iter: 5/5000]   Loss: 7169.951 | LogP(x): -9496.451 +/- 375.776
[Iter: 6/5000]   Loss: 6763.291 | LogP(x): -9188.175 +/- 477.719
[Iter: 7/5000]   Loss: 6955.591 | LogP(x): -8744.181 +/- 386.556
[Iter: 8/5000]   Loss: 7095.173 | LogP(x): -8299.588 +/- 232.408
[Iter: 9/5000]   Loss: 6568.482 | LogP(x): -8039.018 +/- 170.622
...
[Iter: 41/5000]  Loss: 5827.071 | LogP(x): -6942.954 +/- 221.323
[Iter: 42/5000]  Loss: 6559.073 | LogP(x): -6858.913 +/- 159.397
[Iter: 43/5000]  Loss: 6696.210 | LogP(x): -6958.787 +/- 187.290
```

## Results

### 1. Training Curves for BNAF

We can observe that at least a few models (i.e. a few hyperparameter choices) converge well for BNAF.

### 2. Training Curves for MAF

MAF is comparatively harder to train. While the loss decreases steadily, the training is noisy and it is difficult to judge how well the model has converged.

### 3. Plotting Log Probability for Real and Artificial Jets using BNAF

Real jets (shown as blue points) consistently have a higher log probability than artificial jets (shown as red points) generated from random data.

### 4. New Jet Instances Generated through MAF

As the inverse flow for MAF is available in closed form, we are able to generate new jet instances. A few generated instances are presented below.
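For reference, generating new instances with MAF amounts to sampling from the base distribution and pushing the samples through the inverse flow. A minimal sketch, assuming the trained model exposes a `base_dist` and that its `FlowSequential` container provides an `inverse()` method returning the samples and the log-determinant (the exact interface in the project code may differ):

```python
import torch

@torch.no_grad()
def generate_jets(maf_model, n_samples=16):
    """Sample new jet instances by inverting the flow (illustrative sketch)."""
    maf_model.eval()
    # Draw points from the simple base distribution (e.g. a standard Gaussian).
    z = maf_model.base_dist.sample((n_samples,))
    # Map them back through the inverse flow to obtain samples in jet space.
    x, _ = maf_model.net.inverse(z)  # assumed to return (samples, log_abs_det)
    return x
```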
## Further Work

This project can be extended in a number of different directions. A few of them are -
## Citations