User Documentation

Introduction

LFCNN is a Python framework, based on TensorFlow and the Keras API, for versatile light-field deep learning applications. Despite its name, it is suitable not only for Convolutional Neural Networks (CNNs) but for all architectures supported by TensorFlow; CNNs are, however, the most common architectures for light field-related purposes.

The framework is designed to be easy to use, getting your project started quickly with as little overhead as possible, while still offering customizability and unique workflows.

We have developed LFCNN to be as versatile as possible, supporting multi-input, multi-output models. However, we have likely missed or forgotten about specific light field use cases. If you find that LFCNN does not adapt well to your needs, we welcome contributions and extensions (see below)!

How To Use

Quickstart

As a quickstart, have a look at the provided examples. Here, you'll find how to train and test models. If you want to create a new model architecture, all you have to do is:

  • create a new model module containing your model class

  • define the data generator and reshape

  • implement the architecture in the create_model method.

For a comparatively easy example, see the definition of the EPINET model and the corresponding example that trains and evaluates the model.

These two files (less than 200 lines of code) are basically everything you need to implement when developing and training/testing a new architecture :)

Note that in LFCNN, light fields are always of shape (u, v, s, t, ch).
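
For illustration, a light field with 9 x 9 angular resolution, 32 x 32 spatial resolution, and three color channels corresponds to an array of the following shape (dummy data used here):

import numpy as np

# axes: angular (u, v), spatial (s, t), channels (ch)
light_field = np.random.rand(9, 9, 32, 32, 3)
print(light_field.shape)  # (9, 9, 32, 32, 3)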

Details

At its core, LFCNN provides seven packages, two of which are essential when you want to implement your own architectures: the generators package and the models package with the base class definitions of BaseModel and BaseGenerator.

The generator base class handles all data input and augmentations, specifically designed to be used with light fields and common data labels such as disparity. The model base class, on the other hand, is basically a wrapper around Keras' tf.keras.Model class that tightly integrates with the data generators. We have chosen this approach because in light field-related models, the input shape of the light field into the model can vary significantly and there is no default or "best" way to do it. Naively, one would feed the full light field, whose shape in LFCNN is always (u, v, s, t, ch), into the network and, for example, perform a native 4D convolution on it. However, since there is no native 4D convolution in CUDA and 4D convolution is computationally expensive, this is usually not the way to do it. Instead, one uses either multiple streams extracted from the light field, for example the commonly used cross-hair EPI volumes, or a reshape, for example reshaping (u, v, s, t, ch) to (s, t, u*v*ch), resulting in a stack of subaperture images. For this reason, the reshape, and hence the data generators, are tightly intertwined with the model definition. But let's first have a closer look at the data generators.
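
To make the subaperture stack reshape concrete, the following NumPy sketch rearranges (u, v, s, t, ch) into (s, t, u*v*ch); the actual LFCNN reshapes are applied within the data generators and may differ in implementation details:

import numpy as np

u, v, s, t, ch = 9, 9, 32, 32, 3
light_field = np.random.rand(u, v, s, t, ch)

# move the angular axes next to the channel axis, then merge them into it
stack = np.transpose(light_field, (2, 3, 0, 1, 4))  # (s, t, u, v, ch)
stack = stack.reshape(s, t, u * v * ch)             # (s, t, u*v*ch)
print(stack.shape)  # (32, 32, 243)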

Generators

The data generators generate a batch of light fields (possibly in a multi-stream and/or reshaped fashion) and corresponding labels, and take care of all data reading, augmentation, and multiprocessing. You don't really ever need to instantiate a generator yourself when training or evaluating a model, as this is done automatically for you. However, understanding how the generators work may be necessary when the ones we provide do not fit your needs. Possible labels are the light field itself (used for autoencoders), the disparity of all or a single subaperture view, a superresolved light field, etc. Out of the box, we provide the following generators, which can be combined with arbitrary reshapes (see below):

  • LfGenerator : Generates light field batches and light field labels, e.g. for autoencoders.

  • DisparityGenerator : Generates light field batches and central view disparity labels, e.g. used for disparity estimators.

  • LfDownSampleGenerator : Generates downsampled light field batches and original (thus superresolved) light field labels, e.g. used for light field superresolution. Downsampling is available in the angular and/or the spatial domain. Note, however, that no anti-aliasing is performed.

If you have an application that is not covered by these, you can simply specify your own data generator. Most of the work is already done in the BaseGenerator; you basically only have to implement the process_data function. For reference, see the already implemented generators mentioned above.
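
As a rough sketch of what such a custom generator might look like (the import path and the exact signature of process_data are assumptions for illustration; the BaseGenerator definition and the existing generators are the authoritative reference):

from lfcnn.generators import BaseGenerator  # import path assumed

class MyGenerator(BaseGenerator):
    """Generates light field batches and a custom label (sketch only)."""

    def process_data(self, lf, labels):
        # hypothetical signature: select/derive the label that the model
        # should be trained on; the model-specific reshape is applied by
        # the generator machinery, not here (assumption)
        label = labels["my_label"]   # hypothetical label key
        return lf, label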

The generators are then combined with a model-specific reshape. We provide the following reshapes, however it is straightforward to implement your own (see the sketch after this list):

  • lf_identity() : Does not perform any reshape upon data generation.

  • lf_subaperture_stream() : Provides a stream of u*v subapertures [(s, t, ch), (s, t, ch), …, (s, t, ch)] for multi-input subaperture-based models.

  • lf_subaperture_stack() : Stacks the subapertures in the channel axis. Resulting shape: (s, t, u*v*ch). Can easily be used with conventional 2D convolution.

  • lf_subaperture_channel_stack() : Stacks the subapertures, keeping the channel axis. Resulting shape: (s, t, u*v, ch). Can be used with 3D convolution.

  • lf_crosshair() : Four-stream light field crosshair: vertical, horizontal, and two diagonal EPI volumes. Results in [(v, s, t, ch), (u, s, t, ch), (sqrt(u**2 + v**2), s, t, ch), (sqrt(u**2 + v**2), s, t, ch)].

  • lf_crosshair_stacked() : Similar to the crosshair reshape, but stacks the resulting subapertures in the channel axis. Results in [(s, t, v*ch), (s, t, u*ch), (s, t, sqrt(u**2 + v**2)*ch), (s, t, sqrt(u**2 + v**2)*ch)]. For example used by the EPINET disparity estimator.

  • lf_distributed() : Interprets the light field as a time sequence of subaperture views. Resulting shape: (u*v, s, t, ch). Can for example be used with the tensorflow.keras.layers.TimeDistributed layer wrapper to achieve pseudo-separable 4D convolution, for example used by the SASCONV superresolution model.
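
As an example of what such a custom reshape might look like, here is a minimal sketch of a multi-stream reshape in the spirit of lf_subaperture_stream(); whether reshapes operate on single samples or on batches should be checked against the provided implementations:

import numpy as np

def my_subaperture_stream(lf):
    # lf is assumed to have shape (u, v, s, t, ch), i.e. a single sample
    u, v = lf.shape[:2]
    # return a list of u*v subaperture views of shape (s, t, ch),
    # suitable as input to a multi-input model
    return [lf[i, j] for i in range(u) for j in range(v)]

streams = my_subaperture_stream(np.random.rand(9, 9, 32, 32, 3))
print(len(streams), streams[0].shape)  # 81 (32, 32, 3)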

Models

The LFCNN models are wrappers around Keras models, integrating the corresponding generators, reshapes, metrics, and losses, and providing very easy interfaces for training, testing, and evaluation. For better structure, the models are divided into the subpackages autoencoder, disparity, and superresolution (you can of course add new subpackages). All models are derived from the abstract BaseModel, which implements all necessary functionality that you don't really have to worry about (unless you're interested).

It is recommended to use the implemented methods to train, test, and evaluate your models. However, if that does not fit your workflow or use case, you can do anything you like by accessing the keras_model attribute of an instantiated LFCNN model, which holds a Keras Model instance.

Model Creation

To create a new model, it is easiest to have a look at one of the models that we provide, e.g. the EPINET model. Basically, you need to specify two things (a sketch follows the list):

  • The set_generator_and_reshape method that, as the name suggests, sets the generator class and reshape function that the model is designed for. For example, a disparity estimator model working with a cross-hair multi-stream input of EPI volumes uses the DisparityGenerator generator and the lf_crosshair_stacked() reshape.

  • The model architecture, by implementing create_model. Here you can do everything allowed within TensorFlow and Keras, as long as it in the end returns a Keras Model instance. Most straightforwardly, you can simply stick to Keras' functional Model API.
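
A minimal sketch of such a model class could look as follows; apart from set_generator_and_reshape and create_model, the import paths, attribute names, and the create_model signature used here are assumptions, and the EPINET implementation is the authoritative reference:

import tensorflow.keras as keras
from lfcnn.models import BaseModel                        # import paths assumed
from lfcnn.generators import DisparityGenerator
from lfcnn.generators.reshapes import lf_subaperture_stack

class MyDisparityModel(BaseModel):

    def set_generator_and_reshape(self):
        # couple the model to the generator and reshape it is designed for
        self._generator = DisparityGenerator              # attribute names assumed
        self._reshape_func = lf_subaperture_stack

    def create_model(self, inputs):
        # 'inputs' is assumed to be a list of Keras Input tensors
        x = keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs[0])
        x = keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        disparity = keras.layers.Conv2D(1, 3, padding="same", name="disparity")(x)
        return keras.Model(inputs, disparity, name="my_disparity_model")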

Training

For training, the following has to be specified: a training optimizer, a loss and metrics (see below), possibly callbacks, and of course data.

In LFCNN, data can be provided in two ways: either by specifying the system path to an HDF5 file containing the light field and label patches used during training, or by first loading the data into RAM and creating a data dictionary containing the data, e.g. with a dataset of size 128 and light fields of shape (9, 9, 36, 36, 3):

# load data; here we generate dummy data
import numpy

light_field = numpy.random.rand(128, 9, 9, 36, 36, 3)
disparity = numpy.random.rand(128, 36, 36, 1)
data = dict(light_field=light_field, disparity=disparity)

The training itself is then performed using the train() method. Have a look at the function documentation for all parameters that need to be set for training. A basic training example can be found in the example folder.
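
A hypothetical call, continuing the data dictionary from above, might look like this; the actual parameter names, and whether the optimizer, loss, and metrics are passed here or at model instantiation, should be taken from the train() documentation and the examples:

import tensorflow.keras as keras

# 'model' is an instantiated LFCNN model; the argument names below are assumptions
model.train(
    data=data,                                   # or a path to an HDF5 file
    optimizer=keras.optimizers.Adam(1e-4),
    loss=keras.losses.MeanAbsoluteError(),
    metrics=[keras.metrics.MeanSquaredError()],
    epochs=10,
    batch_size=16,
)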

Test and Evaluation

Testing should be performed using the test() method, which is mostly analogous to the train() method in its usage. For a more in-depth evaluation, we suggest also evaluating a network using so-called challenges, i.e. full-sized light fields with ground truth labels (as for example provided with our datasets). Unlike training, where light fields are usually patched into smaller shapes such as (9, 9, 32, 32, 3), the evaluate_challenges() method is meant to be used with full-sized light fields. In the background, the model is recompiled for the new input shapes that deviate from the training shapes. Unlike the test() method, which provides only mean metric evaluations, the challenge evaluation returns the predictions for all provided challenges and the metric scores for every prediction. The predictions and corresponding metric values can then be used to judge the network performance with respect to a specific challenge and to easily include predictions and metric scores in a publication or presentation, with minimal boilerplate code.
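
As a rough, hypothetical sketch of how this might be used (the exact signature and return structure of evaluate_challenges() may differ and are documented in the API reference):

# 'challenge_data' is assumed to be structured like the training data dictionary,
# but containing full-sized light fields and ground truth labels
results = model.evaluate_challenges(data=challenge_data)   # argument name assumed

# 'results' is assumed to contain the per-challenge predictions and metric scores,
# e.g. for saving predicted disparity maps or tabulating metrics for a publication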

Losses and Metrics

In principle, the model instantiation works with all Keras Loss and Metric instances. However, we provide a set of re-implementations and some additional losses specific to light field and multispectral applications. These losses are defined in the lfcnn.losses.losses module. All losses defined there are averaged over the mini-batches during training and can hence easily be combined. Some combined losses are provided in the lfcnn.losses.combined_losses module.
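
Since each loss is reduced to a scalar per mini-batch, combining losses essentially amounts to a weighted sum. The following generic Keras sketch illustrates the idea; it is not taken from lfcnn.losses.combined_losses:

import tensorflow as tf

class WeightedSumLoss(tf.keras.losses.Loss):
    """Weighted sum of two Keras losses (illustration only)."""

    def __init__(self, loss_a, loss_b, weight_a=0.5, weight_b=0.5, **kwargs):
        super().__init__(**kwargs)
        self.loss_a, self.loss_b = loss_a, loss_b
        self.weight_a, self.weight_b = weight_a, weight_b

    def call(self, y_true, y_pred):
        return (self.weight_a * self.loss_a(y_true, y_pred)
                + self.weight_b * self.loss_b(y_true, y_pred))

loss = WeightedSumLoss(tf.keras.losses.MeanAbsoluteError(),
                       tf.keras.losses.MeanSquaredError(),
                       weight_a=0.8, weight_b=0.2)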

Callbacks

When instantiating a new LFCNN model, callbacks can be specified. These callbacks are basically just passed down to Keras. We provide several (light field-unrelated) callbacks that define some commonly used learning rate schedulers as well as the cyclical learning rate approach and the learning rate finder proposed by L. N. Smith.
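
As an example, commonly used Keras learning rate and early stopping callbacks can be set up like this; how exactly they are handed to the model is shown in the examples:

import tensorflow.keras as keras

# plain Keras callbacks work alongside the LFCNN-provided ones;
# they are specified when instantiating the LFCNN model
callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=20),
]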

Furthermore, integration with Sacred is achieved via callbacks; please consult the corresponding Sacred entry in this document for details.

Layers

The layers package is meant to hold light field-specific layers. As of now, it contains a couple of residual layers and reshape layers that we have found to be commonly used in light field-related applications. Contributions by the community are very welcome, for example different (pseudo) 4D convolution layers!

Utils

The utils package holds a collection of utilities. Most notably, the lfcnn.utils.tf_utils module provides access to the mixed precision API and some TensorFlow commands (which we have found hard to remember, so we packed them all here).

The lfcnn.utils.callback_utils module provides a Matplotlib-based visualization of the learning rate schedulers.

Sacred

Sacred is a Python framework to log experiment configurations and results, for example to a MongoDB or MySQL database. We provide callbacks to easily integrate LFCNN with Sacred.

As a quickstart, have a look at the provided Sacred examples.

To use Sacred with LFCNN, we provide several callbacks that log losses, metrics, and training status to a Sacred observer. These callbacks are defined in the lfcnn.callbacks.sacred module. Each of the defined callbacks takes a Sacred run object upon instantiation. To further simplify the use of Sacred with a MongoDB observer, our examples use the mdbh tools.
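
A rough sketch of the integration; the callback class name used here is a placeholder for illustration, the actual classes are defined in lfcnn.callbacks.sacred and demonstrated in the Sacred examples:

from sacred import Experiment

ex = Experiment("lfcnn_training")

@ex.automain
def main(_run):
    # pass the Sacred run object to an LFCNN Sacred callback upon instantiation
    # ("SacredLogger" is a placeholder name, not the actual class name)
    from lfcnn.callbacks.sacred import SacredLogger
    callbacks = [SacredLogger(_run)]
    # ... instantiate the model and train with these callbacks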

Contributing

We welcome contributions by the community! We have tried to make LFCNN as versatile as possible, however there are likely some issues for use cases that we did not have in mind. Or you may find a bug or other flaw in the source code. Either way, let us know by opening an Issue or even creating a Merge Request in the GitLab repository. We would love to see LFCNN grow and become more mature and widespread.