User Documentation


LFCNN is a Python framework, based on TensorFlow and the Keras API, for versatile light-field deep learning applications. Despite its name, it is suitable not only for Convolutional Neural Networks (CNNs) but for all architectures supported by TensorFlow; CNNs are, however, the most common architectures for light field-related purposes.

The framework is designed to be easy to use, getting your project started quickly with as little overhead as possible, while still offering customizability and unique workflows.

We have developed LFCNN to be as versatile as possible, supporting multi-input, multi-output models. However, we have likely missed or forgotten about specific light field use cases. If you find that LFCNN does not adapt well to your needs, we welcome contributions and extensions (see below)!

How To Use


As a quickstart, have a look at the provided examples. There, you’ll see how to train and test models. If you want to create a new model architecture, all you have to do is

  • create a new model module containing your model class

  • define the data generator and reshape

  • implement the architecture using the create_model function.

For a comparatively easy example, see the definition of the EPINET model and the corresponding example training and evaluating the model.

These two files (less than 200 lines of code) are basically everything you need to implement when developing and training/testing a new architecture :)

Note that in LFCNN, light fields are always of shape (u, v, s, t, ch).


At its core, LFCNN provides seven packages, two of which are essential when you want to implement your own architectures: the generators package and the models package, with the base class definitions of BaseGenerator and BaseModel.

The generator base class handles all data input and augmentations, specifically designed for use with light fields and common data labels such as disparity. The model base class, on the other hand, is essentially a wrapper around Keras’ tf.keras.Model class that tightly integrates with the data generators. We have chosen this approach because in light field-related models, the input shape of the light field can vary significantly, and there is no default or “best” way to feed it into the network. Naively, one would feed in the full light field, whose shape in LFCNN is always (u, v, s, t, ch), and, for example, perform a native 4D convolution on it. However, since there is no native 4D convolution in CUDA and 4D convolution is computationally expensive, this is usually not done. Instead, one uses either multiple streams extracted from the light field, for example the commonly used cross-hair EPI volumes, or a reshape, for example reshaping (u, v, s, t, ch) to (s, t, u*v*ch), resulting in a stack of subaperture images. For this reason, the reshape, and hence the data generators, is tightly intertwined with the model definition. But first, let’s have a closer look at the data generators.
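To make the subaperture-stack reshape concrete, here is a minimal NumPy sketch (the axis ordering is illustrative; LFCNN’s actual reshape implementation may order the axes differently):

```python
import numpy as np

# Dummy light field of shape (u, v, s, t, ch)
u, v, s, t, ch = 9, 9, 32, 32, 3
lf = np.random.rand(u, v, s, t, ch)

# Move the angular axes (u, v) next to the channel axis, then merge them,
# yielding a stack of subaperture images of shape (s, t, u*v*ch):
stack = lf.transpose(2, 3, 0, 1, 4).reshape(s, t, u * v * ch)
```

With this ordering, each slice stack[..., k*ch:(k+1)*ch] corresponds to one subaperture view, which is what allows conventional 2D convolutions to operate on the stacked light field.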


The data generators generate batches of light fields (possibly in a multi-stream and/or reshaped fashion) and corresponding labels, and take care of all data reading, augmentation, and multiprocessing. You never really need to instantiate a generator yourself when training or evaluating a model, as this is done automatically for you. However, understanding how the generators work may be necessary when the ones we provide do not fit your needs. Possible labels are the light field itself (used for autoencoders), the disparity of all or a single subaperture view, a superresolved light field, etc. Out of the box, we provide the following generators, which can be combined with arbitrary reshapes (see below):

  • LfGenerator : Generates light field batches and light field labels, e.g. for autoencoders.

  • DisparityGenerator : Generates light field batches and central view disparity labels, e.g. used for disparity estimators.

  • LfDownSampleGenerator : Generates downsampled light field batches and original (thus superresolved) light field labels, e.g. used for light field superresolution. Downsampling is available in the angular and/or the spatial domain. Note, however, that no anti-aliasing is performed.

If you have an application that is not covered by these, you can simply specify your own data generator. Most of the work is already done in the BaseGenerator; you basically only have to implement the process_data function. For reference, see the already implemented generators mentioned above.
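To illustrate the division of labour described above, here is a self-contained toy sketch of the pattern. The class and method names mirror the description, but this is not LFCNN’s actual BaseGenerator interface; consult the implemented generators for the real signatures:

```python
import numpy as np

class ToyBaseGenerator:
    """Stand-in base class: iterates over raw samples and delegates the
    per-sample work (label creation, reshaping) to process_data."""
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for lf in self.data:
            yield self.process_data(lf)

    def process_data(self, lf):
        raise NotImplementedError

class CentralViewGenerator(ToyBaseGenerator):
    """Custom generator: uses the central subaperture view as the label."""
    def process_data(self, lf):
        u_c, v_c = lf.shape[0] // 2, lf.shape[1] // 2
        return lf, lf[u_c, v_c]

gen = CentralViewGenerator([np.random.rand(9, 9, 32, 32, 3)])
lf, label = next(iter(gen))  # label has shape (32, 32, 3)
```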

The generators are then combined with a model-specific reshape. We provide the following reshapes; however, it is straightforward to implement your own:

  • lf_identity() : Doesn’t perform any reshape upon data generation.

  • lf_subaperture_stream() : Provides a stream of u*v subapertures [(s, t, ch), (s, t, ch), …, (s, t, ch)] for multi-input subaperture-based models.

  • lf_subaperture_stack() : Stacks the subapertures in the channel axis. Resulting shape: (s, t, u*v*ch). Can easily be used with conventional 2D convolution.

  • lf_subaperture_channel_stack() : Stacks the subapertures while keeping the channel axis. Resulting shape: (s, t, u*v, ch). Can be used with 3D convolution.

  • lf_crosshair() : Four-stream light field crosshair: vertical, horizontal, and two diagonal EPI volumes. Results in [(v, s, t, ch), (u, s, t, ch), (sqrt(u**2 + v**2), s, t, ch), (sqrt(u**2 + v**2), s, t, ch)].

  • lf_crosshair_stacked() : Similar to the crosshair reshape, but stacking the resulting subapertures in the channel axis. Results in [(s, t, v*ch), (s, t, u*ch), (s, t, sqrt(u**2 + v**2)*ch), (s, t, sqrt(u**2 + v**2)*ch)]. Used, for example, by the EPINET disparity estimator.

  • lf_distributed() : Interprets the light field as a time sequence of subaperture views. Resulting shape: (u*v, s, t, ch). Can, for example, be used with the tensorflow.keras.layers.TimeDistributed layer wrapper to achieve pseudo-separable 4D convolution, as used by the SASCONV superresolution model.
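As an illustration of the multi-stream idea, the vertical and horizontal streams of a crosshair-style reshape can be extracted with plain NumPy slicing (a sketch only; the diagonal streams and LFCNN’s exact axis conventions are omitted):

```python
import numpy as np

u, v, s, t, ch = 9, 9, 32, 32, 3
lf = np.random.rand(u, v, s, t, ch)

# Fix one angular index at the center to obtain an EPI volume per stream:
vertical = lf[u // 2, :]    # fix u at the center -> shape (v, s, t, ch)
horizontal = lf[:, v // 2]  # fix v at the center -> shape (u, s, t, ch)
```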


The LFCNN models are wrappers around Keras models, integrating the corresponding generators, reshapes, metrics, and losses, and providing very easy interfaces for training, testing, and evaluation. All models are derived from the abstract BaseModel, which implements all the necessary functionality that you don’t really have to worry about (unless you’re interested). For better structure, the models are divided into the subpackages autoencoder, disparity, and superresolution (you can of course add new subpackages).

It is recommended to use the implemented methods to train, test, and evaluate your models. However, if that does not fit your workflow or use case, you can do anything you like by accessing the keras_model attribute of an instantiated LFCNN model, which holds a Keras Model instance.

Model Creation

To create a new model, it is easiest to have a look at one of the models that we provide, e.g. the EPINET model. Basically, you need to specify two things:

  • The set_generator_and_reshape method, which, as the name suggests, sets the generator class and reshape function that the model is designed for. For example, a disparity estimator working with a cross-hair multi-stream input of EPI volumes uses the DisparityGenerator generator and the lf_crosshair_stacked() reshape.

  • The model architecture, by implementing create_model. Here, you can do anything allowed within TensorFlow and Keras, as long as it ultimately returns a Keras Model instance. Most straightforwardly, you can simply stick to Keras’ functional Model API.
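As a sketch of what a create_model implementation might look like with the functional API (the layer sizes, names, and input shape here are purely illustrative and not taken from an actual LFCNN model):

```python
import tensorflow as tf

def create_model(input_shape=(32, 32, 9 * 9 * 3)):
    # Input: a subaperture stack of shape (s, t, u*v*ch)
    inp = tf.keras.Input(shape=input_shape, name="lf_input")
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    # Output: a single-channel disparity map of shape (s, t, 1)
    out = tf.keras.layers.Conv2D(1, 3, padding="same", name="disparity")(x)
    return tf.keras.Model(inputs=inp, outputs=out)

model = create_model()
```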


For training, the following has to be specified: an optimizer, a loss, metrics (see below), possibly callbacks, and of course data.

In LFCNN, data can be provided in two ways: either by specifying the system path to an HDF5 file containing the light field and label patches used during training, or by first loading the data into RAM and creating a data dictionary, e.g. for a dataset of size 128 with light fields of shape (9, 9, 36, 36, 3):

import numpy

# load data, here dummy data generation
light_field = numpy.random.rand(128, 9, 9, 36, 36, 3)
disparity = numpy.random.rand(128, 36, 36, 1)
data = dict(light_field=light_field, disparity=disparity)

The training itself is then performed using the train() method. Have a look at the function documentation for all parameters that need to be set for training. A basic training example can be found in the example folder.

Test and Evaluation

Testing should be performed using the test() method, which is mostly analogous to the train method in its usage. For more in-depth evaluation, we suggest also evaluating a network using so-called challenges, i.e. full-sized light fields with ground truth labels (as provided, for example, with our datasets). Unlike training, where light fields are usually patched into smaller shapes such as (9, 9, 32, 32, 3), the evaluate_challenges() method is meant to be used with full-sized light fields. In the background, the model is recompiled for the new input shapes that deviate from the training shapes. Unlike the test() method, which provides only mean metric evaluations, the challenge evaluation returns the predictions for all provided challenges and the metric scores for every prediction. The predictions and corresponding metric values can then be used to judge the network performance with respect to a specific challenge and to easily include predictions and metric scores in a publication or presentation with minimal boilerplate code.

Losses and Metrics

In principle, the model instantiation works with all Keras Loss and Metric instances. However, we provide a set of re-implementations as well as some additional losses specific to light field and multispectral applications. These losses are defined in the lfcnn.losses.losses module. All losses defined there are averaged over the mini-batches during training and can hence easily be combined. Some combined losses are provided in the lfcnn.losses.combined_losses module.


When instantiating a new LFCNN model, callbacks can be specified. These callbacks are basically just passed down to Keras. We provide several (light field-unrelated) callbacks that define some commonly used learning rate schedulers as well as the cyclic learning approach and the learning rate finder proposed by L. N. Smith.
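As a simple example of the kind of schedule such callbacks implement, a step decay can be written as a plain function (the decay factor and step size here are arbitrary, not LFCNN defaults):

```python
def step_decay(epoch, lr, factor=0.5, step=10):
    """Multiply the current learning rate by `factor` every `step` epochs."""
    return lr * factor if epoch > 0 and epoch % step == 0 else lr
```

A function like this can be wrapped in tf.keras.callbacks.LearningRateScheduler and passed to the model like any other callback.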

Furthermore, integration with Sacred is obtained via callbacks, however please consult the corresponding Sacred entry in this document for details.


The layers package is meant to hold light field-specific layers. As of now, there are a couple of residual layers and reshape layers defined that we have found to be commonly used in light field-related applications. However, contributions by the community are very welcome! For example by providing different (pseudo) 4D convolution layers.


The utils package holds a collection of utilities. Most notably, the lfcnn.utils.tf_utils module provides access to the mixed precision API and some TensorFlow commands (which we have found hard to remember, so we collected them all here).

The lfcnn.utils.callback_utils module provides a Matplotlib-based visualization of the learning rate schedulers.


Sacred is a Python framework to log experiment configurations and results, for example to a MongoDB or MySQL database. We provide callbacks to easily integrate LFCNN with Sacred.

As a quickstart, have a look at the provided Sacred examples.

To use Sacred with LFCNN, we provide several callbacks that log losses, metrics, and training status to a Sacred observer. These callbacks are defined in the lfcnn.callbacks.sacred module. Each of the defined callbacks takes a Sacred run object upon instantiation. To further simplify the use of Sacred with a MongoDB observer, our examples use the mdbh tools.


We welcome contributions by the community! We have tried to make LFCNN as versatile as possible; however, there are likely some issues for use cases that we did not have in mind. Or perhaps you have found a bug or other flaw in the source code. Either way, let us know by opening an issue or even creating a merge request in the GitLab repository. We would love to see LFCNN grow and become more mature and widespread.