Tutorial
========

This is a short tutorial on how to use the **EvoStencils** package. After cloning the repository, we first have to install the required Python packages.

.. code:: ipython3

    # Install evostencils package with requirements
    pip install -e .

To automatically generate a parallel implementation of each multigrid-based solver discovered through evolutionary program synthesis, EvoStencils makes use of the `ExaStencils <https://github.com/lssfau/ExaStencils>`__ framework. Note that right now, ExaStencils is the only supported code generation backend. To install ExaStencils together with all of its requirements, you can execute the following instructions.

.. code:: ipython3

    sudo apt-get install -y g++ openjdk-11-jdk
    # Install the ExaStencils framework
    wget -nc https://github.com/lssfau/ExaStencils/archive/refs/tags/v1.1.zip
    wget -nc https://github.com/sbt/sbt/releases/download/v1.8.0/sbt-1.8.0.zip
    unzip -n v1.1.zip && mv -vn ExaStencils-1.1 exastencils
    unzip -n sbt-1.8.0.zip
    cd exastencils
    ../sbt/bin/sbt compile
    ../sbt/bin/sbt assembly
    cd ..

Optional: Install LFA Lab for Convergence Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additionally, EvoStencils includes experimental support for predicting the convergence of a geometric multigrid method using the library `LFA Lab <https://github.com/hrittich/lfa-lab>`__. Note that while this approach is not recommended due to its limited accuracy, it can be useful for testing purposes, since it drastically speeds up the evaluation. To install LFA Lab, execute the following instructions.

.. code:: ipython3

    # Optional: Install LFA Lab for convergence estimation
    sudo apt-get install -y cmake swig libeigen3-dev liblapack-dev
    git clone https://github.com/hrittich/lfa-lab.git lfa-lab
    cd lfa-lab
    cmake -DWITH_TESTS=FALSE .
    make -j4
    make install
    cd ..

Overview and Problem Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After installing all requirements, we can import EvoStencils’ functionality for the automated design of multigrid methods. Here, the ``Optimizer`` class bundles all functionality for the execution of a grammar-based evolutionary program synthesis that enables the automated design of an efficient multigrid method for the given problem. As a second component, the ``ProgramGenerator`` class provides a frontend for the automated generation and evaluation of each generated multigrid-based solver using the `ExaStencils <https://github.com/lssfau/ExaStencils>`__ framework. To speed up the evaluation of each solver, ExaStencils can generate an OpenMP-parallel C++ implementation. For this purpose, the number of threads can be specified in the file ``example_problems/lib/parallelization_pureOmp.knowledge``.

.. code:: ipython3

    from evostencils.optimization.program import Optimizer
    from evostencils.code_generation.exastencils import ProgramGenerator
    import os
    import sys

The first step of running EvoStencils is to define the (PDE-based) problem that we aim to solve. EvoStencils is capable of extracting all information about a discretized PDE from its specification in ExaStencils’ domain-specific language **ExaSlang**. Note that a number of example problems are included in the ``example_problems`` folder. Here we use a simple two-dimensional Poisson problem as an example, which is given by

.. math::

    -\nabla^{2} u = \pi^2 \cos(\pi x) - 4 \pi^2 \sin(2 \pi y) \quad \text{in} \; \Omega \\
    u = \cos(\pi x) - \sin(2 \pi y) \quad \text{on} \; \partial \Omega.

In the given case, we choose :math:`\Omega = (0, 1)^2`, on which we discretize the equation using finite differences with :math:`h = 1/2^l`, where :math:`l = 8`. As a result, we obtain the usual five-point stencil given by

.. math::

    \nabla^2_h = \frac{1}{h^2} \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
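Before moving on, we can quickly verify this setup with plain NumPy, independent of EvoStencils: applying the five-point stencil to the exact solution :math:`u = \cos(\pi x) - \sin(2 \pi y)` reproduces the right-hand side up to the :math:`O(h^2)` discretization error. This check is purely illustrative (and assumes NumPy is available); it is not part of the EvoStencils workflow.

.. code:: ipython3

    import numpy as np

    # Discretization parameters from the problem definition above
    l = 8
    h = 1.0 / 2**l
    x = np.linspace(0.0, 1.0, 2**l + 1)
    y = np.linspace(0.0, 1.0, 2**l + 1)
    X, Y = np.meshgrid(x, y, indexing='ij')

    u = np.cos(np.pi * X) - np.sin(2.0 * np.pi * Y)  # exact solution
    f = np.pi**2 * np.cos(np.pi * X) - 4.0 * np.pi**2 * np.sin(2.0 * np.pi * Y)  # right-hand side

    # Apply the five-point stencil to u at all interior grid points
    laplace_u = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
                 - 4.0 * u[1:-1, 1:-1]) / h**2

    # -laplace_u should approximate f up to the O(h^2) discretization error
    print(np.max(np.abs(-laplace_u - f[1:-1, 1:-1])))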
Next, we define the paths to the ExaStencils compiler and to the problem specification, together with a number of default values.

.. code:: ipython3

    cwd = f'{os.getcwd()}/../'
    # Path to the ExaStencils compiler
    compiler_path = f'{cwd}/exastencils/Compiler/Compiler.jar'
    # Path to base folder
    base_path = f'{cwd}/example_problems'
    # Relative path to platform file (from base folder)
    platform_path = f'lib/linux.platform'
    # Example problem from L2
    # Relative path to settings file (from base folder)
    settings_path = f'Poisson/2D_FD_Poisson_fromL2.settings'
    knowledge_path = f'Poisson/2D_FD_Poisson_fromL2.knowledge'
    cycle_name = "gen_mgCycle"  # Default name
    # Additional global parameter values within the PDE system
    pde_parameter_values = None
    # The maximum number of iterations considered acceptable for a solver
    solver_iteration_limit = 500
    # MPI-related defaults for a serial run (see the optional MPI section below)
    comm = None
    nprocs = 1
    mpi_rank = 0

In case a PDE includes additional parameters that depend on the discretization, these can be dynamically adjusted during the optimization. For instance, the following code can be used to set the correct values for the wavenumber :math:`k` within the provided example of the Helmholtz equation.

.. code:: ipython3

    # Special treatment of parameters for the Helmholtz example
    if "Helmholtz" in knowledge_path or "Helmholtz" in settings_path:
        values = [80.0 * 2.0**i for i in range(100)]
        pde_parameter_values = {'k': values}
        solver_iteration_limit = 10000

If you only have access to limited compute capabilities, the model-based prediction can be employed for testing purposes. However, note that this approach leads to inaccurate evaluations and only works when the multigrid method is directly used as an iterative solver.

.. code:: ipython3

    # Only recommended for testing purposes:
    # use model-based estimation instead of code generation and evaluation
    model_based_estimation = False

Optional: Enable MPI
~~~~~~~~~~~~~~~~~~~~

Finding competitive multigrid methods requires the evaluation of a large number of solvers, which is usually not feasible on a single compute node. As a remedy, EvoStencils offers support for an MPI-based parallelization using the `mpi4py <https://mpi4py.readthedocs.io>`__ library, which provides a convenient interface to MPI within Python. To enable an MPI-based parallelization within EvoStencils, you can execute the following cell. The creation of the right number of MPI processes and their mapping to a particular computer (or cluster) architecture needs to be managed by the actual MPI implementation, such as OpenMPI (for instance, ``mpiexec -n 4 python optimize.py`` launches four processes). In general, it is advisable to execute and bind each process to a dedicated multi-core node, such that the evaluation of each solver can be sped up with an OpenMP-based parallelization, which can be performed automatically by ExaStencils.

.. code:: ipython3

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    nprocs = comm.Get_size()
    mpi_rank = comm.Get_rank()
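If you want the notebook to fall back gracefully to a serial run when mpi4py is not installed, the import can also be guarded as follows. This is an optional sketch that simply restores the serial defaults defined earlier; it is not required by EvoStencils.

.. code:: ipython3

    # Optional sketch: fall back to the serial defaults from above
    # if mpi4py is not available
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        nprocs = comm.Get_size()
        mpi_rank = comm.Get_rank()
    except ImportError:
        comm = None
        nprocs = 1
        mpi_rank = 0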
Code Generation Setup
~~~~~~~~~~~~~~~~~~~~~

Next, we create a ``ProgramGenerator`` object, which automatically extracts all information about the given problem from its respective ExaSlang specification. This object then provides an interface for the evaluation of each solver that is independent of the backend used for code generation (in our case ExaStencils).

.. code:: ipython3

    program_generator = ProgramGenerator(compiler_path, base_path, settings_path,
                                         knowledge_path, platform_path, mpi_rank=mpi_rank,
                                         cycle_name=cycle_name,
                                         model_based_estimation=model_based_estimation,
                                         solver_iteration_limit=solver_iteration_limit)

.. code:: ipython3

    # Obtain extracted information from program generator
    dimension = program_generator.dimension  # Dimensionality of the problem
    finest_grid = program_generator.finest_grid  # Representation of the finest grid
    coarsening_factors = program_generator.coarsening_factor  # Coarsening factor(s) of the grid hierarchy
    min_level = program_generator.min_level  # Minimum discretization level
    max_level = program_generator.max_level  # Maximum discretization level
    equations = program_generator.equations  # System of PDEs in SymPy
    operators = program_generator.operators  # Discretized differential operators
    fields = program_generator.fields  # Variables that occur within the system of PDEs
    problem_name = program_generator.problem_name
    convergence_evaluator = None
    performance_evaluator = None

Optional: Enable Model-Based Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to use a model-based prediction for evaluating the quality of each multigrid method, the ``ConvergenceEvaluator`` and ``PerformanceEvaluator`` classes can be used. The former provides an interface to LFA Lab’s functionality that automates the process of predicting a method’s speed of convergence. The latter implements a simple roofline model for estimating its execution time on a given CPU architecture. For this purpose, the user needs to specify the peak floating-point performance and memory bandwidth of the target machine.

.. code:: ipython3

    if model_based_estimation:
        # Create convergence and performance evaluator objects.
        # Convergence evaluation requires a working LFA Lab installation!
        # Only needed when a model-based estimation should be used within the optimization
        # (Not recommended due to its limitations, but useful for testing)
        from evostencils.model_based_prediction.convergence import ConvergenceEvaluator
        from evostencils.model_based_prediction.performance import PerformanceEvaluator
        convergence_evaluator = ConvergenceEvaluator(dimension, coarsening_factors, finest_grid)
        # Peak FLOP performance of the machine
        # (here, e.g., 16 FLOPs per cycle * 6 cores * 2.6 GHz)
        peak_flops = 16 * 6 * 2.6 * 1e9
        # Peak memory bandwidth of the machine
        peak_bandwidth = 45.8 * 1e9
        # Number of bytes per word
        bytes_per_word = 8  # Double = 64 Bit = 8 Bytes
        performance_evaluator = PerformanceEvaluator(peak_flops, peak_bandwidth, bytes_per_word)
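To illustrate the idea behind such a roofline estimate, the following stand-alone sketch predicts the runtime of a single smoothing step as the maximum of its compute time and its memory transfer time. The function ``roofline_runtime`` and the operation counts (roughly 6 FLOPs and 3 memory words per grid point for a five-point Jacobi sweep) are illustrative assumptions, not EvoStencils’ actual model.

.. code:: ipython3

    # Illustrative roofline estimate (not EvoStencils' actual implementation):
    # runtime = max(time needed for arithmetic, time needed for memory traffic)
    def roofline_runtime(flops, words, peak_flops, peak_bandwidth, bytes_per_word=8):
        compute_time = flops / peak_flops
        memory_time = words * bytes_per_word / peak_bandwidth
        return max(compute_time, memory_time)

    # Rough assumption: one five-point Jacobi sweep on a 1024 x 1024 grid needs
    # about 6 FLOPs and moves about 3 words per grid point
    n = 1024**2
    print(roofline_runtime(6 * n, 3 * n, peak_flops=16 * 6 * 2.6 * 1e9,
                           peak_bandwidth=45.8 * 1e9))

For the machine characteristics above, the memory time dominates, i.e. the sweep is bandwidth-bound, which is the typical situation for stencil codes.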
Optimization Setup
~~~~~~~~~~~~~~~~~~

After defining the problem that we aim to solve and setting up the code generation and evaluation, we can finally introduce EvoStencils’ program synthesis toolchain. Here, the ``Optimizer`` class provides a unified user interface for the execution of each step towards the automated design of an efficient multigrid method for the target problem. To deal with unexpected system crashes and cases where an optimization exceeds a certain time limit, EvoStencils implements a simple checkpointing mechanism that makes use of Python’s ``pickle`` module to store the current state of the optimization in a file. For this purpose, we need to create a directory for storing the respective checkpoint files of each MPI process.

.. code:: ipython3

    if mpi_rank == 0 and not os.path.exists(f'{cwd}/{problem_name}'):
        # Create directory for checkpoints and output data
        os.makedirs(f'{cwd}/{problem_name}')
    # Path to directory for storing checkpoints
    checkpoint_directory_path = f'{cwd}/{problem_name}/checkpoints_{mpi_rank}'

Next, we create an ``Optimizer`` object based on the variables defined above, which contain the information extracted from the respective ExaSlang representation. Since the ``Optimizer`` class makes use of the interfaces provided by the ``ProgramGenerator`` (and potentially ``ConvergenceEvaluator`` and ``PerformanceEvaluator``) classes to evaluate each generated solver, the respective objects must be provided as arguments.

.. code:: ipython3

    optimizer = Optimizer(dimension, finest_grid, coarsening_factors, min_level, max_level,
                          equations, operators, fields, mpi_comm=comm, mpi_rank=mpi_rank,
                          number_of_mpi_processes=nprocs, program_generator=program_generator,
                          convergence_evaluator=convergence_evaluator,
                          performance_evaluator=performance_evaluator,
                          checkpoint_directory_path=checkpoint_directory_path)

Note that each multigrid method is formulated on a specific discretization hierarchy. In the given case, each grid in this hierarchy is defined by its step size :math:`h`, which is given by :math:`h = 1/2^l` with :math:`l \in [l_{min}, l_{max}]`. EvoStencils automatically generates a grammar that allows the automated design of multigrid methods to be treated as a program synthesis task. Since the number of productions of this grammar increases with the number of coarsening steps, i.e. :math:`l_{max} - l_{min}`, the resulting search space grows accordingly, and the user is advised to limit the number of levels considered to at most 4 or 5. As a remedy, EvoStencils allows splitting the optimization into multiple runs. In this case, it first aims to find an efficient multigrid method on the coarsest levels, which is then employed as a coarse-grid solver for an existing multigrid implementation provided within the respective DSL code. This process is then continued on the next-higher levels until the finest grid has been reached. However, note that the outcome of this approach might be inferior to a result obtained with an optimization on the full range of levels.

.. code:: ipython3

    # Option to split the optimization into multiple runs, where each run is only
    # performed on a subrange of the discretization hierarchy starting at the top (finest grid)
    levels_per_run = max_level - min_level
    if model_based_estimation:
        # Model-based estimation is only feasible for up to two levels per run
        levels_per_run = 2
    assert levels_per_run <= 5, "Cannot optimize more than 5 levels"

After creating an initial ``Optimizer`` object, we can proceed with the execution of the actual program synthesis. To solve the resulting optimization problem, EvoStencils utilizes evolutionary computation in the form of grammar-guided genetic programming (G3P). Based on the principle of natural evolution, this method evolves a population of individuals represented as derivation trees, where each tree corresponds to a unique multigrid method. To control the behavior of this algorithm, we define a number of variables, which are later passed to the respective method of the ``Optimizer`` class. For this purpose, we first need to define a measure of the fitness of each individual, based on which we can select individuals for reproduction. In general, EvoStencils provides functionality for both single- and multi-objective optimization. In the case of the latter, each individual is evaluated according to two objectives: its execution time per iteration :math:`t` and its convergence factor :math:`\rho`. More information can be found at https://rdcu.be/c3xew. To perform a multi-objective evolutionary optimization, NSGA-II represents a reasonable default choice.
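To make the multi-objective setting concrete, the following small sketch shows the Pareto-dominance relation that selection schemes such as NSGA-II build on: one solver dominates another if it is no worse in both objectives and strictly better in at least one. The ``dominates`` helper is purely illustrative and not part of EvoStencils.

.. code:: ipython3

    # Illustrative sketch of Pareto dominance for two minimization objectives:
    # execution time per iteration t and convergence factor rho
    def dominates(a, b):
        """True if solver a = (t, rho) dominates solver b."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    print(dominates((1.0e-3, 0.1), (2.0e-3, 0.2)))  # True: better in both objectives
    print(dominates((1.0e-3, 0.3), (2.0e-3, 0.2)))  # False: a trade-off, neither dominates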
Next, we need to set the number of generations, the total population size :math:`\mu` and the number of newly created individuals (offspring) per process :math:`\lambda`. For instance, running EvoStencils with four MPI processes, :math:`\mu = 16` and :math:`\lambda = 4` means that in each generation a total number of 16 new individuals is created based on an existing population of 16 individuals. Note that the values provided in this tutorial are suitable for testing, but are unlikely to lead to a good result. In addition, a larger (or smaller) initial population can be chosen by setting the variable ``population_initialization_factor`` accordingly. As an alternative to running an actual evolutionary algorithm, EvoStencils also provides a simple random search, which randomly generates :math:`\lambda` individuals per process for the specified number of generations. To control the probability of creating new individuals via crossover and mutation, the values of the parameters ``crossover_probability`` and ``mutation_probability`` can be adjusted. In addition, we can set the number of samples used for the evaluation of each solver (in the case of a code generation-based evaluation).

.. code:: ipython3

    # Choose optimization method
    optimization_method = optimizer.NSGAII

    generations = 10  # Number of generations
    mu_ = 4  # Population size
    lambda_ = 4  # Number of offspring
    # Option to use random search instead of crossover and mutation to create new individuals
    use_random_search = False
    population_initialization_factor = 4  # Multiply mu_ by this factor to set the initial population size
    crossover_probability = 0.9
    mutation_probability = 1.0 - crossover_probability
    node_replacement_probability = 0.1  # Probability to perform mutation by altering a single node in the tree
    evaluation_samples = 3  # Number of evaluation samples
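As a quick, purely illustrative sanity check of the arithmetic described above, the following lines recompute the number of offspring created per generation across all processes and the size of the randomly generated initial population for the current configuration.

.. code:: ipython3

    # Offspring created per generation across all MPI processes
    print(f'Offspring per generation: {nprocs * lambda_}')
    # Size of the randomly generated initial population
    print(f'Initial population size: {population_initialization_factor * mu_}')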
Before we can run the evolutionary program synthesis based on the parameters specified, we need to discuss two additional options. First of all, EvoStencils allows the problem size to be increased dynamically during the optimization. For this purpose, the parameter ``generalization_interval`` can be set, which specifies the number of generations after which both :math:`l_{min}` and :math:`l_{max}` are incremented. Note that this triggers a reevaluation of all individuals in the current population. If you do not want to apply this feature, the generalization interval can simply be set to a value greater than the total number of generations. As a second feature, EvoStencils allows the evolutionary program synthesis to be continued from a saved checkpoint. Note that this requires the total number of generations to be higher than the generation in which the checkpoint was created. Furthermore, the symbols that occur within the grammar-based representation of each solver contained in the checkpointed population must match those of the new program synthesis run. Since at this point no additional checks are performed, the user is advised to use this feature with caution and to be prepared to deal with unexpected errors.

.. code:: ipython3

    # Number of generations after which a generalization is performed.
    # This is achieved by incrementing min_level and max_level within the optimization,
    # such that a larger (and potentially more difficult) instance of the same problem
    # is considered in subsequent generations
    generalization_interval = 50
    # Option to continue from the checkpoint of a previous optimization
    # Warning: So far no check is performed whether the checkpoint is compatible
    # with the current optimization setting
    continue_from_checkpoint = False

Running the Evolutionary Program Synthesis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After setting all necessary parameters, we can execute the evolutionary program synthesis. For this purpose, the ``Optimizer`` class provides the method ``evolutionary_optimization``, which can be executed by providing the parameters defined above as its arguments. After the execution of this method has finished, its outcome is returned in the form of five Python objects. Here, a grammatical representation of the multigrid method that leads to the fastest solving time for the largest problem considered during the optimization is stored in the first return value. Accordingly, the second value contains the corresponding DSL code of this method. The remaining three return values store information gathered during the optimization in the form of the final population, a statistics data structure and a hall-of-fame of the best multigrid methods according to the optimization objective(s), where the latter two are data structures provided by the DEAP framework. Note that in case the evolutionary program synthesis is performed on the full range of levels, each of these objects is a list containing only a single entry. Otherwise, the list contains an object for each respective sub-run.

.. code:: ipython3

    # Return values of the optimization
    # program: Grammar string representing the multigrid method on the topmost levels
    # dsl_code: ExaSlang program string representing the multigrid solver functions
    # pops: Populations at the end of each optimization run on the respective
    #       subrange of the discretization hierarchy
    # stats: Statistics structure (data structure provided by the DEAP framework)
    # hofs: Hall-of-fames at the end of each optimization run on the respective
    #       subrange of the discretization hierarchy
    program, dsl_code, pops, stats, hofs = \
        optimizer.evolutionary_optimization(optimization_method=optimization_method,
                                            use_random_search=use_random_search,
                                            mu_=mu_, lambda_=lambda_,
                                            population_initialization_factor=population_initialization_factor,
                                            generations=generations,
                                            generalization_interval=generalization_interval,
                                            crossover_probability=crossover_probability,
                                            mutation_probability=mutation_probability,
                                            node_replacement_probability=node_replacement_probability,
                                            levels_per_run=levels_per_run,
                                            evaluation_samples=evaluation_samples,
                                            model_based_estimation=model_based_estimation,
                                            pde_parameter_values=pde_parameter_values,
                                            continue_from_checkpoint=continue_from_checkpoint)

Storing and Examining the Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the evolutionary program synthesis has finished, we can examine the results, for instance by printing the DSL code of the best method discovered during the optimization. Furthermore, we can store both the final population and the hall-of-fame in a file using the ``dump_data_structure`` method of the ``Optimizer`` class.

.. code:: ipython3

    # Print the outcome of the optimization and store the data and statistics
    if mpi_rank == 0:
        print(f'\nExaSlang Code:\n{dsl_code}\n', flush=True)
        if not os.path.exists(f'./{problem_name}'):
            os.makedirs(f'./{problem_name}')
        j = 0
        log_dir_name = f'./{problem_name}/data_{j}'
        while os.path.exists(log_dir_name):
            j += 1
            log_dir_name = f'./{problem_name}/data_{j}'
        os.makedirs(log_dir_name)
        for i, log in enumerate(stats):
            optimizer.dump_data_structure(log, f"{log_dir_name}/log_{i}.p")
        for i, pop in enumerate(pops):
            optimizer.dump_data_structure(pop, f"{log_dir_name}/pop_{i}.p")
        for i, hof in enumerate(hofs):
            hof_dir = f'{log_dir_name}/hof_{i}'
            os.makedirs(hof_dir)
            for j, ind in enumerate(hof):
                with open(f'{hof_dir}/individual_{j}.txt', 'w') as grammar_file:
                    grammar_file.write(str(ind) + '\n')
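Since the checkpointing described earlier is based on Python’s ``pickle`` module and the dumped files carry the ``.p`` suffix, it is reasonable to assume that they can be read back with ``pickle.load``. The following sketch loads a stored population for later inspection; the directory ``data_0`` and the file name ``pop_0.p`` follow from the cell above, assuming a first run.

.. code:: ipython3

    import pickle

    # Sketch: load a stored population back for inspection.
    # Assumes dump_data_structure pickles its argument (suggested by the .p
    # suffix and the pickle-based checkpointing) and that a previous run
    # produced the directory ./{problem_name}/data_0
    with open(f'./{problem_name}/data_0/pop_0.p', 'rb') as file:
        population = pickle.load(file)
    print(len(population))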
Conclusion
~~~~~~~~~~

We hope that this tutorial provides a reasonable introduction to EvoStencils’ capabilities. Note that the functionality offered by the ``optimize.py`` script (located in the ``scripts`` subfolder) is similar to that of this notebook.