Tutorial

This is a short tutorial on how to use the EvoStencils package. After cloning the repository, we first have to install the required Python packages.

# Install evostencils package with requirements
pip install -e .

To automatically generate a parallel implementation of each multigrid-based solver discovered through evolutionary program synthesis, EvoStencils makes use of the ExaStencils framework. Note that ExaStencils is currently the only supported code generation backend. To install ExaStencils together with all of its requirements, execute the following instructions.

sudo apt-get install -y g++ openjdk-11-jdk
# Install the ExaStencils framework
wget -nc https://github.com/lssfau/ExaStencils/archive/refs/tags/v1.1.zip
wget -nc https://github.com/sbt/sbt/releases/download/v1.8.0/sbt-1.8.0.zip
unzip -n v1.1.zip && mv -vn ExaStencils-1.1 exastencils
unzip -n sbt-1.8.0.zip
cd exastencils
../sbt/bin/sbt compile
../sbt/bin/sbt assembly
cd ..
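
If the build succeeds, the ExaStencils compiler JAR referenced later in this tutorial should now be available. The following quick check is purely illustrative (the path is taken from the setup code below):

import os

# Verify that the ExaStencils compiler was built successfully
assert os.path.isfile('exastencils/Compiler/Compiler.jar'), 'ExaStencils build seems incomplete'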

Optional: Install LFA Lab for Convergence Estimation

Additionally, EvoStencils includes experimental support for predicting the convergence of a geometric multigrid method using the LFA Lab library. While this approach is not recommended due to its limited accuracy, it can be useful for testing purposes, since it drastically speeds up the evaluation. To install LFA Lab, execute the following instructions.

# Optional: Install LFA Lab for convergence estimation
sudo apt-get install -y cmake swig libeigen3-dev liblapack-dev
git clone https://github.com/hrittich/lfa-lab.git lfa-lab
cd lfa-lab
cmake -DWITH_TESTS=FALSE .
make -j4
make install  # may require root privileges, depending on the install prefix
cd ..
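
To verify the installation, a minimal smoke test is to import the package from Python (the module name lfa_lab follows the project's documentation):

# Minimal smoke test: the import should succeed if LFA Lab was installed correctly
import lfa_lab
print(lfa_lab.__file__)  # show where the package was installed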

Overview and Problem Definition

After installing all requirements, we can import EvoStencils’ functionality for the automated design of multigrid methods. Here, the Optimizer class bundles all functionality for executing a grammar-based evolutionary program synthesis, which enables the automated design of an efficient multigrid method for the given problem. As a second component, the ProgramGenerator class provides a frontend for automatically generating and evaluating each multigrid-based solver using the ExaStencils framework. To speed up the evaluation of each solver, ExaStencils can generate an OpenMP-parallel C++ implementation. For this purpose, the number of threads can be specified in the file example_problems/lib/parallelization_pureOmp.knowledge.

from evostencils.optimization.program import Optimizer
from evostencils.code_generation.exastencils import ProgramGenerator
import os
import sys

The first step of running EvoStencils is to define the (PDE-based) problem that we aim to solve. EvoStencils is capable of extracting all information about a discretized PDE from its specification in ExaStencils’ domain-specific language ExaSlang. Note that a number of example problems are included in the example_problems folder. Here we use a simple two-dimensional Poisson problem as an example, which is given by

\[\begin{split}-\nabla^{2} u = \pi^2 \cos(\pi x) - 4 \pi^2 \sin(2 \pi y) \quad \text{in} \; \Omega \\ u = \cos(\pi x) - \sin(2 \pi y) \quad \text{on} \; \partial \Omega.\end{split}\]
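
The right-hand side and boundary data correspond to the manufactured solution \(u(x, y) = \cos(\pi x) - \sin(2 \pi y)\), which can be verified by direct differentiation:

\[-\nabla^{2} \left( \cos(\pi x) - \sin(2 \pi y) \right) = \pi^{2} \cos(\pi x) - 4 \pi^{2} \sin(2 \pi y).\]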

In the given case, we choose \(\Omega = (0, 1)^2\) on which we discretize the equation using finite differences with \(h = 1/2^l\), where \(l = 8\). As a result, we obtain the usual five-point stencil given by

\[\begin{split}\nabla^2_h = \frac{1}{h^2} \begin{bmatrix} 0 & 1 & 0\\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.\end{split}\]
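
To make the discretization concrete, the following NumPy sketch (purely illustrative and not part of EvoStencils) applies the five-point stencil to the manufactured solution and confirms the expected \(O(h^2)\) discretization error:

import numpy as np

# Purely illustrative: apply the five-point stencil on the unit square
l = 8
h = 1.0 / 2**l
n = 2**l + 1  # grid points per dimension, including the boundary
x, y = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n), indexing='ij')
u = np.cos(np.pi * x) - np.sin(2.0 * np.pi * y)  # manufactured solution

# Discrete Laplacian at the interior points via the five-point stencil
lap_u = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
         - 4.0 * u[1:-1, 1:-1]) / h**2
f = np.pi**2 * np.cos(np.pi * x) - 4.0 * np.pi**2 * np.sin(2.0 * np.pi * y)
print(np.abs(-lap_u - f[1:-1, 1:-1]).max())  # small, decreases as O(h^2)

With the problem defined, we next configure the paths and basic parameters for this example: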
cwd = f'{os.getcwd()}/../'
# Path to the ExaStencils compiler
compiler_path = f'{cwd}/exastencils/Compiler/Compiler.jar'
# Path to base folder
base_path = f'{cwd}/example_problems'
# Relative path to platform file (from base folder)
platform_path = f'lib/linux.platform'
# Example problem specified in ExaSlang Layer 2 (L2)
# Relative path to settings file (from base folder)
settings_path = f'Poisson/2D_FD_Poisson_fromL2.settings'
knowledge_path = f'Poisson/2D_FD_Poisson_fromL2.knowledge'
cycle_name = "gen_mgCycle"  # Default name
# Additional global parameter values within the PDE system
pde_parameter_values = None
# The maximum number of iterations considered acceptable for a solver
solver_iteration_limit = 500
# Set up MPI
comm = None
nprocs = 1
mpi_rank = 0

In case a PDE includes additional parameters that depend on the discretization, these can be dynamically adjusted during the optimization. For instance, the following code can be used to set the correct values for the wavenumber \(k\) within the provided example of the Helmholtz equation.

# Special treatment of parameters for the Helmholtz example
if "Helmholtz" in knowledge_path or "Helmholtz" in settings_path:
    values = [80.0 * 2.0**i for i in range(100)]
    pde_parameter_values = {'k': values}
    solver_iteration_limit = 10000

If you only have access to limited compute resources, the model-based prediction can be employed for testing purposes. However, note that this approach leads to inaccurate evaluations and only works when the multigrid method is used directly as an iterative solver.

# Only recommended for testing purposes
# Use model-based estimation instead of code generation for the evaluation
model_based_estimation = False

Optional: Enable MPI

Finding competitive multigrid methods requires the evaluation of a large number of solvers, which is usually not feasible on a single compute node. As a remedy, EvoStencils offers support for an MPI-based parallelization using the mpi4py library, which provides a convenient interface to MPI within Python. To enable an MPI-based parallelization within EvoStencils, execute the following cell. The creation of the right number of MPI processes and their mapping to a particular computer (or cluster) architecture needs to be managed by the actual MPI implementation, such as OpenMPI (for instance, mpiexec -n 4 python your_script.py). As general advice, execute and bind each process to a dedicated multi-core node, such that the evaluation of each solver can be sped up with an OpenMP-based parallelization, which ExaStencils can perform automatically.

from mpi4py import MPI
comm = MPI.COMM_WORLD
nprocs = comm.Get_size()
mpi_rank = comm.Get_rank()

Code Generation Setup

Next, we create a ProgramGenerator object which automatically extracts all information about the given problem from its respective ExaSlang specification. This object then provides an interface for the evaluation of each solver that is independent of the backend used for code generation (in our case ExaStencils).

program_generator = ProgramGenerator(compiler_path, base_path, settings_path, knowledge_path, platform_path, mpi_rank=mpi_rank,
                                     cycle_name=cycle_name, model_based_estimation=model_based_estimation,
                                     solver_iteration_limit=solver_iteration_limit)
# Obtain extracted information from program generator
dimension = program_generator.dimension  # Dimensionality of the problem
finest_grid = program_generator.finest_grid  # Representation of the finest grid
coarsening_factors = program_generator.coarsening_factor
min_level = program_generator.min_level  # Minimum discretization level
max_level = program_generator.max_level  # Maximum discretization level
equations = program_generator.equations  # System of PDEs in SymPy
operators = program_generator.operators  # Discretized differential operators
fields = program_generator.fields  # Variables that occur within system of PDEs
problem_name = program_generator.problem_name
convergence_evaluator = None
performance_evaluator = None
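
As a quick, purely illustrative sanity check, the extracted information can be printed to verify that the problem was parsed as expected:

# Optional: inspect the information extracted from the ExaSlang specification
print(f"Problem: {problem_name} ({dimension}D), levels {min_level} to {max_level}")
print(f"Fields: {fields}")
print(f"Operators: {operators}")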

Optional: Enable Model-Based Estimation

If you want to use a model-based prediction for evaluating the quality of each multigrid method, the ConvergenceEvaluator and PerformanceEvaluator classes can be used. The former provides an interface to LFA Lab’s functionality that automates the process of predicting a method’s speed of convergence. The latter implements a simple roofline model for estimating its execution time on a given CPU architecture. For this purpose, the user needs to specify the peak floating-point performance and memory bandwidth of the target machine.

if model_based_estimation:
    # Create convergence and performance evaluator objects
    # Convergence evaluation requires a working LFA Lab installation!
    # Only needed when a model-based estimation should be used within the optimization
    # (Not recommended due to the limitations, but useful for testing)
    from evostencils.model_based_prediction.convergence import ConvergenceEvaluator
    from evostencils.model_based_prediction.performance import PerformanceEvaluator
    convergence_evaluator = ConvergenceEvaluator(dimension, coarsening_factors, finest_grid)
    # Peak FLOP performance of the machine
    # (16 * 6 * 2.6 GHz; presumably FLOPs per cycle * cores * clock frequency)
    peak_flops = 16 * 6 * 2.6 * 1e9
    # Peak memory bandwidth of the machine
    peak_bandwidth = 45.8 * 1e9
    # Number of bytes per word
    bytes_per_word = 8  # Double = 64 Bit = 8 Bytes
    performance_evaluator = PerformanceEvaluator(peak_flops, peak_bandwidth, bytes_per_word)
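
To illustrate the reasoning behind the roofline model, the following hand-written sketch (of the general idea only, not EvoStencils’ actual PerformanceEvaluator implementation) estimates a kernel’s runtime as the maximum of its compute time and its memory-transfer time:

def roofline_runtime(flops, words, peak_flops, peak_bandwidth, bytes_per_word=8):
    """Roofline estimate: the kernel is limited either by compute
    throughput or by memory bandwidth, whichever takes longer."""
    compute_time = flops / peak_flops
    memory_time = words * bytes_per_word / peak_bandwidth
    return max(compute_time, memory_time)

# Example with rough, illustrative counts for one five-point stencil sweep
# on a 1024 x 1024 grid (about 6 flops and 2 memory accesses per grid point)
n = 1024 * 1024
print(roofline_runtime(6 * n, 2 * n, peak_flops=16 * 6 * 2.6 * 1e9,
                       peak_bandwidth=45.8 * 1e9))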

Optimization Setup

After defining the problem that we aim to solve and setting up the code generation and evaluation, we can finally introduce EvoStencils’ program synthesis toolchain. Here, the Optimizer class provides a unified user interface for the execution of each step towards the automated design of an efficient multigrid method for the target problem.

To deal with unexpected system crashes and cases where an optimization exceeds a certain time limit, EvoStencils implements a simple checkpointing mechanism that makes use of Python’s pickle module to store the current state of the optimization in a file. For this purpose, we need to create a directory for storing the respective checkpoint files of each MPI process.

if mpi_rank == 0 and not os.path.exists(f'{cwd}/{problem_name}'):
    # Create directory for checkpoints and output data
    os.makedirs(f'{cwd}/{problem_name}')
# Path to directory for storing checkpoints
checkpoint_directory_path = f'{cwd}/{problem_name}/checkpoints_{mpi_rank}'

Next, we create an Optimizer object based on the variables defined above, which contain the information extracted from the respective ExaSlang representation. Since the Optimizer class makes use of the interface provided by the ProgramGenerator (and potentially the ConvergenceEvaluator and PerformanceEvaluator) classes to evaluate each generated solver, the respective objects must be provided as arguments.

optimizer = Optimizer(dimension, finest_grid, coarsening_factors, min_level, max_level, equations, operators, fields,
                      mpi_comm=comm, mpi_rank=mpi_rank, number_of_mpi_processes=nprocs, program_generator=program_generator,
                      convergence_evaluator=convergence_evaluator, performance_evaluator=performance_evaluator,
                      checkpoint_directory_path=checkpoint_directory_path)

Note that each multigrid method is formulated on a specific discretization hierarchy. In the given case, each grid in this hierarchy is defined by its step size \(h = 1/2^l\) with \(l \in [l_{min}, l_{max}]\). EvoStencils automatically generates a grammar that allows the automated design of multigrid methods to be treated as a program synthesis task. Since the number of productions of this grammar increases with the number of coarsening steps, i.e. \(l_{max} - l_{min}\), the resulting search space grows accordingly, and the user is advised to limit the number of levels considered to at most 4 or 5. As a remedy, EvoStencils allows the optimization to be split into multiple runs. In this case, it first aims to find an efficient multigrid method on the coarsest levels, which is then employed as a coarse-grid solver for an existing multigrid implementation provided within the respective DSL code. This process is then continued on the next-higher levels until the finest grid has been reached. However, note that the outcome of this approach might be inferior to a result obtained with an optimization on the full range of levels.

# Option to split the optimization into multiple runs,
# where each run is only performed on a subrange of the discretization hierarchy starting at the top (finest grid)
levels_per_run = max_level - min_level
if model_based_estimation:
    # Model-based estimation only feasible for up to 2 levels per run
    levels_per_run = 2
assert levels_per_run <= 5, "Cannot optimize more than 5 levels per run"

After creating an initial Optimizer object, we can proceed with the execution of the actual program synthesis. To solve the resulting optimization problem, EvoStencils utilizes evolutionary computation in the form of grammar-guided genetic programming (G3P). Based on the principle of natural evolution, this method evolves a population of individuals represented as derivation trees, where each tree corresponds to a unique multigrid method. To control the behavior of this algorithm, we define a number of variables, which are later passed to the respective method of the Optimizer class.

For this purpose, we first need to define a measure of the fitness of each individual, based on which we can select individuals for reproduction. In general, EvoStencils provides functionality for both single- and multi-objective optimization. In the latter case, each individual is evaluated according to two objectives: its execution time per iteration \(t\) and its convergence factor \(\rho\). More information can be found at https://rdcu.be/c3xew. To perform a multi-objective evolutionary optimization, NSGA-II represents a reasonable default choice.

Next, we need to set the number of generations, the total population size \(\mu\) and the number of newly created individuals (offspring) per process \(\lambda\). For instance, running EvoStencils with four MPI processes, \(\mu = 16\) and \(\lambda = 4\) means that in each generation a total of \(4 \cdot 4 = 16\) new individuals is created based on an existing population of 16 individuals. Note that the values provided in this tutorial are suitable for testing, but are unlikely to lead to a good result. In addition, a larger (or smaller) initial population can be chosen by setting the variable population_initialization_factor accordingly. As an alternative to running an actual evolutionary algorithm, EvoStencils also provides a simple random search, which randomly generates \(\lambda\) individuals per process for the specified number of generations.

To control the probability of creating new individuals via crossover and mutation, the value of the parameters crossover_probability and mutation_probability can be adjusted. In addition, we can set the number of samples used for the evaluation of each solver (in the case of a code generation-based evaluation).

# Choose optimization method
optimization_method = optimizer.NSGAII

generations = 10  # Number of generations
mu_ = 4  # Population size
lambda_ = 4  # Number of offspring
# Option to use random search instead of crossover and mutation to create new individuals
use_random_search = False
population_initialization_factor = 4  # Multiply mu_ by this factor to set the initial population size
crossover_probability = 0.9
mutation_probability = 1.0 - crossover_probability
node_replacement_probability = 0.1  # Probability to perform mutation by altering a single node in the tree
evaluation_samples = 3  # Number of evaluation samples

Before we can run the evolutionary program synthesis based on the parameters specified, we need to discuss two additional options. First of all, EvoStencils allows the problem size to be increased dynamically during the optimization. For this purpose, the parameter generalization_interval can be set, which specifies the number of generations after which both \(l_{min}\) and \(l_{max}\) are incremented. Note that this triggers a reevaluation of all individuals in the current population. If you do not want to apply this feature, the generalization interval can simply be set to a value greater than the total number of generations.

As a second feature, EvoStencils allows the evolutionary program synthesis to be continued from a saved checkpoint. Note that this requires the total number of generations to be higher than the generation in which the checkpoint was created. Furthermore, the symbols that occur within the grammar-based representation of each solver contained in the checkpointed population must match those of the new program synthesis run. Since no additional checks are performed at this point, the user is advised to use this feature with caution and to be prepared to deal with unexpected errors.

# Number of generations after which a generalization is performed
# This is achieved by incrementing min_level and max_level within the optimization
# Such that a larger (and potentially more difficult) instance of the same problem is considered in subsequent generations
generalization_interval = 50
# Option to continue from the checkpoint of a previous optimization
# Warning: So far no check is performed whether the checkpoint is compatible with the current optimization setting
continue_from_checkpoint = False

Running the Evolutionary Program Synthesis

After setting all necessary parameters, we can execute the evolutionary program synthesis. For this purpose, the Optimizer class provides the method evolutionary_optimization, which can be executed by providing the parameters defined above as its arguments.

After the execution of this method has finished, its outcome is returned in the form of five Python objects. The first return value stores a grammatical representation of the multigrid method that achieved the fastest solving time for the largest problem considered during the optimization. Accordingly, the second value contains the corresponding DSL code of this method.

The remaining three return values store information gathered during the optimization in the form of the final population, a statistics data structure and a hall-of-fame of the best multigrid methods according to the optimization objective(s). Note that in case the evolutionary program synthesis is performed on the full range of levels, each of these objects is a list containing only a single entry. Otherwise, the list contains an object for each respective sub-run.

# Return values of the optimization
# program: Grammar string representing the multigrid method on the topmost levels
# dsl_code: ExaSlang program string representing the multigrid solver functions
# pops: Populations at the end of each optimization run on the respective subrange of the discretization hierarchy
# stats: Statistics structure (data structure provided by the DEAP framework)
# hofs: Hall-of-fames at the end of each optimization run on the respective subrange of the discretization hierarchy
program, dsl_code, pops, stats, hofs = optimizer.evolutionary_optimization(optimization_method=optimization_method,
                                                                 use_random_search=use_random_search,
                                                                 mu_=mu_, lambda_=lambda_,
                                                                 population_initialization_factor=population_initialization_factor,
                                                                 generations=generations,
                                                                 generalization_interval=generalization_interval,
                                                                 crossover_probability=crossover_probability,
                                                                 mutation_probability=mutation_probability,
                                                                 node_replacement_probability=node_replacement_probability,
                                                                 levels_per_run=levels_per_run,
                                                                 evaluation_samples=evaluation_samples,
                                                                 model_based_estimation=model_based_estimation,
                                                                 pde_parameter_values=pde_parameter_values,
                                                                 continue_from_checkpoint=continue_from_checkpoint)

Storing and Examining the Results

After the evolutionary program synthesis has finished, we can examine the results, for instance by printing the DSL code of the best method discovered during the optimization. Furthermore, we can store both the final population and the hall-of-fame in a file using the dump_data_structure method of the Optimizer class.

# Print the outcome of the optimization and store the data and statistics
if mpi_rank == 0:
    print(f'\nExaSlang Code:\n{dsl_code}\n', flush=True)
    if not os.path.exists(f'./{problem_name}'):
        os.makedirs(f'./{problem_name}')
    j = 0
    log_dir_name = f'./{problem_name}/data_{j}'
    while os.path.exists(log_dir_name):
        j += 1
        log_dir_name = f'./{problem_name}/data_{j}'
    os.makedirs(log_dir_name)
    for i, log in enumerate(stats):
        optimizer.dump_data_structure(log, f"{log_dir_name}/log_{i}.p")
    for i, pop in enumerate(pops):
        optimizer.dump_data_structure(pop, f"{log_dir_name}/pop_{i}.p")
    for i, hof in enumerate(hofs):
        hof_dir = f'{log_dir_name}/hof_{i}'
        os.makedirs(hof_dir)
        for k, ind in enumerate(hof):
            with open(f'{hof_dir}/individual_{k}.txt', 'w') as grammar_file:
                grammar_file.write(str(ind) + '\n')
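
The stored populations can later be reloaded for analysis. Below is a minimal sketch, assuming that dump_data_structure pickles its argument (analogous to the checkpointing mechanism described above) and that the code runs on rank 0 in the same session, where the DEAP classes of each individual are still registered:

import pickle

# Illustrative: reload the final population of the first (or only) run
with open(f'{log_dir_name}/pop_0.p', 'rb') as file:
    final_population = pickle.load(file)
# Each DEAP individual carries its fitness, i.e. the objective value(s)
for individual in final_population:
    print(individual.fitness.values)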

Conclusion

We hope that this tutorial provides a reasonable introduction to EvoStencils’ capabilities. Note that the optimize.py script (located in the scripts subfolder) offers functionality similar to that of this notebook.