Tutorial
========

This is a short tutorial on how to use the **EvoStencils** package. After cloning the repository, we first have to install the required Python packages.

.. code:: ipython3

    # Install evostencils package with requirements
    pip install -e .

To automatically generate a parallel implementation of each multigrid-based solver discovered through evolutionary program synthesis, EvoStencils makes use of the `ExaStencils <https://github.com/lssfau/ExaStencils>`__ framework. Note that right now, ExaStencils is the only supported code generation backend. To install ExaStencils together with all of its requirements, you can execute the following instructions.

.. code:: ipython3

    sudo apt-get install -y g++ openjdk-11-jdk
    # Install the ExaStencils framework
    wget -nc https://github.com/lssfau/ExaStencils/archive/refs/tags/v1.1.zip
    wget -nc https://github.com/sbt/sbt/releases/download/v1.8.0/sbt-1.8.0.zip
    unzip -n v1.1.zip && mv -vn ExaStencils-1.1 exastencils
    unzip -n sbt-1.8.0.zip
    cd exastencils
    ../sbt/bin/sbt compile
    ../sbt/bin/sbt assembly
    cd ..

Optional: Install LFA Lab for Convergence Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additionally, EvoStencils includes experimental support for predicting the convergence of a geometric multigrid method using the library `LFA Lab <https://github.com/hrittich/lfa-lab>`__. Note that while this approach is not recommended due to its limited accuracy, it can be useful for testing purposes, since it drastically speeds up the evaluation. To install LFA Lab, execute the following instructions.

.. code:: ipython3

    # Optional: Install LFA Lab for convergence estimation
    sudo apt-get install -y cmake swig libeigen3-dev liblapack-dev
    git clone https://github.com/hrittich/lfa-lab.git lfa-lab
    cd lfa-lab
    cmake -DWITH_TESTS=FALSE .
    make -j4
    make install
    cd ..

Overview and Problem Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After installing all requirements, we can import EvoStencils’ functionality for the automated design of multigrid methods. Here, the ``Optimizer`` class bundles all functionality for the execution of a grammar-based evolutionary program synthesis that enables the automated design of an efficient multigrid method for the given problem. As a second component, the ``ProgramGenerator`` class provides a frontend for the automated generation and evaluation of each generated multigrid-based solver using the `ExaStencils <https://github.com/lssfau/ExaStencils>`__ framework. To speed up the evaluation of each solver, ExaStencils can generate an OpenMP-parallel C++ implementation. For this purpose, the number of threads can be specified in the file ``example_problems/lib/parallelization_pureOmp.knowledge``.

.. code:: ipython3

    from evostencils.optimization.program import Optimizer
    from evostencils.code_generation.exastencils import ProgramGenerator
    import os
    import sys

The first step of running EvoStencils is to define the (PDE-based) problem that we aim to solve. EvoStencils is capable of extracting all information about a discretized PDE from its specification in ExaStencils’ domain-specific language **ExaSlang**. Note that a number of example problems are included in the ``example_problems`` folder. Here we use a simple two-dimensional Poisson problem as an example, which is given by

.. math::

    -\nabla^{2} u = \pi^2 \cos(\pi x) - 4 \pi^2 \sin(2 \pi y) \quad \text{in} \; \Omega \\
    u = \cos(\pi x) - \sin(2 \pi y) \quad \text{on} \; \partial \Omega.

In the given case, we choose :math:`\Omega = (0, 1)^2`, on which we discretize the equation using finite differences with :math:`h = 1/2^l`, where :math:`l = 8`. As a result, we obtain the usual five-point stencil given by

.. math::

    \nabla^2_h = \frac{1}{h^2} \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.
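Before moving on, we can quickly verify this setup with plain NumPy, independent of EvoStencils: applying the five-point stencil to the exact solution :math:`u = \cos(\pi x) - \sin(2 \pi y)` reproduces the right-hand side up to the :math:`O(h^2)` discretization error. This check is purely illustrative (and assumes NumPy is available); it is not part of the EvoStencils workflow.

.. code:: ipython3

    import numpy as np

    # Discretization parameters from the problem definition above
    l = 8
    h = 1.0 / 2**l
    x = np.linspace(0.0, 1.0, 2**l + 1)
    y = np.linspace(0.0, 1.0, 2**l + 1)
    X, Y = np.meshgrid(x, y, indexing='ij')

    u = np.cos(np.pi * X) - np.sin(2.0 * np.pi * Y)  # exact solution
    f = np.pi**2 * np.cos(np.pi * X) - 4.0 * np.pi**2 * np.sin(2.0 * np.pi * Y)  # right-hand side

    # Apply the five-point stencil to u at all interior grid points
    laplace_u = (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
                 - 4.0 * u[1:-1, 1:-1]) / h**2

    # -laplace_u should approximate f up to the O(h^2) discretization error
    print(np.max(np.abs(-laplace_u - f[1:-1, 1:-1])))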
Next, we define the paths to the ExaStencils compiler and to the problem specification, together with a number of default values.

.. code:: ipython3

    cwd = f'{os.getcwd()}/../'
    # Path to the ExaStencils compiler
    compiler_path = f'{cwd}/exastencils/Compiler/Compiler.jar'
    # Path to base folder
    base_path = f'{cwd}/example_problems'
    # Relative path to platform file (from base folder)
    platform_path = f'lib/linux.platform'
    # Example problem from L2
    # Relative path to settings file (from base folder)
    settings_path = f'Poisson/2D_FD_Poisson_fromL2.settings'
    knowledge_path = f'Poisson/2D_FD_Poisson_fromL2.knowledge'
    cycle_name = "gen_mgCycle"  # Default name
    # Additional global parameter values within the PDE system
    pde_parameter_values = None
    # The maximum number of iterations considered acceptable for a solver
    solver_iteration_limit = 500
    # MPI-related defaults for a serial run (see the optional MPI section below)
    comm = None
    nprocs = 1
    mpi_rank = 0

In case a PDE includes additional parameters that depend on the discretization, these can be dynamically adjusted during the optimization. For instance, the following code can be used to set the correct values for the wavenumber :math:`k` within the provided example of the Helmholtz equation.

.. code:: ipython3

    # Special treatment of parameters for the Helmholtz example
    if "Helmholtz" in knowledge_path or "Helmholtz" in settings_path:
        values = [80.0 * 2.0**i for i in range(100)]
        pde_parameter_values = {'k': values}
        solver_iteration_limit = 10000

If you only have access to limited compute capabilities, the model-based prediction can be employed for testing purposes. However, note that this approach leads to inaccurate evaluations and only works when the multigrid method is directly used as an iterative solver.

.. code:: ipython3

    # Only recommended for testing purposes:
    # use model-based estimation instead of code generation and evaluation
    model_based_estimation = False

Optional: Enable MPI
~~~~~~~~~~~~~~~~~~~~

Finding competitive multigrid methods requires the evaluation of a large number of solvers, which is usually not feasible on a single compute node. As a remedy, EvoStencils offers support for an MPI-based parallelization using the `mpi4py <https://mpi4py.readthedocs.io>`__ library, which provides a convenient interface to MPI within Python. To enable an MPI-based parallelization within EvoStencils, you can execute the following cell. The creation of the right number of MPI processes and their mapping to a particular computer (or cluster) architecture needs to be managed by the actual MPI implementation, such as OpenMPI (for instance, ``mpiexec -n 4 python optimize.py`` launches four processes). In general, it is advisable to execute and bind each process to a dedicated multi-core node, such that the evaluation of each solver can be sped up with an OpenMP-based parallelization, which can be performed automatically by ExaStencils.

.. code:: ipython3

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    nprocs = comm.Get_size()
    mpi_rank = comm.Get_rank()
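If you want the notebook to fall back gracefully to a serial run when mpi4py is not installed, the import can also be guarded as follows. This is an optional sketch that simply restores the serial defaults defined earlier; it is not required by EvoStencils.

.. code:: ipython3

    # Optional sketch: fall back to the serial defaults from above
    # if mpi4py is not available
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        nprocs = comm.Get_size()
        mpi_rank = comm.Get_rank()
    except ImportError:
        comm = None
        nprocs = 1
        mpi_rank = 0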
Code Generation Setup
~~~~~~~~~~~~~~~~~~~~~

Next, we create a ``ProgramGenerator`` object, which automatically extracts all information about the given problem from its respective ExaSlang specification. This object then provides an interface for the evaluation of each solver that is independent of the backend used for code generation (in our case ExaStencils).

.. code:: ipython3

    program_generator = ProgramGenerator(compiler_path, base_path, settings_path,
                                         knowledge_path, platform_path, mpi_rank=mpi_rank,
                                         cycle_name=cycle_name,
                                         model_based_estimation=model_based_estimation,
                                         solver_iteration_limit=solver_iteration_limit)

.. code:: ipython3

    # Obtain extracted information from program generator
    dimension = program_generator.dimension  # Dimensionality of the problem
    finest_grid = program_generator.finest_grid  # Representation of the finest grid
    coarsening_factors = program_generator.coarsening_factor  # Coarsening factor(s) of the grid hierarchy
    min_level = program_generator.min_level  # Minimum discretization level
    max_level = program_generator.max_level  # Maximum discretization level
    equations = program_generator.equations  # System of PDEs in SymPy
    operators = program_generator.operators  # Discretized differential operators
    fields = program_generator.fields  # Variables that occur within the system of PDEs
    problem_name = program_generator.problem_name
    convergence_evaluator = None
    performance_evaluator = None

Optional: Enable Model-Based Estimation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want to use a model-based prediction for evaluating the quality of each multigrid method, the ``ConvergenceEvaluator`` and ``PerformanceEvaluator`` classes can be used. The former provides an interface to LFA Lab’s functionality that automates the process of predicting a method’s speed of convergence. The latter implements a simple roofline model for estimating its execution time on a given CPU architecture. For this purpose, the user needs to specify the peak floating-point performance and memory bandwidth of the target machine.

.. code:: ipython3

    if model_based_estimation:
        # Create convergence and performance evaluator objects.
        # Convergence evaluation requires a working LFA Lab installation!
        # Only needed when a model-based estimation should be used within the optimization
        # (Not recommended due to its limitations, but useful for testing)
        from evostencils.model_based_prediction.convergence import ConvergenceEvaluator
        from evostencils.model_based_prediction.performance import PerformanceEvaluator
        convergence_evaluator = ConvergenceEvaluator(dimension, coarsening_factors, finest_grid)
        # Peak FLOP performance of the machine
        # (here, e.g., 16 FLOPs per cycle * 6 cores * 2.6 GHz)
        peak_flops = 16 * 6 * 2.6 * 1e9
        # Peak memory bandwidth of the machine
        peak_bandwidth = 45.8 * 1e9
        # Number of bytes per word
        bytes_per_word = 8  # Double = 64 Bit = 8 Bytes
        performance_evaluator = PerformanceEvaluator(peak_flops, peak_bandwidth, bytes_per_word)
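To illustrate the idea behind such a roofline estimate, the following stand-alone sketch predicts the runtime of a single smoothing step as the maximum of its compute time and its memory transfer time. The function ``roofline_runtime`` and the operation counts (roughly 6 FLOPs and 3 memory words per grid point for a five-point Jacobi sweep) are illustrative assumptions, not EvoStencils’ actual model.

.. code:: ipython3

    # Illustrative roofline estimate (not EvoStencils' actual implementation):
    # runtime = max(time needed for arithmetic, time needed for memory traffic)
    def roofline_runtime(flops, words, peak_flops, peak_bandwidth, bytes_per_word=8):
        compute_time = flops / peak_flops
        memory_time = words * bytes_per_word / peak_bandwidth
        return max(compute_time, memory_time)

    # Rough assumption: one five-point Jacobi sweep on a 1024 x 1024 grid needs
    # about 6 FLOPs and moves about 3 words per grid point
    n = 1024**2
    print(roofline_runtime(6 * n, 3 * n, peak_flops=16 * 6 * 2.6 * 1e9,
                           peak_bandwidth=45.8 * 1e9))

For the machine characteristics above, the memory time dominates, i.e. the sweep is bandwidth-bound, which is the typical situation for stencil codes.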
Optimization Setup
~~~~~~~~~~~~~~~~~~

After defining the problem that we aim to solve and setting up the code generation and evaluation, we can finally introduce EvoStencils’ program synthesis toolchain. Here, the ``Optimizer`` class provides a unified user interface for the execution of each step towards the automated design of an efficient multigrid method for the target problem. To deal with unexpected system crashes and cases where an optimization exceeds a certain time limit, EvoStencils implements a simple checkpointing mechanism that makes use of Python’s ``pickle`` module to store the current state of the optimization in a file. For this purpose, we need to create a directory for storing the respective checkpoint files of each MPI process.

.. code:: ipython3

    if mpi_rank == 0 and not os.path.exists(f'{cwd}/{problem_name}'):
        # Create directory for checkpoints and output data
        os.makedirs(f'{cwd}/{problem_name}')
    # Path to directory for storing checkpoints
    checkpoint_directory_path = f'{cwd}/{problem_name}/checkpoints_{mpi_rank}'

Next, we create an ``Optimizer`` object based on the variables defined above, which contain the information extracted from the respective ExaSlang representation. Since the ``Optimizer`` class makes use of the interfaces provided by the ``ProgramGenerator`` (and potentially ``ConvergenceEvaluator`` and ``PerformanceEvaluator``) classes to evaluate each generated solver, the respective objects must be provided as arguments.

.. code:: ipython3

    optimizer = Optimizer(dimension, finest_grid, coarsening_factors, min_level, max_level,
                          equations, operators, fields, mpi_comm=comm, mpi_rank=mpi_rank,
                          number_of_mpi_processes=nprocs, program_generator=program_generator,
                          convergence_evaluator=convergence_evaluator,
                          performance_evaluator=performance_evaluator,
                          checkpoint_directory_path=checkpoint_directory_path)

Note that each multigrid method is formulated on a specific discretization hierarchy. In the given case, each grid in this hierarchy is defined by its step size :math:`h`, which is given by :math:`h = 1/2^l` with :math:`l \in [l_{min}, l_{max}]`. EvoStencils automatically generates a grammar that allows the automated design of multigrid methods to be treated as a program synthesis task. Since the number of productions of this grammar increases with the number of coarsening steps, i.e. :math:`l_{max} - l_{min}`, the resulting search space grows accordingly, and the user is advised to limit the number of levels considered to at most 4 or 5. As a remedy, EvoStencils allows splitting the optimization into multiple runs. In this case, it first aims to find an efficient multigrid method on the coarsest levels, which is then employed as a coarse-grid solver for an existing multigrid implementation provided within the respective DSL code. This process is then continued on the next-higher levels until the finest grid has been reached. However, note that the outcome of this approach might be inferior to a result obtained with an optimization on the full range of levels.

.. code:: ipython3

    # Option to split the optimization into multiple runs, where each run is only
    # performed on a subrange of the discretization hierarchy starting at the top (finest grid)
    levels_per_run = max_level - min_level
    if model_based_estimation:
        # Model-based estimation is only feasible for up to two levels per run
        levels_per_run = 2
    assert levels_per_run <= 5, "Cannot optimize more than 5 levels"

After creating an initial ``Optimizer`` object, we can proceed with the execution of the actual program synthesis. To solve the resulting optimization problem, EvoStencils utilizes evolutionary computation in the form of grammar-guided genetic programming (G3P). Based on the principle of natural evolution, this method evolves a population of individuals represented as derivation trees, where each tree corresponds to a unique multigrid method. To control the behavior of this algorithm, we define a number of variables, which are later passed to the respective method of the ``Optimizer`` class. For this purpose, we first need to define a measure of the fitness of each individual, based on which we can select individuals for reproduction. In general, EvoStencils provides functionality for both single- and multi-objective optimization. In the case of the latter, each individual is evaluated according to two objectives: its execution time per iteration :math:`t` and its convergence factor :math:`\rho`. More information can be found at https://rdcu.be/c3xew. To perform a multi-objective evolutionary optimization, NSGA-II represents a reasonable default choice.
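To make the multi-objective setting concrete, the following small sketch shows the Pareto-dominance relation that selection schemes such as NSGA-II build on: one solver dominates another if it is no worse in both objectives and strictly better in at least one. The ``dominates`` helper is purely illustrative and not part of EvoStencils.

.. code:: ipython3

    # Illustrative sketch of Pareto dominance for two minimization objectives:
    # execution time per iteration t and convergence factor rho
    def dominates(a, b):
        """True if solver a = (t, rho) dominates solver b."""
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    print(dominates((1.0e-3, 0.1), (2.0e-3, 0.2)))  # True: better in both objectives
    print(dominates((1.0e-3, 0.3), (2.0e-3, 0.2)))  # False: a trade-off, neither dominates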
Next, we need to set the number of generations, the total population size :math:`\mu` and the number of newly created individuals (offspring) per process :math:`\lambda`. For instance, running EvoStencils with four MPI processes, :math:`\mu = 16` and :math:`\lambda = 4` means that in each generation a total number of 16 new individuals is created based on an existing population of 16 individuals. Note that the values provided in this tutorial are suitable for testing, but are unlikely to lead to a good result. In addition, a larger (or smaller) initial population can be chosen by setting the variable ``population_initialization_factor`` accordingly. As an alternative to running an actual evolutionary algorithm, EvoStencils also provides a simple random search, which randomly generates :math:`\lambda` individuals per process for the specified number of generations. To control the probability of creating new individuals via crossover and mutation, the values of the parameters ``crossover_probability`` and ``mutation_probability`` can be adjusted. In addition, we can set the number of samples used for the evaluation of each solver (in the case of a code generation-based evaluation).

.. code:: ipython3

    # Choose optimization method
    optimization_method = optimizer.NSGAII

    generations = 10  # Number of generations
    mu_ = 4  # Population size
    lambda_ = 4  # Number of offspring
    # Option to use random search instead of crossover and mutation to create new individuals
    use_random_search = False
    population_initialization_factor = 4  # Multiply mu_ by this factor to set the initial population size
    crossover_probability = 0.9
    mutation_probability = 1.0 - crossover_probability
    node_replacement_probability = 0.1  # Probability to perform mutation by altering a single node in the tree
    evaluation_samples = 3  # Number of evaluation samples
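As a quick, purely illustrative sanity check of the arithmetic described above, the following lines recompute the number of offspring created per generation across all processes and the size of the randomly generated initial population for the current configuration.

.. code:: ipython3

    # Offspring created per generation across all MPI processes
    print(f'Offspring per generation: {nprocs * lambda_}')
    # Size of the randomly generated initial population
    print(f'Initial population size: {population_initialization_factor * mu_}')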
Before we can run the evolutionary program synthesis based on the parameters specified, we need to discuss two additional options. First of all, EvoStencils allows the problem size to be increased dynamically during the optimization. For this purpose, the parameter ``generalization_interval`` can be set, which specifies the number of generations after which both :math:`l_{min}` and :math:`l_{max}` are incremented. Note that this triggers a reevaluation of all individuals in the current population. If you do not want to apply this feature, the generalization interval can simply be set to a value greater than the total number of generations. As a second feature, EvoStencils allows the evolutionary program synthesis to be continued from a saved checkpoint. Note that this requires the total number of generations to be higher than the generation in which the checkpoint was created. Furthermore, the symbols that occur within the grammar-based representation of each solver contained in the checkpointed population must match those of the new program synthesis run. Since at this point no additional checks are performed, the user is advised to use this feature with caution and to be prepared to deal with unexpected errors.

.. code:: ipython3

    # Number of generations after which a generalization is performed.
    # This is achieved by incrementing min_level and max_level within the optimization,
    # such that a larger (and potentially more difficult) instance of the same problem
    # is considered in subsequent generations
    generalization_interval = 50
    # Option to continue from the checkpoint of a previous optimization
    # Warning: So far no check is performed whether the checkpoint is compatible
    # with the current optimization setting
    continue_from_checkpoint = False

Running the Evolutionary Program Synthesis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After setting all necessary parameters, we can execute the evolutionary program synthesis. For this purpose, the ``Optimizer`` class provides the method ``evolutionary_optimization``, which can be executed by providing the parameters defined above as its arguments. After the execution of this method has finished, its outcome is returned in the form of five Python objects. Here, a grammatical representation of the multigrid method that leads to the fastest solving time for the largest problem considered during the optimization is stored in the first return value. Accordingly, the second value contains the corresponding DSL code of this method. The remaining three return values store information gathered during the optimization in the form of the final population, a statistics data structure and a hall-of-fame of the best multigrid methods according to the optimization objective(s), where the latter two are data structures provided by the DEAP framework. Note that in case the evolutionary program synthesis is performed on the full range of levels, each of these objects is a list containing only a single entry. Otherwise, the list contains an object for each respective sub-run.

.. code:: ipython3

    # Return values of the optimization
    # program: Grammar string representing the multigrid method on the topmost levels
    # dsl_code: ExaSlang program string representing the multigrid solver functions
    # pops: Populations at the end of each optimization run on the respective
    #       subrange of the discretization hierarchy
    # stats: Statistics structure (data structure provided by the DEAP framework)
    # hofs: Hall-of-fames at the end of each optimization run on the respective
    #       subrange of the discretization hierarchy
    program, dsl_code, pops, stats, hofs = \
        optimizer.evolutionary_optimization(optimization_method=optimization_method,
                                            use_random_search=use_random_search,
                                            mu_=mu_, lambda_=lambda_,
                                            population_initialization_factor=population_initialization_factor,
                                            generations=generations,
                                            generalization_interval=generalization_interval,
                                            crossover_probability=crossover_probability,
                                            mutation_probability=mutation_probability,
                                            node_replacement_probability=node_replacement_probability,
                                            levels_per_run=levels_per_run,
                                            evaluation_samples=evaluation_samples,
                                            model_based_estimation=model_based_estimation,
                                            pde_parameter_values=pde_parameter_values,
                                            continue_from_checkpoint=continue_from_checkpoint)

Storing and Examining the Results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After the evolutionary program synthesis has finished, we can examine the results, for instance by printing the DSL code of the best method discovered during the optimization. Furthermore, we can store both the final population and the hall-of-fame in a file using the ``dump_data_structure`` method of the ``Optimizer`` class.

.. code:: ipython3

    # Print the outcome of the optimization and store the data and statistics
    if mpi_rank == 0:
        print(f'\nExaSlang Code:\n{dsl_code}\n', flush=True)
        if not os.path.exists(f'./{problem_name}'):
            os.makedirs(f'./{problem_name}')
        j = 0
        log_dir_name = f'./{problem_name}/data_{j}'
        while os.path.exists(log_dir_name):
            j += 1
            log_dir_name = f'./{problem_name}/data_{j}'
        os.makedirs(log_dir_name)
        for i, log in enumerate(stats):
            optimizer.dump_data_structure(log, f"{log_dir_name}/log_{i}.p")
        for i, pop in enumerate(pops):
            optimizer.dump_data_structure(pop, f"{log_dir_name}/pop_{i}.p")
        for i, hof in enumerate(hofs):
            hof_dir = f'{log_dir_name}/hof_{i}'
            os.makedirs(hof_dir)
            for j, ind in enumerate(hof):
                with open(f'{hof_dir}/individual_{j}.txt', 'w') as grammar_file:
                    grammar_file.write(str(ind) + '\n')
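Since the checkpointing described earlier is based on Python’s ``pickle`` module and the dumped files carry the ``.p`` suffix, it is reasonable to assume that they can be read back with ``pickle.load``. The following sketch loads a stored population for later inspection; the directory ``data_0`` and the file name ``pop_0.p`` follow from the cell above, assuming a first run.

.. code:: ipython3

    import pickle

    # Sketch: load a stored population back for inspection.
    # Assumes dump_data_structure pickles its argument (suggested by the .p
    # suffix and the pickle-based checkpointing) and that a previous run
    # produced the directory ./{problem_name}/data_0
    with open(f'./{problem_name}/data_0/pop_0.p', 'rb') as file:
        population = pickle.load(file)
    print(len(population))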
Conclusion
~~~~~~~~~~

We hope that this tutorial provides a reasonable introduction to EvoStencils’ capabilities. Note that the functionality offered by the ``optimize.py`` script (located in the ``scripts`` subfolder) is similar to that of this notebook.