.. include:: ../../global.rst

3D FV Navier Stokes Example
==============================

In the scope of computational fluid dynamics (CFD), the Navier Stokes (NS) equations are of great importance
as they can be used to model the behavior of fluids or fluid-like substances,
assuming knowledge of its initial state and boundary conditions. 
Typical applications are the water flow in a pipe, ocean currents or even the air flow around a car or wing.
In this tutorial, we take a look at the incompressible NS equations which describe the dynamic flow of incompressible fluid, 
with the unknowns being the pressure and velocity as functions of space and time variables.
In this example, we only cover Newtonian fluids. 

Note that this tutorial covers more advanced topics than others.
Hence, it is strongly recommended to follow the reading order which can be found `here <index.html>`__.
For the sake of compactness, only the most relevant parts of the implementation are covered.
However, the complete program specification can be found in the files listed in the `code generation <#generation>`_ chapter.

Introduction
-----------------------

In this tutorial, we discretize the NS equation using the Finite Volumes (FV) method. 
Here, we use a non-linear Multigrid method, also known as Full Approximation Scheme (FAS).

Finite Volumes
~~~~~~~~~~~~~~~~~~~~

While it would be also possible to discretize the NS equations using Finite Differences (FD),
this example's focus lies on a FV discretization.
In contrast to FD, one does not discretize the operators but considers the whole equation at once.
Moreover, the grid data is located at the cell centers where data is assumed to be equal across one cell.

Independent from the NS equations, we first consider the following 2D problem to introduce the FV method:

.. math::

	\frac{\delta}{\delta x} u + \frac{\delta}{\delta y} u = \nabla \cdot u = f

Here, :math:`u` is the unknown and :math:`f` the right-hand side.
After integrating over the whole physical domain, we get the following:

.. math:: 

	\iint_{\Omega} \nabla \cdot u \: dx dy = \iint_{\Omega} f \: dx dy

By splitting the physical domain into multiple cells, 
we can now solve the equation for the computational domain,
i.e. by solving the integral for each cell, which will be also denoted as control volumes :math:`\Omega_{i,j} \in \Omega`.

.. math:: 

	\iint_{\Omega_{i,j}} \nabla \cdot u \: dx dy = \iint_{\Omega} f \: dx dy

By applying the divergence theorem, the integral on the left-hand side
is now replaced by a surface integral

.. math::

	\oint_{\delta \Omega_{i,j}} u \cdot \hat{\mathbf{n}} \: dx dy

with :math:`\hat{\mathbf{n}}` being the surface normal.
For axis-aligned uniform grids with step sizes :math:`\Delta x` and :math:`\Delta y`,
this can be simplified to: 

.. math::

	\Delta y \: u_{i+1/2,j} - \Delta y \: u_{i-1/2, j} + \Delta x \: u_{i, j+1/2} - \Delta x \: u_{i, j-1/2} = \Delta x \Delta y \: f_{i,j}

with half indices indicating an evaluation of quantities at the cell interfaces.
There are multiple approaches to compute the values at the interfaces.
The most simple one is merely building the mean value of two neighboring cells.

For second order derivatives, for instance the Poisson's equation :math:`-\Delta u = - \nabla^2 u = f`,
the integration over the control volumes looks as follows:

.. math::

	-\iint_{\Omega_{i,j}} \nabla^2 u \: dx dy = \iint_{\Omega_{i,j}} f \: dx dy

After applying Green's identities, this is simplified to:

.. math::

	-\oint_{\delta \Omega_{i,j}} \nabla u \cdot \hat{\mathbf{n}} \: dx dy = \iint_{\Omega_{i,j}} f \: dx dy

So, instead of evaluating the quantities at the cell interfaces we evaluate their derivatives.
These can be obtained using the FD method. For an axis-aligned uniform grid, we get:

.. math::

	-\Delta y \frac{u_{i+1, j} - u_{i,j}}{\Delta x} + \Delta y \frac{u_{i, j} - u_{i-1,j}}{\Delta x}
	-\Delta x \frac{u_{i, j+1} - u_{i,j}}{\Delta y} + \Delta x \frac{u_{i, j} - u_{i,j-1}}{\Delta y} = \Delta x \Delta y \: f_{i,j}

Problem Description
~~~~~~~~~~~~~~~~~~~~

With the theoretical background in the FV method in mind we can begin discretizing the incompressible Navier-Stokes equations.
At first we take a look at the continuous formulation of the problem on domain :math:`\Omega \in [0, \frac{1}{10}]^3`:

.. math::

	\rho \Big( \frac{\delta \mathbf{u}}{\delta t} + (\mathbf{u} \cdot \nabla) \mathbf{u} \Big) - \mu \nabla^2 \mathbf{u} + \nabla p = \tilde{\mathbf{f}}

	\nabla \cdot \mathbf{u} = 0

With :math:`\mathbf{u}` being the unknown and :math:`p` the pressure. 
Moreover, :math:`\rho` denotes the density and :math:`\mu` the dynamic viscosity.
The second equation is known as the continuity equation.
For the kinematic viscosity :math:`\nu = \frac{\mu}{\rho}` one can reformulate the upper equation
by dividing by :math:`\rho`:

.. math::

	\frac{\delta \mathbf{u}}{\delta t} + (\mathbf{u} \cdot \nabla) \mathbf{u} - \nu \nabla^2 \mathbf{u} + \frac{1}{\rho} \nabla p = \mathbf{f}

	\nabla \cdot \mathbf{u} = 0

Here, our right-hand side is the gravity :math:`\mathbf{g}`. Reformulated as scalar equations, this yields:

.. math::

	\frac{\delta u}{\delta t} + u \frac{\delta u}{\delta x} + v \frac{\delta u}{\delta y} + w \frac{\delta u}{\delta z}
		- \nu \Big( \frac{\delta^2 u}{\delta x^2} + \frac{\delta^2 u}{\delta y^2} + \frac{\delta^2 u}{\delta z^2} \Big) + \frac{1}{\rho} \frac{\delta p}{\delta x} = f_u

	\frac{\delta v}{\delta t} + u \frac{\delta v}{\delta x} + v \frac{\delta v}{\delta y} + w \frac{\delta v}{\delta z}
		- \nu \Big( \frac{\delta^2 v}{\delta x^2} + \frac{\delta^2 v}{\delta y^2} + \frac{\delta^2 v}{\delta z^2} \Big) + \frac{1}{\rho} \frac{\delta p}{\delta y} = f_v

	\frac{\delta w}{\delta t} + u \frac{\delta w}{\delta x} + v \frac{\delta w}{\delta y} + w \frac{\delta w}{\delta z}
		- \nu \Big( \frac{\delta^2 w}{\delta x^2} + \frac{\delta^2 w}{\delta y^2} + \frac{\delta^2 w}{\delta z^2} \Big) + \frac{1}{\rho} \frac{\delta p}{\delta z} = f_w


	\frac{\delta}{\delta x} u +\frac{\delta}{\delta y} v + \frac{\delta}{\delta z} w = 0


Staggered Grids
~~~~~~~~~~~~~~~~~~~~

Locating all quantities, e.g. the velocity and pressure, on the same grid location can lead to stability issues.
One prominent example is the odd-even decoupling of pressure and velocity which may lead to 
`checkerboard patterns <http://www.thevisualroom.com/marker_and_cell_method.html#demonstration-of-odd-even-decoupling>`_ in the solutions.
Therefore, as a countermeasure we employ so-called staggered grids. A 2D example is depicted below.
On the left side, the different variable localizations are shown and
on the right side, staggered control volumes are depicted.

.. container:: algorithmic

	|stag| |cv|

	.. |stag| image:: image/staggered.PNG
		:width: 52%

	.. |cv| image:: image/control_volumes.svg
		:width: 28%

	.. line-block::

		Fig. 1: 2D staggered grid.

Note that besides regular node- or cell-centered quantities we now also have face-centered ones. 
These are effectively used to implement staggered grids.
Face-centered variables have a face dimension which equals the stagger dimension,
i.e. the dimension in which cell-centered variables need to be shifted to concur with the cell interfaces.
In ExaStencils, this is implemented such that in all dimensions except the stagger dimension
the variables are treated as cell-centered. In the stagger dimension they are treated as node-centered.

Integrate Functions
~~~~~~~~~~~~~~~~~~~~

As already mentioned, we use the FV method to discretize our continuous terms.
Since integrations are ubiquitous in FV discretizations, 
e.g. for the integration over a specific boundary of a (potentially staggered) control volume,
ExaStencils provides special constructs for these operations, namely integrate functions.
Examples for them are shown below:

.. code-block:: none
	:caption: Integrate Functions in ExaSlang
	:linenos:

	// integrate over interfaces of regular grid cells
	integrateOverEastFace ( expr )

	// integrate over interfaces of y-staggered cells
	integrateOverYStaggeredWestFace ( u )

	// integrate function call with offset
	integrateOverSouthFace@[0, 1] ( p ) 

Their function name always begins with ``integrateOver`` followed by an optional stagger dimension (``XStaggered``, ``YStaggered`` or ``ZStaggered``)
and a target interface (``EastFace``, ``NorthFace``, ``TopFace``, ...).
Moreover, these functions not only take constants, variables or field accesses as argument, but also allow more complex expressions to be passed.
Additionally, as shown in line 8 of the above code snippet, offsets for the function call can be specified.

Discretization
~~~~~~~~~~~~~~~~~~~~

For a better overview, this section has been split into the discretization of the two equations, namely the NS and the continuity equation.

.. math::

	\iiint_{\Omega} \frac{\delta \mathbf{u}}{\delta t} + (\mathbf{u} \cdot \nabla) \mathbf{u} - \nu \nabla^2 \mathbf{u} + \frac{1}{\rho} \nabla p \: dx dy dz 
		= \iiint_{\Omega} \mathbf{f} \: dx dy dz

	\iiint_\Omega \nabla \cdot \mathbf{u} \: dx dy dz = 0

We begin with the continuity equation since the necessary steps are rather simple.
Here, we again apply the divergence theorem and integrate over the non-staggered control volume :math:`\Omega_{i,j,k}`:

.. math::

	\oint_{\delta \Omega_{i,j,k}} \mathbf{u} \cdot \hat{\mathbf{n}} \: dx dy dz = 0

The discretization of the NS equation is more complex.
Thus, we first break the left-hand side of the equation down into four parts:

#. The time derivative: :math:`\displaystyle{ \frac{\delta \mathbf{u}}{\delta t} }`.
#. The non-linear convective term: :math:`\displaystyle{ \Big( \mathbf{u} \cdot \nabla \Big) \mathbf{u} }`.
#. The diffusive term: :math:`\displaystyle{ \nu \nabla^2 \mathbf{u} }`.
#. The pressure coupling: :math:`\displaystyle{ \frac{1}{\rho} \nabla p }`.

In the following, the discretization of each part will be viewed separately
and will be plugged together in the end again.

Time Derivative
^^^^^^^^^^^^^^^^^^ 

We first discretize the first-order time derivative:

.. math::

	\iiint_{\Omega} \frac{\delta \mathbf{u}}{\delta t} \: dx dy dz =
		\frac{\delta \mathbf{u}}{\delta t} \iiint_{\Omega} 1 \: dx dy dz

For the approximation of the time derivative, we use a straightforward backwards difference, i.e.

.. math::

	\frac{\delta \mathbf{u}^{n+1}}{\delta t} \approx \frac{\mathbf{u}^{n+1} - \mathbf{u}^n}{\Delta t}

where :math:`n` denotes the current time step and :math:`\Delta t` the time step size. 


Diffusion 
^^^^^^^^^^^^^

For the diffusion, we simply apply Green's identities again:

.. math::

	- \iiint_{\Omega} \nu \nabla^2 \mathbf{u} \: dx dy dz = -\oint_{\delta \Omega} \nu \nabla \mathbf{u} \cdot \hat{\mathbf{n}} \: dx dy dz

Formulated as integrals over the control volumes and split up into scalar equations, this yields:

.. math::

	-\oint_{\delta \Omega_{i + 1/2, j, k}} \frac{\delta}{\delta x} u \cdot \hat{\mathbf{n}} \: dx dy dz

	-\oint_{\delta \Omega_{i, j + 1/2, k}} \frac{\delta}{\delta y} v \cdot \hat{\mathbf{n}} \: dx dy dz

	-\oint_{\delta \Omega_{i, j, k + 1/2}} \frac{\delta}{\delta z} w \cdot \hat{\mathbf{n}} \: dx dy dz

with :math:`\Omega_{i,j,k}` being the non-staggered volume for the cell :math:`(i, j, k)` and
:math:`\Omega_{i + 1/2, j, k}`, :math:`\Omega_{i, j + 1/2, k}` and :math:`\Omega_{i, j, k + 1/2}`
being the staggered control volumes in x-, y- and z-direction. 
As mentioned in the FV chapter, we evaluate the derivatives using FD.


Convection 
^^^^^^^^^^^^^
	
Afterwards, we inspect the non-linear convective term and simplify it: 

.. math::

	\iiint_\Omega (\mathbf{u} \cdot \nabla) \mathbf{u} \: dx dy dz
	= \oint_{\delta \Omega} \mathbf{u} ( \mathbf{u} \cdot \hat{\mathbf{n}} ) \: dx dy dz

And rewritten as scalar equations:

.. math::

	\oint_{\delta \Omega_{i + 1/2, j, k}} u ( \mathbf{u} \cdot \hat{\mathbf{n}} ) \: dx dy dz

	\oint_{\delta \Omega_{i, j + 1/2, k}} v ( \mathbf{u} \cdot \hat{\mathbf{n}} ) \: dx dy dz

	\oint_{\delta \Omega_{i, j, k + 1/2}} w ( \mathbf{u} \cdot \hat{\mathbf{n}} ) \: dx dy dz


Since some algorithmic components, e.g. a (linear) smoother in the solver, require a linearized version 
of these non-linear terms, linearization techniques must be applied.
In this tutorial the focus will lie on the Picard linearization, also known as frozen or lagged coefficients.
Generally, the linearization can be done locally, i.e. for a single cell or a small set of cells,
or globally, i.e. for the whole system of equations.
This tutorial is structured such that the local variant is the focal point,
but some remarks to the global variant can be found in the `global linearization excursion <#excursion-global-linearization>`_.

As the name frozen coefficients implies, 
the gist is to freeze parts of the non-linear term.
For the convective term this yields:

.. math::

	(\mathbf{u}^{k+1} \cdot \nabla) \mathbf{u}^{k+1} \approx (\mathbf{u}^{k} \cdot \nabla) \mathbf{u}^{k+1}

Here, the frozen parts are marked with the superscript :math:`k`, whereas the remaining parts are denoted by :math:`k + 1`.
In contrast to the frozen parts, the remaining ones can be solved or updated for.

In order to provide this knowledge to the code generator, 
ExaStencils introduced so-called frozen field access.
For instance, a frozen access to the :math:`\mathbf{u}` vector's y-component :math:`v` would look like this:

.. code-block::

	frozen ( v )

In regards to the sorting of terms into left- or right-hand side done during the generation process,
frozen field accesses are treated as constants.
Therefore they may come out as stencil coefficients
such that a locally linearized operator is expressed. 

Pressure Coupling 
^^^^^^^^^^^^^^^^^^^^

Lastly, applying the standard steps to the pressure coupling yields:

.. math::

	\iiint_\Omega \frac{1}{\rho} \nabla p \: dx dy dz
		= \oint_{\delta \Omega} \frac{1}{\rho} p \hat{\mathbf{n}} \: dx dy dz

As scalar equation:

.. math::

	\oint_{\delta \Omega_{i + 1/2, j, k}} \frac{1}{\rho} p \hat{\mathbf{n}} \: dx dy dz

	\oint_{\delta \Omega_{i, j + 1/2, k}} \frac{1}{\rho} p \hat{\mathbf{n}} \: dx dy dz
	
	\oint_{\delta \Omega_{i, j, k + 1/2}} \frac{1}{\rho} p \hat{\mathbf{n}} \: dx dy dz


Plugged Together
^^^^^^^^^^^^^^^^^^^^^

Now we plug the discretized parts back into the NS equation.
The equation for the vector's x-component :math:`u` is shown below. 
The equations for the y- and z-components :math:`v` and :math:`w` can be build analogously.

.. container:: algorithmic

	| :math:`\displaystyle{ \frac{u^{n+1} - u^n}{\Delta t} * \iiint_{\delta \Omega_{i + 1/2, j, k}} }` ``time derivative``
	| 	:math:`\displaystyle{ + \oint_{\delta \Omega_{i + 1/2, j, k}} u ( \mathbf{u} \cdot \hat{\mathbf{n}} ) \: dx dy dz }` ``convection`` 
	| 	:math:`\displaystyle{ - \oint_{\delta \Omega_{i + 1/2, j, k}} \frac{\delta}{\delta x} u \cdot \hat{\mathbf{n}} \: dx dy dz }` ``diffusion``
	| 	:math:`\displaystyle{ + \oint_{\delta \Omega_{i + 1/2, j, k}} \frac{1}{\rho} p \hat{\mathbf{n}} \: dx dy dz }` ``pressure coupling``
	| :math:`\displaystyle{ = \iiint_{\delta \Omega_{i + 1/2, j, k}} f \: dx dy dz }` ``rhs`` 
		
To recapitulate, the continuity equation in vector form is :math:`\oint_{\delta \Omega_{i,j,k}} \mathbf{u} \cdot \hat{\mathbf{n}} \: dx dy dz = 0`.

All in all, for a Picard linearization we get a system of the form:

.. math:: 

	\begin{pmatrix} 
		A_{11} &        &        & B_{1} \\
		       & A_{22} &        & B_{2} \\
		       &        & A_{33} & B_{3} \\
		C_{1}  & C_{2}  & C_{3}  & 		 \\
	\end{pmatrix}
	\begin{pmatrix}
	u \\ v \\ w \\ p
	\end{pmatrix}
	=
	\begin{pmatrix}
	f_u \\ f_v \\ f_w \\ 0
	\end{pmatrix}


Program specification
-------------------------------

After composing the discretization of the NS equations, we begin with the implementation of our ExaSlang program. 
In contrast to the Poisson tutorial, we utilize multiple ExaSlang layers for the implementation of the application.

ExaSlang 2
~~~~~~~~~~~~~~~~~~~~

ExaSlang 2 is intended to be used to formulate the discretization of a problem.
The code snippet below demonstrates a ExaSlang 2 definition of the previously derived equations
using a local Picard linearization.
Again, for the sake of compactness, only the equation for the :math:`u` vector component is shown.

.. code-block:: none
	:caption: Definition of the discretized NS equations.

	Equation uEquation {
		// time derivative
		(u@[ 0,  0,  0] - uOld) * vf_xStagCellVolume / dt

		// diffusion
		+ (u@[ 0,  0,  0] - u@[-1,  0,  0]) * integrateOverXStaggeredWestFace   ( nue ) / vf_cellWidth_x  @[-1,  0,  0]
		+ (u@[ 0,  0,  0] - u@[ 1,  0,  0]) * integrateOverXStaggeredEastFace   ( nue ) / vf_cellWidth_x  @[ 0,  0,  0]
		+ (u@[ 0,  0,  0] - u@[ 0, -1,  0]) * integrateOverXStaggeredSouthFace  ( nue ) / vf_stagCVWidth_y@[ 0,  0,  0]
		+ (u@[ 0,  0,  0] - u@[ 0,  1,  0]) * integrateOverXStaggeredNorthFace  ( nue ) / vf_stagCVWidth_y@[ 0,  1,  0]
		+ (u@[ 0,  0,  0] - u@[ 0,  0, -1]) * integrateOverXStaggeredBottomFace ( nue ) / vf_stagCVWidth_z@[ 0,  0,  0]
		+ (u@[ 0,  0,  0] - u@[ 0,  0,  1]) * integrateOverXStaggeredTopFace    ( nue ) / vf_stagCVWidth_z@[ 0,  0,  1]

		// convection
		- integrateOverXStaggeredWestFace   ( u * frozen ( u ) )
		+ integrateOverXStaggeredEastFace   ( u * frozen ( u ) )
		- integrateOverXStaggeredSouthFace  ( u * frozen ( v ) )
		+ integrateOverXStaggeredNorthFace  ( u * frozen ( v ) )
		- integrateOverXStaggeredBottomFace ( u * frozen ( w ) )
		+ integrateOverXStaggeredTopFace    ( u * frozen ( w ) )

		// pressure coupling
		+ integrateOverXStaggeredEastFace ( p / rho )
		- integrateOverXStaggeredWestFace ( p / rho )

		== gravity_x * vf_xStagCellVolume
	}

	// vEquation & wEquation defined analogously to uEquation 
	// ...

	Equation pEquation {
		integrateOverEastFace     ( u )
		- integrateOverWestFace   ( u )
		+ integrateOverNorthFace  ( v )
		- integrateOverSouthFace  ( v )
		+ integrateOverTopFace    ( w )
		- integrateOverBottomFace ( w )
		== 0.0
	}

Another important language feature in Layer 2 is the extraction of 
operators from a discretized equation. 
Not only does it extract single stencil operators for the coupling of multiple unknowns,
it also hints the compiler to employ said stencils in the equations.
As a result, it improves the readability of lower layer codes that were generated from Layer 2. 
Moreover, in some cases it is desirable to compute the stencil coefficients once and store them
when utilized repeatedly. Another use case is when a global linearization is applied.
For our NS example, the operators can be extracted as in the following:

.. code-block:: none
	:caption: Extracting operators from equations.

	generate operators @all {
		equation for u is uEquation store in {
			u => A11
			p => B1
		}

		// vEquation & wEquation analogous to uEquation
		// ...

		equation for p is pEquation store in {
			u => C1
			v => C2
			w => C3
		}
	}


ExaSlang 3
~~~~~~~~~~~~~~~~~~~~

On layer 3, we implement our solver. 
In this example, we demonstrate the `generate` interface and generate a solver function with it.
Since we deal with a non-linear system, we chose the Full Approximation Scheme (FAS) as solver.

As also shown in the FAS algorithm below,
we do not only restrict the residual to a coarser right-hand side but also the solution.
With it, we can obtain an approximate solution on the coarser level and
can use it as initial guess to change the equation to be solved.
While a standard multigrid algorithm solves for the error equation :math:`A (e) = r`,
FAS does that for :math:`A (x) = A (\tilde{x}) + r` with :math:`x = e + \tilde{x}`.
In more simple words, typical MG tries to approach the error while FAS tries to find a better solution.
In case that :math:`A` is linear, however, both equations are equivalent. 

.. container:: algorithmic

	| **if** *coarsest level* **then**
	| 	solve :math:`A_hu_h=f_h` exactly or by many smoothing iterations
	| **else**
	| 	:math:`\bar{u}_h^{(k)} = \mathcal{S}_h^{\nu_1} \Big( u_h^{(k)}, A_h, f_h \Big)` ``pre-smoothing`` 
	| 	:math:`r_h = f_h - A_h \Big( \bar{u}_h^{(k)} \Big)` ``compute residual``
	| 	:math:`r_H = \mathcal{R} r_h` ``restrict residual``
	| 	:math:`\hat{u}_H = \mathcal{R}u_h` ``restrict solution``
	| 	:math:`f_H = A_H \Big( \hat{u}_H \Big) + r_H` ``setup coarser rhs`` 
	| 	:math:`u_H = V_H \Big(0, A_H, f_H, \nu_1, \nu_2\Big)` ``recursion``
	| 	:math:`e_H = u_H - \hat{u}_H` ``compute error`` 
	| 	:math:`e_h = \mathcal{P}e_H` ``prolongate error``
	| 	:math:`\tilde{u}_h^{(k)} = \bar{u}_h^{(k)} + e_h` ``coarse grid correction``
	| 	:math:`u_h^{(k+1)} = \mathcal{S}_h^{\nu_2}\Big( \tilde{u}_h^{(k)}, A_h, f_h \Big)` ``post-smoothing``
	| **end if**

	.. line-block::

		Fig. 2: FAS V-Cycle :math:`V_h\Big(u_h, A_h, f_h, \nu_1, \nu_2\Big)` for a non-linear matrix A

Another aspect from the scope of FV is choosing the right inter-grid operators.
Here, one must distinguish whether discretized functions or integrals are transferred between the grids.
While inter-grid operators for discretized functions try to preserve the values,
integrated values must be adjusted with respect to their integration area.

For instance, these can be compactly constructed using `default stencils <../../documentation/user/ExaSlang4.html#stencil-from-default>`_:

.. code-block:: none
	:caption: Restriction operator for discretized integral

	Stencil RestrictionCellIntegral@all from default restriction on Cell with 'integral_linear'

In ExaStencils, one can either fully implement an own solver or make use of a generated one.
Automatically generating one has the benefit of saving the time spent with implementing an own solver.
One only has to provide a very concise input,
namely a list of pairs where each pair consists of the unknown to be solved for and its corresponding equation.
Additionally, since often minor tweaks to a standard solver are desired, 
some knowledge parameters, especially for the domain of MG solvers, were introduced. 
This includes:

* An exit criterion based on a targeted residual reduction and the max. number of iterations
* Smoothing parameters such as the the number of smoothing steps, an relaxation factor and a coloring scheme
* The coarse grid solver, including its type and exit criterion

Moreover, some parts such as the smoother may be adapted as well. For this purpose,
solver stages such as the ``smoother``, ``updateResidual``, ``restriction`` and ``correction`` were introduced
and can be modified on all levels. The ``cgs`` is restricted to the coarsest level. It is also possible
to handle a MG ``cycle`` or the whole ``solver`` as a stage. 
All in all, for each stage statements may be prepended or appended to the body of the stage. 
As demonstrated below, replacing the whole stage, e.g. with an hand-written specialized version, is also an possibility.

.. code-block:: none
	:caption: Modifying the solver stages in a generated solver.

	generate solver for ... {
		...
	} modifiers {
		append to 'cycle' @finest {
			PrintError@finest ( )
		}

		replace 'restriction' @all {
			myRestriction@current ( )
		}
	}

Another kind of modification is to replace the auxiliary fields created during the solver generation
with already existing fields in their stead:

.. code-block:: none
	:caption: Replace generated field in solver generation.

	generate solver for ... {
		...
	} modifiers {
		replace 'gen_field' @all with myField
	}

For our Navier-Stokes example, the FAS solver can be generated as follows:

.. container:: toggle

	.. container:: header
			
		**Show/Hide ExaSlang3 FAS Generation Code**
		
	.. code-block:: none
		:caption: ExaSlang3 FAS Solver Generation.

		generate solver for u in uEquation and v in vEquation and w in wEquation and p in pEquation with {
			solver_maxNumIts              = 20
			solver_targetResReduction     = 1e-30
			solver_absResThreshold        = 1e-10

			solver_smoother_jacobiType    = false
			solver_smoother_numPre        = 3
			solver_smoother_numPost       = 3
			solver_smoother_damping       = 1
			solver_smoother_coloring      = "red-black" // "9-way"

			solver_cgs                    = "BiCGStab"
			solver_cgs_maxNumIts          = 12800
			solver_cgs_targetResReduction = 1e-3
			solver_cgs_absResThreshold    = 1e-12
			solver_cgs_restart            = true
			solver_cgs_restartAfter       = 128

			solver_useFAS                 = true

			solver_silent                 = true

		} modifiers {

			prepend to 'cycle' @finest {
				startTimer ( 'cycle' )
				totalNumCycles += 1
			}
			append to 'cycle' @finest {
				stopTimer ( 'cycle' )
			}
			append to 'solver' @finest {
				if ( gen_curRes != gen_curRes || gen_curRes > 1.0E-10 ) {
					curTime -= dt
					dt /= 2
					curTime += dt
					print ( "Error detected after", gen_curIt, "steps, residual is", gen_curRes, "- reducing time step size to", dt, "and trying again with adapted time", curTime )

					u = uOld
					v = vOld
					w = wOld
					p = pOld

					gen_solve ( )
				} else {
					print ( "  Residual at", curTime, "after", gen_curIt, "iterations is", gen_curRes, ", was initially" , gen_initRes )
				}
			}
		}

As can be seen, a coarse grid solver is generated as well.
Here, we use a biconjugate gradient stabilized (BiCGSTAB) solver.
While other coarse grid solvers such as CG, which is only suitable
when the system matrix :math:`A` is symmetric and positive-definite,
are restricted to certain scenarios, BiCGSTAB can even be used when
:math:`A` is not symmetric. 
For the sake of compactness, the algorithm for BiCGSTAB was left out.
For more details, please refer to :cite:`BiCGSTAB`.

Despite the fact that the generate interface facilitates the composition of a solver,
requires only a very concise input and even provides a modification mechanism
to further adapt the generated standard solvers,
generating solvers will never be as flexible as when
self-written solvers are employed.

The following code block demonstrates a hand-written FAS solver for the NS equations.
The corresponding steps of the algorithm are highlighted in form of comments.
Again, for the sake of compactness, the coarse grid solver was omitted.

.. container:: toggle

	.. container:: header
			
		**Show/Hide ExaSlang3 FAS solver implementation**

	.. code-block:: none
		:caption: FAS solver implementation for the NS equations.

		Function Cycle@(all but coarsest) {
			// pre-smoothing
			Smoother ( )

			// residual computation
			residual_u = rhs_u - ( A11 * u + B1 * p )
			/* residual_v, residual_w, residual_p ... */

			// solution restriction
			approx_u@coarser = RestrictionFaceX * u
			u@coarser = approx_u@coarser
			/* approx_v, v, approx_w, w, approx_p, p ... */

			// residual restriction and coarser rhs setup
			rhs_u@coarser = (
				RestrictionFaceXIntegral * residual_u
				+ A11@coarser * approx_u@coarser
				+ B1 @coarser * approx_p@coarser )
			/* rhs_v, rhs_w, rhs_p ... */

			// recursion
			Cycle@coarser ( )

			// error computation, prolongation and correction
			u@coarser -= approx_u@coarser
			u += CorrectionFaceX * u@coarser
			/* v, w, p ... */

			// post-smoothing
			Smoother ( )
		}


ExaSlang 4
~~~~~~~~~~~~~~~~~~~~

On layer 4, we only declare the ``Application`` function.
Similar to the previous example, we first initialize
the globals and fields.
Subsequently, we call our generated solver ``gen_solve``
until the max. time in the simulation was reached.
For the time stepping scheme we update the old field values to be the results of the previous time step
before the actual call to the solver.
Once the solver terminates, timings and other simulation results are output
and the allocated memory is freed.

.. code-block:: none
	:caption: Application function declaration.

	Function Application ( ) : Unit {
		startTimer ( 'setup' )

		// initialization
		initGlobals ( )
		initDomain ( )
		initFieldsWithZero ( )
		initGeometry ( )
		InitFields ( )

		stopTimer ( 'setup' )

		print ( 'Reynolds number:', Re )

		// solve
		Var curIt : Int = 0
		repeat until ( curTime >= maxTime ) {

			if ( 0 == curIt % 100 ) {
				print ( "Starting iteration", curIt, "at time", curTime, "with dt", dt )
			}
			if ( 0 == curIt % 16 and curIt > 0 ) {
				dt *= 2
				print ( "Trying to increase dt to", dt )
			}

			curTime += dt
			curIt += 1

			Advance@finest ( )
			startTimer ( 'solve' )
			gen_solve@finest ( )
			stopTimer ( 'solve' )
			totalNumTimeSteps += 1
		}

		// print timers, number of timesteps, etc. 
		// ...

		// de-initialization
		destroyGlobals ( )
	}


Generation
-------------------------------


The files that are necessary for the generation of C++ code from this tutorial's ExaSlang specification are:

* `./NavierStokes/3D_FV_NavierStokes_localPicard.exa2 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_localPicard.exa2>`_
* `./NavierStokes/3D_FV_NavierStokes_localPicard.exa3 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_localPicard.exa3>`_
* `./NavierStokes/3D_FV_NavierStokes_localPicard.exa4 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_localPicard.exa4>`_
* `./NavierStokes/3D_FV_NavierStokes_localPicard.knowledge <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_localPicard.knowledge>`_
* `./NavierStokes/3D_FV_NavierStokes_localPicard.settings <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_localPicard.settings>`_
* `./lib/linux.platform <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/lib/linux.platform>`_
* `./lib/domain_onePatch.knowledge <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/lib/domain_onePatch.knowledge>`_
* `./lib/parallelization_pureOmp.knowledge <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/lib/parallelization_pureOmp.knowledge>`_
* `./Compiler.jar`: Instructions on how the compiler is built can be found `here <https://i10git.cs.fau.de/exastencils/release/-/blob/master/README.md>`__

Starting the generation process can be done as in the `Poisson example <2D_FD_Poisson_fromL4.html#code-generation>`_.
	

Extensions
-----------------------

In this section, we discuss how the current implementation can be extended 
with features such as a VTK visualization.

Visualization
~~~~~~~~~~~~~~~~~~~~

In order to output simulation results in the (legacy) VTK format,
one can make use of the ``printVtk`` statements.
These are tailored towards certain type of application
and output a set of pre-defined fields on an unstructured grid.
Currently, following ``printVtk`` statements exist:

* ``printVtkNS``: Navier-Stokes on an quadrilateral/hexahedral mesh
* ``printVtkNNF``: Navier-Stokes for non-Newtonian fluids on an quadrilateral/hexahedral mesh
* ``printVtkSWE``: Shallow-Water equations on an triangular mesh

In order to output a ``.vtk`` file for each simulation step,
one can use the ``buildString`` statement to concatenate
several expressions to build a string. 

.. code-block:: none
	:caption: Enable VTK visualization in ExaSlang4

	Function Application ( ) : Unit {
		// ...

		Var curPrintIt : Int = 0
		repeat until ( curTime >= maxTime ) {
			if ( curTime >= nextPrint ) {
				nextPrint += printInterval
				Var filename_vel : String
				buildString ( filename_vel, "data/output_", curPrintIt, ".vtk" )
				printVtkNS ( filename_vel, levels@finest ( ) )
				curPrintIt += 1
			}

			// ...
		}

		// ...
	}

Once the program has terminated, the ``.vtk`` files in the target code's ``./data`` directory can be visualized
with general-purpose tools such as ParaView and VisIt. 

In this simulation, we configured ``maxTime`` to be 1000, set the ``printInterval`` to be 2
and set the initial value of ``nextPrint`` to 0. 
Another important parameter in the scope of CFD is the Reynolds number,
as it is used to characterize different test cases.
In our simulation, the Reynolds number :math:`Re` is 1120.28.

One exemplary visualization of the simulation results is shown below.
Here, the domain was clipped in half in direction of the y-normal via the ``Clip`` filter,
a streamline plot was employed with the ``StreamTracer`` filter 
and afterwards the streamlines were transformed to streamtubes using the ``Tube`` filter.
Moreover, note that for the streamlines the `SeedType` option was set to `Point Cloud`. 

.. raw:: html

	<video class="hide-in-ie8" width="720" height="420" autoplay muted controls loop>
		<source src="../../_static/_videos/3d_NS_tubes.ogv" type="video/ogg" /> 
	Your browser does not support the video tag.
	</video>

Additionally, a surface plot with the domain clipped as before is shown.

.. raw:: html

	<video class="hide-in-ie8" width="720" height="420" autoplay muted controls loop>
		<source src="../../_static/_videos/3d_NS_surface.ogv" type="video/ogg" /> 
	Your browser does not support the video tag.
	</video>


Excursion: Global linearization
-----------------------------------

Until now, we only performed the Picard linearization for a single cell.
In this excursion, the implementation steps for the global variant, i.e. for the whole system of equations, are shown.

ExaSlang 2
~~~~~~~~~~~~~~~~~~~~

Since our stencil coefficients now depend on the velocity field,
which is updated over the course of the simulation, 
we utilize a so-called stencil template or stencil field.
This field is conceptually split into a coefficient field
where the variable coefficients are stored and a stencil consisting of
coefficients being the accesses to this field.

We first declare such an operator as shown in the following code snippet.
Here, a domain and localization must be specified to be able to 
set up the coefficients later. Updating the stencil coefficients 
must then be set up on layer 3 or 4. In this tutorial
we declare an ``AssembleStencil`` function on layer 4 for this purpose. 

.. code-block:: none
	:caption: Stencil template declaration for the :math:`A_{11}` operator
	
	Operator A11 from StencilTemplate on Face_x of global {
		[ 0,  0,  0] =>
		[-1,  0,  0] =>
		[ 1,  0,  0] =>
		[ 0, -1,  0] =>
		[ 0,  1,  0] =>
		[ 0,  0, -1] =>
		[ 0,  0,  1] =>
	}

	// A_22 and A_33 declared analogously ...

Subsequently, we set up the equations to use our declared operators: 

.. code-block:: none
	:caption: Equation definition for a global Picard linearization

	Equation uEquation {
  		A11 * u

		// pressure coupling
		+ integrateOverXStaggeredEastFace   ( p )
		- integrateOverXStaggeredWestFace   ( p )

  		== rhs_u
	}

	// vEquation and wEquation analogous to uEquation

	Equation pEquation {
		integrateOverEastFace   ( u )
		- integrateOverWestFace   ( u )
		+ integrateOverNorthFace  ( v )
		- integrateOverSouthFace  ( v )
		+ integrateOverTopFace    ( w )
		- integrateOverBottomFace ( w )
		== 0.0
	}

Furthermore, the operators for the pressure coupling in the velocity equations
and operators of the pressure equation can be extracted as in the following code block.
These operators can then be used for the implementation of the solver in layer 3.
Please note that in multiple algorithmic components of the solver 
the ``AssembleStencil`` function to set up the stencil weights for the global linearization must be invoked.

.. code-block:: none
	:caption: Modified extraction of operators from equations

	generate operators @all {
		equation for u is uEquation store in {
			p => B1
		}

		// vEquation & wEquation analogous to uEquation
		// ...

		equation for p is pEquation store in {
			u => C1
			v => C2
			w => C3
		}
	}

ExaSlang 4
~~~~~~~~~~~~~~~~~~~~

On layer 4, we define our ``AssembleStencil`` function,
where we iterate through the stencil field and update its coefficients.
For our NS example, we define the function as follows:

.. code-block:: none
	:caption: AssembleStencil function for a global Picard linearization

	Function AssembleStencil@all {

		// potential communication
		// ...

		loop over A11 {
			// diffusion
			A11:[-1,  0,  0] = - integrateOverXStaggeredWestFace   ( mue ) / vf_cellWidth_x@[-1, 0, 0]
			A11:[ 1,  0,  0] = - integrateOverXStaggeredEastFace   ( mue ) / vf_cellWidth_x@[ 0, 0, 0]
			A11:[ 0, -1,  0] = - integrateOverXStaggeredSouthFace  ( mue ) / vf_stagCVWidth_y@[0, 0, 0]
			A11:[ 0,  1,  0] = - integrateOverXStaggeredNorthFace  ( mue ) / vf_stagCVWidth_y@[0, 1, 0]
			A11:[ 0,  0, -1] = - integrateOverXStaggeredBottomFace ( mue ) / vf_stagCVWidth_z@[0, 0, 0]
			A11:[ 0,  0,  1] = - integrateOverXStaggeredTopFace    ( mue ) / vf_stagCVWidth_z@[0, 0, 1]
			A11:[ 0,  0,  0] = - ( A11:[-1, 0, 0] + A11:[1, 0, 0] + A11:[0, -1, 0] + A11:[0, 1, 0] + A11:[0, 0, -1] + A11:[0, 0, 1] )

			// convection
			A11:[-1,  0,  0] -= 0.5 * integrateOverXStaggeredWestFace   ( rho * u )
			A11:[ 0,  0,  0] -= 0.5 * integrateOverXStaggeredWestFace   ( rho * u )
			A11:[ 1,  0,  0] += 0.5 * integrateOverXStaggeredEastFace   ( rho * u )
			A11:[ 0,  0,  0] += 0.5 * integrateOverXStaggeredEastFace   ( rho * u )

			A11:[ 0, -1,  0] -= 0.5 * integrateOverXStaggeredSouthFace  ( rho * v )
			A11:[ 0,  0,  0] -= 0.5 * integrateOverXStaggeredSouthFace  ( rho * v )
			A11:[ 0,  1,  0] += 0.5 * integrateOverXStaggeredNorthFace  ( rho * v )
			A11:[ 0,  0,  0] += 0.5 * integrateOverXStaggeredNorthFace  ( rho * v )

			A11:[ 0,  0, -1] -= 0.5 * integrateOverXStaggeredBottomFace ( rho * w )
			A11:[ 0,  0,  0] -= 0.5 * integrateOverXStaggeredBottomFace ( rho * w )
			A11:[ 0,  0,  1] += 0.5 * integrateOverXStaggeredTopFace    ( rho * w )
			A11:[ 0,  0,  0] += 0.5 * integrateOverXStaggeredTopFace    ( rho * w )

			// time derivative
			A11:[ 0,  0,  0] += evalAtWestFace ( rho ) * vf_xStagCellVolume / dt
		}

		// A22 and A33 analogous to A11 ...
	}

Sources
~~~~~~~~~~~~~~~~~~~~

The sources for the generation of a NS simulation using a global Picard linearization are:

* `./NavierStokes/3D_FV_NavierStokes_Picard.exa2 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_Picard.exa2>`_
* `./NavierStokes/3D_FV_NavierStokes_Picard.exa3 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_Picard.exa3>`_
* `./NavierStokes/3D_FV_NavierStokes_Picard.exa4 <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_Picard.exa4>`_
* `./NavierStokes/3D_FV_NavierStokes_Picard.knowledge <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_Picard.knowledge>`_
* `./NavierStokes/3D_FV_NavierStokes_Picard.settings <https://i10git.cs.fau.de/exastencils/release/-/raw/master/Examples/NavierStokes/3D_FV_NavierStokes_Picard.settings>`_

The remaining files required for the generation are identical. 
The same applies to the necessary steps for the generation.