.. include:: ../../global.rst

Tide-driven flow at Bight of Abaco (Bahamas)
============================================

In this example, we examine the simulation of a tide-driven using a quadrature-free discontinuous Galerkin (DG) discretization of the 2D shallow water equations (SWE).

Problem description
-------------------

We consider the 2D shallow water equations in conservative form
defined on some 2D domain :math:`\Omega`.

.. math::
	\begin{aligned}
	&\partial_t \xi+\nabla \cdot \boldsymbol{q}=0\\
	&\partial_t \boldsymbol{q}
	+\nabla \cdot \left(\boldsymbol{qq}^T /H \right)
	+\tau_{\textrm{bf}} \boldsymbol{q}
	+\left( \begin{smallmatrix} 0 & -f_c\\ f_c & 0 \end{smallmatrix} \right) \boldsymbol{q}
	+gH\nabla\xi
	=\boldsymbol{F}
	\end{aligned}
| :math:`\xi` represents the elevation of the free surface with respect to some datum, e.g., the mean sea level. 
| :math:`H = h_b + \xi` denotes the total water depth with the bathymetry :math:`h_b`.
| :math:`\boldsymbol{q} := (U,V)^T` is the depth integrated horizontal velocity.
| Furthermore, :math:`f_c` is the Coriolis coefficient, :math:`g` the gravitational acceleration and :math:`\tau_{\textrm{bf}}` the bottom friction coefficient.
| Finally, :math:`\boldsymbol{F}` is a source term that may include atmospheric pressure and tidal potential forcings.

Denoting :math:`\boldsymbol{c} := (\xi, U,V)^T`, the system is given in the following compact form:

.. math::
	\begin{aligned}
	\partial_t \boldsymbol{c}+ \nabla \cdot \boldsymbol{\tilde{A}}= \boldsymbol{r}(\boldsymbol{c}),
	\label{compact}
	\end{aligned}
where 

.. math::
	\begin{aligned}
	\label{eq:def_A}
	\boldsymbol{\tilde{A}}=\begin{pmatrix}
	U & V\\
	\frac{U^2}{H}+\frac{g(H^2-h_b^2)}{2}&\frac{U V}{H}\\
	\frac{U V}{H}&\frac{V^2}{H}+\frac{g(H^2-h_b^2)}{2}
	\end{pmatrix}, \;  
	\boldsymbol{r}(\boldsymbol{c})=\begin{pmatrix}
	0\\
	-\tau_{\textrm{bf}}U+f_cV+g\xi \partial_x h_b+F_x\\
	-\tau_{\textrm{bf}}V-f_cU+g\xi \partial_y h_b+F_y\\
	\end{pmatrix}.
	\end{aligned}


This example uses the following types of boundary conditions for the SWE:

* *Land:* No normal flow  :math:`\boldsymbol{q} \cdot \boldsymbol{n} = 0`. 
* *Open-sea:* Prescribed free surface elevation :math:`\xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})`. 

The following boundary conditions are furthermore available for different scenarios:

* *Dirichlet:* All unknowns are specified :math:`\boldsymbol{q}(t, \boldsymbol{x})  = \hat{\boldsymbol{q}}(t, \boldsymbol{x}), \;\; \xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})`.
* *River:* For supercritical flow examples, we set the following river (inflow) boundary conditions: :math:`q_{\boldsymbol{n}}(t, \boldsymbol{x}) = \hat{q}_{\boldsymbol{n}}(t, \boldsymbol{x}), \quad q_{\boldsymbol{\tau}}(t, \boldsymbol{x}) =  \hat{q}_{\boldsymbol{\tau}}(t, \boldsymbol{x}),  \quad \xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})` with the normal and tangential integrated velocities :math:`\hat{q}_{\boldsymbol{n}}(t, \boldsymbol{x})` and :math:`\hat{q}_{\boldsymbol{\tau}}(t, \boldsymbol{x})`.
* *Radiation:* At the outflow boundary of supercritical flow examples, we specify radiation boundary conditions where no unknowns are prescribed.

The initial conditions for the elevation and integrated velocity can be provided as follows:

:math:`\xi(0, \boldsymbol{x}) = \xi_0(\boldsymbol{x}),\quad  {\boldsymbol{q}}(0, \boldsymbol{x}) = {\boldsymbol{q}}_0(\boldsymbol{x})`

This example uses the lake-at-rest initial condition.


Quadrature-free formulation
---------------------------

To avoid fracture-type nonlinearities to permit a quadrature-free DG formulation, we modify the system as follows (cf. [FKAZGK2020]_). 
Introducing the depth averaged velocity :math:`\boldsymbol{u}=(u,v)^T` related to :math:`\boldsymbol{q}` by :math:`\boldsymbol{q} = \boldsymbol{u}H` the new system reads

.. math::
	\begin{aligned}
	\label{compact_qf_1}
	\partial_t \boldsymbol{c}+ \nabla \cdot \boldsymbol{A}= \boldsymbol{r}(\boldsymbol{c}),\\
	\label{compact_qf_2}
	\boldsymbol{u} H = \boldsymbol{q},
	\end{aligned}
	
where :math:`H^2-h_b^2=\xi^2+2h_b\xi+h_b^2-h_b^2=\xi(H+h_b)` and thus 

.. math::
	\begin{aligned}
	\boldsymbol{A}=\begin{pmatrix}
	U & V\\
	Uu+\frac{g\xi(H+h_b)}{2}&U v\\
	V u& V v+\frac{g\xi(H+h_b)}{2}
	\end{pmatrix}.
	\end{aligned}


DG discretization
-----------------
As a next step, we discretize the continuous problem. Correspondingly, the physical domain :math:`\Omega` is triangulated into triangles :math:`\Omega_e, e \in \{0,...,E\}`. 
The local variational formulation of the system on an element :math:`\Omega_e` is obtained by multiplying with sufficiently smooth test functions :math:`\boldsymbol{\phi}` and :math:`\boldsymbol{\psi}` followed by the integration by parts where we use the notation :math:`(\cdot,\cdot)_{\Omega_e}` and :math:`\langle\cdot,\cdot\rangle_{\partial \Omega_e}` for the :math:`L^2`-scalar products on elements and edges, respectively, and denote by :math:`\boldsymbol{n}_e` an exterior unit normal to :math:`\partial \Omega_e`

.. math::
	\begin{aligned}
	\label{variational_1}
	&\left(\partial_t \boldsymbol{c},\boldsymbol{\phi}\right)_{\Omega_e}-\left(\boldsymbol{{A}}, \nabla \boldsymbol{\phi}\right)_{\Omega_e}+\langle\boldsymbol{{A}}\cdot\boldsymbol{n}_e,  \boldsymbol{\phi}\rangle_{\partial\Omega_e}= \left(\boldsymbol{r},\boldsymbol{\phi}\right)_{\Omega_e},\\
	\label{variational_2}
	&\left(\boldsymbol{u} H, \boldsymbol{\psi} \right)_{\Omega_e}
	= \left(\boldsymbol{q}, \boldsymbol{\psi} \right)_{\Omega_e}.
	\end{aligned}

Next for obtaining the semi-discrete formulation, let :math:`\mathbb{P}^p(\Omega_e)` be the polynomial space of order :math:`p` on :math:`\Omega_e`. We replacing :math:`\boldsymbol{c}` and :math:`\boldsymbol{u}` by their finite-dimensional counterparts :math:`\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta` and utilize test functions :math:`\boldsymbol{\phi}_\Delta \in \mathbb{P}^p(\Omega_e)^3, \, \boldsymbol{\psi}_\Delta \in \mathbb{P}^p(\Omega_e)^2`:

.. math::
	\begin{aligned}
	&\left(\partial_t \boldsymbol{c}_\Delta,\boldsymbol{\phi}_\Delta\right)_{\Omega_e}
	-\left(\boldsymbol{{A}}, \nabla \boldsymbol{\phi}_\Delta\right)_{\Omega_e}
	+\langle\boldsymbol{\widehat{A}},\boldsymbol{\phi}_\Delta \rangle_{\partial\Omega_e}
	= \left(\boldsymbol{r},\boldsymbol{\phi}_\Delta\right)_{\Omega_e},\\
	&\left(\boldsymbol{u}_\Delta H_\Delta, \boldsymbol{\psi}_\Delta \right)_{\Omega_e}
	= \left(\boldsymbol{q}_\Delta, \boldsymbol{\psi}_\Delta \right)_{\Omega_e}.
	\end{aligned}

The edge flux :math:`\boldsymbol{{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta) \cdot\boldsymbol{n}_e` is approximated on :math:`\partial\Omega_e` by a numerical flux :math:`\boldsymbol{\widehat{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta,\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta, \boldsymbol{n}_e)` that depends on discontinuous values of the solution on element :math:`\Omega_e` (without tilde)  and on its edge neighbours (with tilde). On exterior domain boundaries, the specified elevation and velocity boundary values are utilized in the flux computation. In this example, we use the Lax--Friedrichs flux, defined as 

.. math::
	\begin{aligned}
	\boldsymbol{\widehat{A}}^{LF}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta,\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta, \boldsymbol{n}_e)
	=\frac{1}{2}\big(\boldsymbol{{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta)+\boldsymbol{{A}}(\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta)\big) \cdot \boldsymbol{n}_e
	+\frac{\lambda}{2} (\boldsymbol{c}_\Delta-\boldsymbol{\tilde{c}}_\Delta),
	\end{aligned}
where :math:`\lambda` denotes the largest (in absolute value) eigenvalue of :math:`\frac{\partial}{\partial \boldsymbol{c}} \boldsymbol{\widetilde{A}}(\boldsymbol{c}) \cdot \boldsymbol{n}_e`.  
Denoting by :math:`\boldsymbol{x}_f` the midpoint of edge :math:`f \subset \partial\Omega_e`, our quadrature-free scheme uses the following approximation of :math:`\lambda|_f` (cf. [FKAZGK2020]_):

.. math::
	\begin{aligned}
	\lambda|_f := \max\left\{ \left|{\boldsymbol{u}_\Delta}(\boldsymbol{x}_f) \cdot\boldsymbol{n}_e\right|, \left|{\boldsymbol{\tilde{u}}_\Delta}(\boldsymbol{x}_f) \cdot\boldsymbol{n}_e\right|\right\}+ \max \left\{\sqrt{g {H_\Delta}(\boldsymbol{x}_f)},\sqrt{g {\widetilde{H}_\Delta}(\boldsymbol{x}_f)}\right\}.
	\end{aligned}

We now take a basis :math:`\varphi_{ei}(\boldsymbol{x}), \,i=1, \ldots, K(p)`, of :math:`\mathbb{P}^p(\Omega_e)` and can represent :math:`\boldsymbol{c}_\Delta` and :math:`\boldsymbol{u}_\Delta` as

.. math::
	\begin{aligned}
	\boldsymbol{c}_\Delta(t,\boldsymbol{x})|_{\Omega_e} &=(\xi_\Delta, U_\Delta, V_\Delta)^T(t,\boldsymbol{x})=\sum_{j=1}^3\sum_{i=1}^{K(p)} c_{ei}^j\varphi_{ei}(\boldsymbol{x})\,\boldsymbol{e}_j, \label{c_h}\\
	 \boldsymbol{u}_\Delta(t,\boldsymbol{x}) |_{\Omega_e} &=(u_\Delta, v_\Delta)^T(t,\boldsymbol{x})=\sum_{j=1}^2\sum_{i=1}^{K(p)} u_{ei}^j\varphi_{ei}(\boldsymbol{x})\,\boldsymbol{e}_j\label{c_h_tilde}
	\end{aligned}
with :math:`\boldsymbol{e}_j` denoting the :math:`j`-th unit vector in :math:`\mathbb{R}^3` or :math:`\mathbb{R}^2`. 
A basis of :math:`\mathbb{P}^p(\Omega_e)` is defined via a mapping from the corresponding reference basis on :math:`\hat{\Omega}`

.. math::
	\begin{aligned}
	\varphi_{ei}(\boldsymbol{x})=\begin{cases}\hat{\varphi}_i\left(\boldsymbol{F}_e^{-1}(\boldsymbol{x})\right) & \boldsymbol{x}\in \Omega_e,\\0& \text{otherwise},\end{cases}\quad i \in \{1,\dots,K(p)\}
	\end{aligned}
where :math:`F_e: \hat{\Omega}\rightarrow\Omega_e,\; \hat{\boldsymbol{x}}\rightarrow \boldsymbol{x} := \boldsymbol{B}_e\hat{\boldsymbol{x}}+\boldsymbol{a}_{e1}` is the affine-linear transformation from the reference triangle onto triangle :math:`\Omega_e` as shown in the figure below  with

.. math::
	\begin{aligned}
	\boldsymbol{B}_e = \begin{bmatrix} B^e_{1,1} & B^e_{1,2} \\
	B^e_{2,1} &B^e_{2,2}\end{bmatrix} := \begin{bmatrix} \boldsymbol{a}_{e2}-\boldsymbol{a}_{e1} & \boldsymbol{a}_{e3}-\boldsymbol{a}_{e1}\end{bmatrix}
	\end{aligned}
and :math:`\boldsymbol{a}_{e1}, \boldsymbol{a}_{e2}, \boldsymbol{a}_{e3}` denoting the vertex coordinates of :math:`\Omega_e`.

.. figure:: figures/reference_triangle.svg
	:align: center
	:width: 75%
	
.. line-block::
		Fig. 1: Affine-linear mapping :math:`F_e` from the reference triangle :math:`\hat{\Omega} = \left\{\hat{\boldsymbol{a}}_1, \hat{\boldsymbol{a}}_2, \hat{\boldsymbol{a}}_2 \right\} = \left\{(0,0)^T, (1,0)^T, (0,1)^T \right\}` to physical triangle :math:`\Omega_e = \left\{\boldsymbol{a}_{e,1}, \, \boldsymbol{a}_{e,2}, \boldsymbol{a}_{e,3} \right\}`.

The number of basis functions :math:`K(p)` is dependent on the respective polynomial space and has the following values in :math:`\mathbb{R}^2`: :math:`K(0)\!=\!1, \, K(1)\!=\!3` and :math:`K(2)\!=\!6`.
The reference basis functions employed in our implementation are orthonormal with respect to the :math:`L^2`-scalar product on :math:`\hat{\Omega}` and given by


.. math::
	\begin{aligned}
	\left.\!\!\!\!
	\begin{array}{l}
	\left.\!\!\!\!
	\begin{array}{l}
	\hat{\varphi}_1(\hat{\boldsymbol{x}})=\sqrt{2}, \left. \; \right\}  \boldsymbol{\mathbb{P}^0}(\hat{\Omega})\\
	\hat{\varphi}_2(\hat{\boldsymbol{x}})= 2 - 6\hat{x},\\
	\hat{\varphi}_3(\hat{\boldsymbol{x}})= \sqrt{12} (1 - \hat{x} - 2\hat{y}),
	\end{array} \right\} \; \boldsymbol{\mathbb{P}^1}(\hat{\Omega})\\
	\!\!\!\!\hat{\varphi}_4(\hat{\boldsymbol{x}})= \sqrt{6}\left(1-8\hat{x}+10\hat{x}^2\right),\\
	\!\!\!\!\hat{\varphi}_5(\hat{\boldsymbol{x}})= 
	\sqrt{3}\left(-1-4\hat{x}+5\hat{x}^2+12\hat{y}-15\hat{y}^2\right),\\
	\!\!\!\!\hat{\varphi}_6(\hat{\boldsymbol{x}})=
	\sqrt{45}\left(1-4\hat{x}+3\hat{x}^2-4\hat{y}+8\hat{x}\hat{y}+3\hat{y}^2\right),
	\end{array} \right\} \boldsymbol{\mathbb{P}^2}(\hat{\Omega})
	\end{aligned}

Up to this point, we defined the discrete formulation of the problem we want to solve. 
The next step is to look at the algebraic representation of the element and edge integrals that arise.

For compactness, we show an algebraic representation of our discrete scheme only for element integrals. We insert the basis representations into the system and test the first equation with :math:`\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_1` and obtain for :math:`k \in  {1,\dots,K(p)}`:

.. math::
   \begin{aligned}
   &\left(\boldsymbol{\tilde{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta), \nabla (\varphi_{ek} \boldsymbol{e}_1) \right)_{\Omega_e}
   = \sum_{i=1}^{K(p)}\left[ c_{ei}^{2}\int_{\Omega_e} \frac{\partial\varphi_{ek}} {\partial x}\varphi_{ei}\mathrm{d}\boldsymbol{x}+ c_{ei}^{3}\int_{\Omega_e} \frac{\partial\varphi_{ek}} {\partial y}\varphi_{ei}\mathrm{d}\boldsymbol{x}\right]\nonumber\\
   &\quad =\sum_{i=1}^{K(p)} \Bigg{[}\frac{c_{ei}^2}{|\mathrm{det}(\boldsymbol{B}_e)|}\left({B}^e_{2,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\
   &\qquad +\frac{c_{ei}^3}{|\mathrm{det}(\boldsymbol{B}_e)|}\left(-{B}^e_{1,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} + {B}^e_{1,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}}\right)\Bigg{]}.
   \label{mass_basis}\end{aligned}

For :math:`\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_2` and
:math:`k \in  {1,\dots,K(p)}`, we get:

.. math::
	\begin{aligned}
	& \left(\boldsymbol{\tilde{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta), \nabla (\varphi_{ek} \boldsymbol{e}_2) \right)_{\Omega_e}\nonumber \\
	&\quad=\sum_{i,m=1}^{K(p)}\left[ c_{ei}^2u_{em}^{1}\int_{\Omega_e}  \frac{\partial\varphi_{ek}}{\partial{x}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}+c_{ei}^2u_{em}^{2}\int_{\Omega_e}  \frac{\partial\varphi_{ek}}{\partial{y}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}\right]\nonumber\\
	&\qquad+\sum_{i,m=1}^{K(p)}\frac{g}{2} c_{ei}^{1}c_{em}^{1}\int_{\Omega_e}  \frac{\partial\varphi_{ek}}{\partial{x}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}+\sum_{i=1}^{K(p)} gh_bc_{ei}^{1}\int_{\Omega_e}  \frac{\partial\varphi_{ek}}{\partial{x}} \varphi_{ei} \mathrm{d}\boldsymbol{x}\nonumber\\
	&\quad=\sum_{i,m=1}^{K(p)} \Bigg{[}\frac{c_{ei}^2u_{em}^{1}}{|\mathrm{det}(\boldsymbol{B}_e)|^{3/2}}\left({B}^e_{2,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\
	&\qquad+\frac{c_{ei}^2u_{em}^{2}}{|\mathrm{det}(\boldsymbol{B}_e)|^{3/2}}\left(-{B}^e_{1,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} + {B}^e_{1,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\Bigg{]}\nonumber\\
	&\qquad+\sum_{i,m=1}^{K(p)}\frac{g}{2 |\mathrm{det}(\boldsymbol{B}_e)|^{3/2}} c_{ei}^{1}c_{em}^{1}\left({B}^e_{2,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\
	&\qquad+\sum_{i=1}^{K(p)} \frac{gh_bc_{ei}^{1}}{|\mathrm{det}(\boldsymbol{B}_e)|} \left({B}^e_{2,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right)
	%\label{momentum_basis_u}
	\end{aligned}

And for :math:`\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_3` we get a similar expression.


Mapping to code 
-----------------------

As ExaStencils does not directly support higher-order discretizations like the one we are using, we adapted the workflow as shown below. Instead of starting with the continuous model in the multi-layered approach of ExaSlang, the continuous and discrete problems are manually derived and implemented in GHODDESS, which then directly maps to layer 4 [FKAZGK2020]_. 

.. figure:: figures/exa_layers.svg
	:align: center
	:width: 75%
	
.. line-block::
	Fig. 2: Schematics of the multi-layered approach of ExaSlang in the original (left) and adapted (right) workflow in which GHODDESS directly maps to layer 4.

We want to examine the three steps of mapping the discretized equations to efficient simulation code. 


Math to Python
^^^^^^^^^^^^^^^^^^^^^

In this step, the DG scheme is mapped to Python code with the help of the SymPy package. This is facilitated by already provided basic abstractions for, e.g., basis functions, just as classes representing triangles and data fields. Since ExaStencils is restricted to quadrilateral meshes, we divide each element into two differently oriented triangles – lower and upper – to obtain a triangular mesh for our discretization. 

The algebraic representations of the discrete scheme shown above are implemented manually in our framework. Here we consider the first part of the element integral for :math:`\xi` as an example:

.. math::

   \begin{aligned}
   \sum_{i=1}^{K(p)} \frac{c_{ei}^2}{|\mathrm{det}(\boldsymbol{B}_e)|}\left({B}^e_{2,2}\int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}}  \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right),
   \label{example_integral}
   \end{aligned}

which can be implemented in GHODDESS by

.. code:: python

   sum(cu(tri.orientation, k) * tri.det_b_inv * (
       tri.b[1, 1]
       * integrate_over_tri(basis.phi[k] * basis.phi_dx[p])
       - tri.b[1, 0]
       * integrate_over_tri(basis.phi[k] * basis.phi_dy[p]))
     for k in range(d))

Information about the triangles, such as the orientation or the
determinant of the mapping from the reference triangle and its
reciprocal, are stored in :math:`\texttt{tri}`. SymPy is then used to
evaluate integrals and derivatives analytically. 


Python to ExaSlang
^^^^^^^^^^^^^^^^^^^^^

To allow a mapping from the symbolic representation to ExaSlang layer 4 code, SymPy
expressions are enriched with a few required abstractions such as field
symbols that correspond to accesses to ExaSlang fields storing
quantities defined on the computational domain. GHODDESS then creates an
auxiliary knowledge file holding parameters that guide the generation
process, as well as ExaSlang specifications. 

As an example, we once again look at the implementation of the element integral for :math:`\xi` from above. Depending on
the chosen order, this expression will be evaluated differently. For
linear order and the second degree of freedom, i.e., :math:`\texttt{d=3}` and
:math:`\texttt{p=2}`, the underlying sum over the integrals can be evaluated
analytically as

.. math:: \int_{\hat{\Omega}} ( -6 \sqrt{2} ) \mathrm{d}\hat{\boldsymbol{x}}+ \int_{\hat{\Omega}} (-12 + 36 \hat{x}) \mathrm{d}\hat{\boldsymbol{x}} + \int_{\hat{\Omega}} (-6 \sqrt{12} (1 - \hat{x} - 2 \hat{y}))\mathrm{d}\hat{\boldsymbol{x}} = -3 \sqrt{2}

using the basis formulation from above. Together
with the remaining terms, this representation can then be mapped to
ExaSlang. We now consider only the first half of the element integral and ignore the second half, edge integrals, and the right-hand side. The corresponding update kernel would look like shown below.

.. code:: ExaSlang4

   loop over cxiNewLower1 {
     cxiNewLower1 = ( -3.0 * 1.41421356237310
                       * bLower3 * cuLower0 * invDetBLower0 )
   }

Frequently used sub-expressions, such as parts of the transformation
matrix :math:`B` and the reciprocal of its determinant, are stored in
designated data fields. This kernel
only updates triangles with lower orientation and in practice an
equivalent kernel updating the remaining upper triangles is generated.


.. _`sec:exa2code`:

ExaSlang to code
^^^^^^^^^^^^^^^^^^^^^

The ExaStencils code generator is capable of parsing the emitted
ExaSlang code, applying code transformations and certain optimizations,
and, finally, outputting a C++ or CUDA code combined with MPI for
distributed memory parallelism. 

Following the example from the previous paragraph, we can now generate
compilable C++ code similar to the one depicted below without optimizations to give developers insight.

.. code:: c++

   for (int i1 = 0; i1 < 16; i1 += 1) {
    for (int i0 = 0; i0 < 16; i0 += 1) {
     fieldData_cxiNewLower1[((18*i1)+i0+19)] = - (
      4.2426406871193 * fieldData_bLower3[((18*i1)+i0+19)]
      * fieldData_cuLower0[((18*i1)+i0+19)]
      * fieldData_invDetBLower0[((18*i1)+i0+19)] );
    }
   }

Enabling optimizations generates better-performing but less readable code, as shown below.

.. code:: c++

   for (int i1 = 0; i1<16; i1 += 1) {
    double* const _fieldData_bLower3_p0 =
     &(fieldData_bLower3[(18*i1)]);
    /* similarly for _fieldData_cxiNewLower1_p0,
       _fieldData_cuLower0_p0, _fieldData_invDetBLower0_p0 */
    int _start = 0; int _end = 16;
    int _intermediate = std::max({_start,(_end-((_end-_start)%4))});
    if (_start<_intermediate) {
     __m256d _vec00 = _mm256_set1_pd(4.2426406871193);
     for (int i0 = _start; i0<_intermediate; i0 += 4) {
      __m256d _vec01 =
       _mm256_loadu_pd(&(_fieldData_bLower3_p0[(i0+19)]));
      __m256d _vec02 =
       _mm256_loadu_pd(&(_fieldData_cuLower0_p0[(i0+19)]));
      __m256d _vec03 =
       _mm256_loadu_pd(&(_fieldData_invDetBLower0_p0[(i0+19)]));
      __m256d _vec04;
       _vec04 = _mm256_xor_pd(_mm256_mul_pd(
         _mm256_mul_pd(_vec02, _vec03),
         _mm256_mul_pd(_vec00, _vec01)),
        _mm256_set1_pd(-0.0));
       _mm256_storeu_pd(&(_fieldData_cxiNewLower1_p0[(i0+19)]),
        _vec04);
     }
    }
    for (int i0 = _intermediate; i0<_end; i0 += 1) {
     _fieldData_cxiNewLower1_p0[(i0+19)] = - (
      4.2426406871193 * _fieldData_bLower3_p0[(i0+19)]
      * _fieldData_cuLower0_p0[(i0+19)]
      * _fieldData_invDetBLower0_p0[(i0+19)]);
    }
   }


Block-structured grids 
-----------------------

To combine the geometric flexibility of unstructured meshes and the performance benefits of unstructured ones, we use block-structured grids (BSGs) for our ocean domains. [ZGAK2019]_, [ZGAFKK2022]_ Next to unmasked BSGs, GHODDESS also supports masked ones to accurately mesh fine-scale geometric features and further exploit performance benefits associated with structured grids.

In this example, we use an unmasked BSG with one MPI partition of 58 blocks with 32 elements each. It can be found in ``GHODDESS-BSG/grids/BSG_bahamas_b1_f58`` after checking out the submodule *GHODDESS-BSG* and is shown below on the left. On the right, we present the bathymetry and the locations of the four recording stations' exact coordinates in meters
:math:`(38\,667,49\,333)`, :math:`(56\,098,9\,613)`,
:math:`(41\,263,29\,776)`, and :math:`(59\,594,41\,149)`.

.. figure:: figures/bahamas_grid_b1_f58_l2.svg
	:align: center
	:width: 75%
	
.. line-block::
	Fig. 3: Block-structured grid (left) and  bathymetry with locations of four recording stations(right).


Example setup and parameter choice 
-----------------------

In our example, the bottom friction was given via a standard quadratic
friction law :math:`\tau_{\textrm{bf}} = C_f | \boldsymbol{u} |` with coefficient
:math:`C_f = 0.009`, and the (constant) Coriolis parameter was set to
:math:`3.19 \times 10^{-5} s^{-1}`.

The tidal forcing was prescribed at the open boundary consisting of five
tidal components (O1, K1, N2, M2, S2) given analytically by

.. math::

   \begin{aligned}
   \label{eq:tidal-forcing}
   \hat{\xi}(t) &=  0.075\cos\left(\frac{t}{25.82}+3.40\right) 
                 + 0.095\cos\left(\frac{t}{23.94}+3.60\right) 
                +  0.100\cos\left(\frac{t}{12.66}+5.93\right) \nonumber\\
                 &\quad + 0.395\cos\left(\frac{t}{12.42}+0.00\right) 
                + 0.060\cos\left(\frac{t}{12.00}+0.75\right), 
   \end{aligned}

with :math:`t` (hours) denoting the time elapsed from the beginning of
the simulation, and :math:`\hat{\xi}` (meters) the specified tidal
elevation at the open sea boundary. A so-called cold start initialization is performed: The fluid
domain is assumed to be at rest initially (:math:`\xi_0 =0`,
:math:`{\boldsymbol{q}}_0 = 0`), and the boundary condition is applied gradually over a period of two days
using ramping (a coefficient multiplying the amplitude of the tidal
forcing set to zero at the beginning of the simulation and linearly
increasing to one at the end of day two).

The three-day simulations is run with a constant time step of 15 seconds
for the piecewise constant DG discretization
(:math:`p=0`). Because of the CFL condition, a smaller time step size will be necessary for higher-order approximations or finder grids. 

Now we look at the input parameters for this setup.
The parameters are set the file ``src/config/bahamas.py`` in the :math:`\texttt{Config}` class which interits from :math:`\texttt{ConfigMaster}` and :math:`\texttt{ConfigMasterReal}` in this case.

Below we describe the most important parameters to be set for running this example:

Time step size and number and approximation order:

.. code:: python

        config.analytical_test_case = False # to control error output, use physical right hand side and boundary conditions as specified below
        
        config.default_dt = 15  # time step size in seconds
        config.use_time_steps = False  # True: use a fixed number of time steps (timeSteps), False: use a maximum time (t_max)
        config.t_max = 3 * 24 * 60 * 60
        
        config.order = 0 # local approximation order: 0 constant, 1 linear, 2 quadratic, 3 cubic


Grid and bathymetry: Bight of Abaco with Moore's Island:

.. code:: python

        config.minLevel = 2
        config.maxLevel = config.minLevel  # 2*2**maxLevel is the number of triangular elements per dimension
   
        config.grid_folder = "../GHODDESS-BSG/grids/BSG_bahamas_b1_f58"
        config.n_blocks = 1 # number of MPI partitions 
        config.n_frags_per_block = 58  # number of blocks 
        
        
Initial and boundary condition:

.. code:: python

        config.exact_eta = lambda pos, t: sp.S(0)
        config.exact_u = lambda pos, t: sp.S(0)
        config.exact_v = lambda pos, t: sp.S(0)
        
        # use land and open sea boundary condition as specified in the mesh
        config.available_boundary_conditions = [BoundaryCondition.Land, BoundaryCondition.OpenSea] 


        # parameters for open sea boundaries 
        config.nope = 1  # number of open sea boundaries forcing segments
        config.nbfr = 5  # total number of forcing frequencies on open sea boundaries


        # forcing frequencies
        config.amig = [sp.S(0.000067597751162),
                     sp.S(0.000072921165921),
                     sp.S(0.000137879713787),
                     sp.S(0.000140518917083),
                     sp.S(0.000145444119418)]

        # nodal factor
        config.ff = [sp.S(1), sp.S(1), sp.S(1), sp.S(1), sp.S(1)]

        # equilibrium argument in degrees for tidal forcing on open sea boundaries
        config.face = [deg_to_rad(0), deg_to_rad(0), deg_to_rad(0), deg_to_rad(0), deg_to_rad(0)]


        # amplitude of the harmonic forcing function at the open sea boundaries for frequency
        config.emo = [sp.S(0.075),
                    sp.S(0.095),
                    sp.S(0.1),
                    sp.S(0.395),
                    sp.S(0.06)]

        # phase (in degrees) of the harmonic forcing function at the open sea boundaries for frequency
        config.efa = [deg_to_rad(sp.S(194.806)),
                    deg_to_rad(sp.S(206.265)),
                    deg_to_rad(sp.S(340)),
                    deg_to_rad(sp.S(0)),
                    deg_to_rad(sp.S(42.9718))]


        # center of cpp projection in degrees LONG/LAT
        config.slam_0 = deg_to_rad(sp.S(282.5))
        config.sfea_0 = deg_to_rad(sp.S(24))
        
        # ramp function option
        # 0: no ramp function is used for forcing functions
        # 1: a hyperbolic tangent ramp function is specified and applied to the surface elevation specified boundary conditions,
        #    the tidal potential forcing function as well as the wind and atmospheric pressure forcing functions
        config.n_ramp = 1
        config.d_ramp = 2  # duration of ramp function (in days)

Right-hand side forcings:

.. code:: python

        # coriolis option parameter 0: spatially constant, 1: spatially variable
        config.ncor = 0
        # constant coriolis coefficient
        config.cori = sp.S(0.0000319)

        # friction option parameter 0: linear, 1: nonlinear
        config.noli_bf = 1
        # nonlinear bottom friction coefficient, can be dependent on node position
        config.cf = sp.S(0.009)
        
        # gravitational acceleration
	config.default_g = sp.S(9.81) 

Minimum-depth restriction to prevent elements from getting dry:

.. code:: python

        config.restrict_min_depth = True 
        config.min_depth_threshold = sp.Rational(1, 4)
        config.min_depth_zero_velocities = True  # set velocity components to zero in case of min depth restriction
        
OpenMP parallelization:

.. code:: python

        config.omp_enabled = True
	config.omp_num_threads = 8
	
vtk and recording station output:

.. code:: python

        config.print_interval = 3600 # in seconds
        config.print_vtk = True

        config.stations = [[38666.66, 49333.32],
                         [56097.79, 9612.94],
                         [41262.60, 29775.73],
                         [59594.66, 41149.62]]

        config.print_stations_eta = True
        config.print_stations_u = True
        config.print_stations_v = True
        if not config.original_formulation:
            config.print_stations_u_tilde = True
            config.print_stations_v_tilde = True

Running the code and visualizing results
-----------------------

Provided a suitable ``.platform`` file in the directory ``generator``, code is run with the command ``python3 sweMain.py config.bahamas``. It automatically generates layer 4 code and subsequently the optimized C++ code with OpenMP as specified. ``config.bahamas`` exchanged for other configs as required.
The code is directly executed and outputs the time steps specified in :math:`\texttt{config.print_interval}` and the time step size along with a :math:`L^2` norm 
or an :math:`L^2`-error if :math:`\texttt{config.analytical_test_case = True}`. Furthermore, it outputs a summary of kernel timings at the end.

The generated code can be found in the directory ``generated/generated``. The output files from the simulation are in ``generated/data``.

ParaView can, for example, be used to display the free-surface elevation :math:`\xi` or the velocities :math:`u` and :math:`v` as well as the computational mesh.
The video below depicts the approximate solution :math:`\xi`.

.. raw:: html

	<video class="hide-in-ie8" width="479" height="796" autoplay muted controls loop>
		<source src="../../_static/_videos/bahamas.ogv" type="video/ogg" /> 
	Your browser does not support the video tag.
	</video>

.. line-block::
	Fig. 4: Free surface elevation simulation (3 days) for tide driven flow in Bahamas domain.

Furthermore, a time series for the respective recording stations can be written and visualized, as shown below for the free-surface elevation :math:`\xi` at the first station.
	
.. figure:: figures/bahamas_station.svg
	:align: center
	:width: 100%	
.. line-block::

        Fig. 5: Time series plot of free surface elevation at first recording station.
	
.. [FKAZGK2020]  S. Faghih-Naini, S. Kuckuk, V. Aizinger, D. Zint, R. Grosso and H. Köstler. `Quadrature-free discontinuous Galerkin method with code generation features for shallow water equations on automatically generated block-structured meshes`. In: Advances in Water Resources 138 (2020) 103552. `doi:10.1016/j.advwatres.2020.103552 <https://doi.org/10.1016/j.advwatres.2020.103552>`_.

.. [ZGAK2019] D. Zint, R. Grosso, V. Aizinger and H. Köstler. `Generation of block structured grids on complex domains for high performance simulation`. In: V.A. Garanzha, L. Kamenski, H. Si. (Eds.), Numerical Geometry, Grid Generation and Scientific Computing. Springer International Publishing, Cham, pp. 87–99. (2020) `doi.org/10.1007/978-3-030-23436-2_6 <https://doi.org/10.1007/978-3-030-23436-2_6>`_.

.. [ZGAFKK2022] D. Zint, R. Grosso, V. Aizinger, S. Faghih-Naini, S. Kuckuk and H. Köstler. `Automatic Generation of Load-Balancing-Aware Block-Structured Grids for Complex Ocean Domains`. In: T. Robinson, D. Moxey, V. Z. Tomov. (Eds.): Proceedings of the 2022 SIAM International Meshing Roundtable. Virtual Conference, Zenodo, (2022) `doi.org/10.5281/zenodo.6562440 <https://doi.org/10.5281/zenodo.6562440>`_.