Tide-driven flow at Bight of Abaco (Bahamas)¶

In this example, we examine the simulation of a tide-driven using a quadrature-free discontinuous Galerkin (DG) discretization of the 2D shallow water equations (SWE).

Problem description¶

We consider the 2D shallow water equations in conservative form defined on some 2D domain \(\Omega\).

\[\begin{split}\begin{aligned} &\partial_t \xi+\nabla \cdot \boldsymbol{q}=0\\ &\partial_t \boldsymbol{q} +\nabla \cdot \left(\boldsymbol{qq}^T /H \right) +\tau_{\textrm{bf}} \boldsymbol{q} +\left( \begin{smallmatrix} 0 & -f_c\\ f_c & 0 \end{smallmatrix} \right) \boldsymbol{q} +gH\nabla\xi =\boldsymbol{F} \end{aligned}\end{split}\]

\(\xi\) represents the elevation of the free surface with respect to some datum, e.g., the mean sea level.
\(H = h_b + \xi\) denotes the total water depth with the bathymetry \(h_b\).
\(\boldsymbol{q} := (U,V)^T\) is the depth integrated horizontal velocity.
Furthermore, \(f_c\) is the Coriolis coefficient, \(g\) the gravitational acceleration and \(\tau_{\textrm{bf}}\) the bottom friction coefficient.
Finally, \(\boldsymbol{F}\) is a source term that may include atmospheric pressure and tidal potential forcings.

Denoting \(\boldsymbol{c} := (\xi, U,V)^T\), the system is given in the following compact form:

\[\begin{aligned} \partial_t \boldsymbol{c}+ \nabla \cdot \boldsymbol{\tilde{A}}= \boldsymbol{r}(\boldsymbol{c}), \label{compact} \end{aligned}\]

where

\[\begin{split}\begin{aligned} \label{eq:def_A} \boldsymbol{\tilde{A}}=\begin{pmatrix} U & V\\ \frac{U^2}{H}+\frac{g(H^2-h_b^2)}{2}&\frac{U V}{H}\\ \frac{U V}{H}&\frac{V^2}{H}+\frac{g(H^2-h_b^2)}{2} \end{pmatrix}, \; \boldsymbol{r}(\boldsymbol{c})=\begin{pmatrix} 0\\ -\tau_{\textrm{bf}}U+f_cV+g\xi \partial_x h_b+F_x\\ -\tau_{\textrm{bf}}V-f_cU+g\xi \partial_y h_b+F_y\\ \end{pmatrix}. \end{aligned}\end{split}\]

This example uses the following types of boundary conditions for the SWE:

Land: No normal flow \(\boldsymbol{q} \cdot \boldsymbol{n} = 0\).
Open-sea: Prescribed free surface elevation \(\xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})\).

The following boundary conditions are furthermore available for different scenarios:

Dirichlet: All unknowns are specified \(\boldsymbol{q}(t, \boldsymbol{x}) = \hat{\boldsymbol{q}}(t, \boldsymbol{x}), \;\; \xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})\).
River: For supercritical flow examples, we set the following river (inflow) boundary conditions: \(q_{\boldsymbol{n}}(t, \boldsymbol{x}) = \hat{q}_{\boldsymbol{n}}(t, \boldsymbol{x}), \quad q_{\boldsymbol{\tau}}(t, \boldsymbol{x}) = \hat{q}_{\boldsymbol{\tau}}(t, \boldsymbol{x}), \quad \xi(t, \boldsymbol{x}) = \hat{\xi}(t, \boldsymbol{x})\) with the normal and tangential integrated velocities \(\hat{q}_{\boldsymbol{n}}(t, \boldsymbol{x})\) and \(\hat{q}_{\boldsymbol{\tau}}(t, \boldsymbol{x})\).
Radiation: At the outflow boundary of supercritical flow examples, we specify radiation boundary conditions where no unknowns are prescribed.

The initial conditions for the elevation and integrated velocity can be provided as follows:

\(\xi(0, \boldsymbol{x}) = \xi_0(\boldsymbol{x}),\quad {\boldsymbol{q}}(0, \boldsymbol{x}) = {\boldsymbol{q}}_0(\boldsymbol{x})\)

This example uses the lake-at-rest initial condition.

Quadrature-free formulation¶

To avoid fracture-type nonlinearities to permit a quadrature-free DG formulation, we modify the system as follows (cf. [FKAZGK2020]). Introducing the depth averaged velocity \(\boldsymbol{u}=(u,v)^T\) related to \(\boldsymbol{q}\) by \(\boldsymbol{q} = \boldsymbol{u}H\) the new system reads

\[\begin{split}\begin{aligned} \label{compact_qf_1} \partial_t \boldsymbol{c}+ \nabla \cdot \boldsymbol{A}= \boldsymbol{r}(\boldsymbol{c}),\\ \label{compact_qf_2} \boldsymbol{u} H = \boldsymbol{q}, \end{aligned}\end{split}\]

where \(H^2-h_b^2=\xi^2+2h_b\xi+h_b^2-h_b^2=\xi(H+h_b)\) and thus

\[\begin{split}\begin{aligned} \boldsymbol{A}=\begin{pmatrix} U & V\\ Uu+\frac{g\xi(H+h_b)}{2}&U v\\ V u& V v+\frac{g\xi(H+h_b)}{2} \end{pmatrix}. \end{aligned}\end{split}\]

DG discretization¶

As a next step, we discretize the continuous problem. Correspondingly, the physical domain \(\Omega\) is triangulated into triangles \(\Omega_e, e \in \{0,...,E\}\). The local variational formulation of the system on an element \(\Omega_e\) is obtained by multiplying with sufficiently smooth test functions \(\boldsymbol{\phi}\) and \(\boldsymbol{\psi}\) followed by the integration by parts where we use the notation \((\cdot,\cdot)_{\Omega_e}\) and \(\langle\cdot,\cdot\rangle_{\partial \Omega_e}\) for the \(L^2\)-scalar products on elements and edges, respectively, and denote by \(\boldsymbol{n}_e\) an exterior unit normal to \(\partial \Omega_e\)

\[\begin{split}\begin{aligned} \label{variational_1} &\left(\partial_t \boldsymbol{c},\boldsymbol{\phi}\right)_{\Omega_e}-\left(\boldsymbol{{A}}, \nabla \boldsymbol{\phi}\right)_{\Omega_e}+\langle\boldsymbol{{A}}\cdot\boldsymbol{n}_e, \boldsymbol{\phi}\rangle_{\partial\Omega_e}= \left(\boldsymbol{r},\boldsymbol{\phi}\right)_{\Omega_e},\\ \label{variational_2} &\left(\boldsymbol{u} H, \boldsymbol{\psi} \right)_{\Omega_e} = \left(\boldsymbol{q}, \boldsymbol{\psi} \right)_{\Omega_e}. \end{aligned}\end{split}\]

Next for obtaining the semi-discrete formulation, let \(\mathbb{P}^p(\Omega_e)\) be the polynomial space of order \(p\) on \(\Omega_e\). We replacing \(\boldsymbol{c}\) and \(\boldsymbol{u}\) by their finite-dimensional counterparts \(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta\) and utilize test functions \(\boldsymbol{\phi}_\Delta \in \mathbb{P}^p(\Omega_e)^3, \, \boldsymbol{\psi}_\Delta \in \mathbb{P}^p(\Omega_e)^2\):

\[\begin{split}\begin{aligned} &\left(\partial_t \boldsymbol{c}_\Delta,\boldsymbol{\phi}_\Delta\right)_{\Omega_e} -\left(\boldsymbol{{A}}, \nabla \boldsymbol{\phi}_\Delta\right)_{\Omega_e} +\langle\boldsymbol{\widehat{A}},\boldsymbol{\phi}_\Delta \rangle_{\partial\Omega_e} = \left(\boldsymbol{r},\boldsymbol{\phi}_\Delta\right)_{\Omega_e},\\ &\left(\boldsymbol{u}_\Delta H_\Delta, \boldsymbol{\psi}_\Delta \right)_{\Omega_e} = \left(\boldsymbol{q}_\Delta, \boldsymbol{\psi}_\Delta \right)_{\Omega_e}. \end{aligned}\end{split}\]

The edge flux \(\boldsymbol{{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta) \cdot\boldsymbol{n}_e\) is approximated on \(\partial\Omega_e\) by a numerical flux \(\boldsymbol{\widehat{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta,\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta, \boldsymbol{n}_e)\) that depends on discontinuous values of the solution on element \(\Omega_e\) (without tilde) and on its edge neighbours (with tilde). On exterior domain boundaries, the specified elevation and velocity boundary values are utilized in the flux computation. In this example, we use the Lax–Friedrichs flux, defined as

\[\begin{aligned} \boldsymbol{\widehat{A}}^{LF}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta,\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta, \boldsymbol{n}_e) =\frac{1}{2}\big(\boldsymbol{{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta)+\boldsymbol{{A}}(\boldsymbol{\tilde{c}}_\Delta, \boldsymbol{\tilde{u}}_\Delta)\big) \cdot \boldsymbol{n}_e +\frac{\lambda}{2} (\boldsymbol{c}_\Delta-\boldsymbol{\tilde{c}}_\Delta), \end{aligned}\]

where \(\lambda\) denotes the largest (in absolute value) eigenvalue of \(\frac{\partial}{\partial \boldsymbol{c}} \boldsymbol{\widetilde{A}}(\boldsymbol{c}) \cdot \boldsymbol{n}_e\). Denoting by \(\boldsymbol{x}_f\) the midpoint of edge \(f \subset \partial\Omega_e\), our quadrature-free scheme uses the following approximation of \(\lambda|_f\) (cf. [FKAZGK2020]):

\[\begin{aligned} \lambda|_f := \max\left\{ \left|{\boldsymbol{u}_\Delta}(\boldsymbol{x}_f) \cdot\boldsymbol{n}_e\right|, \left|{\boldsymbol{\tilde{u}}_\Delta}(\boldsymbol{x}_f) \cdot\boldsymbol{n}_e\right|\right\}+ \max \left\{\sqrt{g {H_\Delta}(\boldsymbol{x}_f)},\sqrt{g {\widetilde{H}_\Delta}(\boldsymbol{x}_f)}\right\}. \end{aligned}\]

We now take a basis \(\varphi_{ei}(\boldsymbol{x}), \,i=1, \ldots, K(p)\), of \(\mathbb{P}^p(\Omega_e)\) and can represent \(\boldsymbol{c}_\Delta\) and \(\boldsymbol{u}_\Delta\) as

\[\begin{split}\begin{aligned} \boldsymbol{c}_\Delta(t,\boldsymbol{x})|_{\Omega_e} &=(\xi_\Delta, U_\Delta, V_\Delta)^T(t,\boldsymbol{x})=\sum_{j=1}^3\sum_{i=1}^{K(p)} c_{ei}^j\varphi_{ei}(\boldsymbol{x})\,\boldsymbol{e}_j, \label{c_h}\\ \boldsymbol{u}_\Delta(t,\boldsymbol{x}) |_{\Omega_e} &=(u_\Delta, v_\Delta)^T(t,\boldsymbol{x})=\sum_{j=1}^2\sum_{i=1}^{K(p)} u_{ei}^j\varphi_{ei}(\boldsymbol{x})\,\boldsymbol{e}_j\label{c_h_tilde} \end{aligned}\end{split}\]

with \(\boldsymbol{e}_j\) denoting the \(j\)-th unit vector in \(\mathbb{R}^3\) or \(\mathbb{R}^2\). A basis of \(\mathbb{P}^p(\Omega_e)\) is defined via a mapping from the corresponding reference basis on \(\hat{\Omega}\)

\[\begin{split}\begin{aligned} \varphi_{ei}(\boldsymbol{x})=\begin{cases}\hat{\varphi}_i\left(\boldsymbol{F}_e^{-1}(\boldsymbol{x})\right) & \boldsymbol{x}\in \Omega_e,\\0& \text{otherwise},\end{cases}\quad i \in \{1,\dots,K(p)\} \end{aligned}\end{split}\]

where \(F_e: \hat{\Omega}\rightarrow\Omega_e,\; \hat{\boldsymbol{x}}\rightarrow \boldsymbol{x} := \boldsymbol{B}_e\hat{\boldsymbol{x}}+\boldsymbol{a}_{e1}\) is the affine-linear transformation from the reference triangle onto triangle \(\Omega_e\) as shown in the figure below with

\[\begin{split}\begin{aligned} \boldsymbol{B}_e = \begin{bmatrix} B^e_{1,1} & B^e_{1,2} \\ B^e_{2,1} &B^e_{2,2}\end{bmatrix} := \begin{bmatrix} \boldsymbol{a}_{e2}-\boldsymbol{a}_{e1} & \boldsymbol{a}_{e3}-\boldsymbol{a}_{e1}\end{bmatrix} \end{aligned}\end{split}\]

and \(\boldsymbol{a}_{e1}, \boldsymbol{a}_{e2}, \boldsymbol{a}_{e3}\) denoting the vertex coordinates of \(\Omega_e\).

Fig. 1: Affine-linear mapping \(F_e\) from the reference triangle \(\hat{\Omega} = \left\{\hat{\boldsymbol{a}}_1, \hat{\boldsymbol{a}}_2, \hat{\boldsymbol{a}}_2 \right\} = \left\{(0,0)^T, (1,0)^T, (0,1)^T \right\}\) to physical triangle \(\Omega_e = \left\{\boldsymbol{a}_{e,1}, \, \boldsymbol{a}_{e,2}, \boldsymbol{a}_{e,3} \right\}\).

The number of basis functions \(K(p)\) is dependent on the respective polynomial space and has the following values in \(\mathbb{R}^2\): \(K(0)\!=\!1, \, K(1)\!=\!3\) and \(K(2)\!=\!6\). The reference basis functions employed in our implementation are orthonormal with respect to the \(L^2\)-scalar product on \(\hat{\Omega}\) and given by

\[\begin{split}\begin{aligned} \left.\!\!\!\! \begin{array}{l} \left.\!\!\!\! \begin{array}{l} \hat{\varphi}_1(\hat{\boldsymbol{x}})=\sqrt{2}, \left. \; \right\} \boldsymbol{\mathbb{P}^0}(\hat{\Omega})\\ \hat{\varphi}_2(\hat{\boldsymbol{x}})= 2 - 6\hat{x},\\ \hat{\varphi}_3(\hat{\boldsymbol{x}})= \sqrt{12} (1 - \hat{x} - 2\hat{y}), \end{array} \right\} \; \boldsymbol{\mathbb{P}^1}(\hat{\Omega})\\ \!\!\!\!\hat{\varphi}_4(\hat{\boldsymbol{x}})= \sqrt{6}\left(1-8\hat{x}+10\hat{x}^2\right),\\ \!\!\!\!\hat{\varphi}_5(\hat{\boldsymbol{x}})= \sqrt{3}\left(-1-4\hat{x}+5\hat{x}^2+12\hat{y}-15\hat{y}^2\right),\\ \!\!\!\!\hat{\varphi}_6(\hat{\boldsymbol{x}})= \sqrt{45}\left(1-4\hat{x}+3\hat{x}^2-4\hat{y}+8\hat{x}\hat{y}+3\hat{y}^2\right), \end{array} \right\} \boldsymbol{\mathbb{P}^2}(\hat{\Omega}) \end{aligned}\end{split}\]

Up to this point, we defined the discrete formulation of the problem we want to solve. The next step is to look at the algebraic representation of the element and edge integrals that arise.

For compactness, we show an algebraic representation of our discrete scheme only for element integrals. We insert the basis representations into the system and test the first equation with \(\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_1\) and obtain for \(k \in {1,\dots,K(p)}\):

\[\begin{split}\begin{aligned} &\left(\boldsymbol{\tilde{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta), \nabla (\varphi_{ek} \boldsymbol{e}_1) \right)_{\Omega_e} = \sum_{i=1}^{K(p)}\left[ c_{ei}^{2}\int_{\Omega_e} \frac{\partial\varphi_{ek}} {\partial x}\varphi_{ei}\mathrm{d}\boldsymbol{x}+ c_{ei}^{3}\int_{\Omega_e} \frac{\partial\varphi_{ek}} {\partial y}\varphi_{ei}\mathrm{d}\boldsymbol{x}\right]\nonumber\\ &\quad =\sum_{i=1}^{K(p)} \Bigg{[}\frac{c_{ei}^2}{|\mathrm{det}(\boldsymbol{B}_e)|}\left({B}^e_{2,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\ &\qquad +\frac{c_{ei}^3}{|\mathrm{det}(\boldsymbol{B}_e)|}\left(-{B}^e_{1,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} + {B}^e_{1,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}}\right)\Bigg{]}. \label{mass_basis}\end{aligned}\end{split}\]

For \(\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_2\) and \(k \in {1,\dots,K(p)}\), we get:

\[\begin{split}\begin{aligned} & \left(\boldsymbol{\tilde{A}}(\boldsymbol{c}_\Delta, \boldsymbol{u}_\Delta), \nabla (\varphi_{ek} \boldsymbol{e}_2) \right)_{\Omega_e}\nonumber \\ &\quad=\sum_{i,m=1}^{K(p)}\left[ c_{ei}^2u_{em}^{1}\int_{\Omega_e} \frac{\partial\varphi_{ek}}{\partial{x}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}+c_{ei}^2u_{em}^{2}\int_{\Omega_e} \frac{\partial\varphi_{ek}}{\partial{y}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}\right]\nonumber\\ &\qquad+\sum_{i,m=1}^{K(p)}\frac{g}{2} c_{ei}^{1}c_{em}^{1}\int_{\Omega_e} \frac{\partial\varphi_{ek}}{\partial{x}}\varphi_{ei}\varphi_{em}\mathrm{d}\boldsymbol{x}+\sum_{i=1}^{K(p)} gh_bc_{ei}^{1}\int_{\Omega_e} \frac{\partial\varphi_{ek}}{\partial{x}} \varphi_{ei} \mathrm{d}\boldsymbol{x}\nonumber\\ &\quad=\sum_{i,m=1}^{K(p)} \Bigg{[}\frac{c_{ei}^2u_{em}^{1}}{|\mathrm{det}(\boldsymbol{B}_e)|^{3/2}}\left({B}^e_{2,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\ &\qquad+\frac{c_{ei}^2u_{em}^{2}}{|\mathrm{det}(\boldsymbol{B}_e)|^{3/2}}\left(-{B}^e_{1,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} + {B}^e_{1,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\Bigg{]}\nonumber\\ &\qquad+\sum_{i,m=1}^{K(p)}\frac{g}{2 |\mathrm{det}(\boldsymbol{B}_e)|^{3/2}} c_{ei}^{1}c_{em}^{1}\left({B}^e_{2,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\hat{\varphi}_{m}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i}\hat{\varphi}_{m} \mathrm{d}\hat{\boldsymbol{x}}\right)\nonumber\\ &\qquad+\sum_{i=1}^{K(p)} \frac{gh_bc_{ei}^{1}}{|\mathrm{det}(\boldsymbol{B}_e)|} \left({B}^e_{2,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right) %\label{momentum_basis_u} \end{aligned}\end{split}\]

And for \(\boldsymbol{\phi}_\Delta = \varphi_{ek} \boldsymbol{e}_3\) we get a similar expression.

Mapping to code¶

As ExaStencils does not directly support higher-order discretizations like the one we are using, we adapted the workflow as shown below. Instead of starting with the continuous model in the multi-layered approach of ExaSlang, the continuous and discrete problems are manually derived and implemented in GHODDESS, which then directly maps to layer 4 [FKAZGK2020].

Fig. 2: Schematics of the multi-layered approach of ExaSlang in the original (left) and adapted (right) workflow in which GHODDESS directly maps to layer 4.

We want to examine the three steps of mapping the discretized equations to efficient simulation code.

Math to Python¶

In this step, the DG scheme is mapped to Python code with the help of the SymPy package. This is facilitated by already provided basic abstractions for, e.g., basis functions, just as classes representing triangles and data fields. Since ExaStencils is restricted to quadrilateral meshes, we divide each element into two differently oriented triangles – lower and upper – to obtain a triangular mesh for our discretization.

The algebraic representations of the discrete scheme shown above are implemented manually in our framework. Here we consider the first part of the element integral for \(\xi\) as an example:

\[\begin{aligned} \sum_{i=1}^{K(p)} \frac{c_{ei}^2}{|\mathrm{det}(\boldsymbol{B}_e)|}\left({B}^e_{2,2}\int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{x}} \hat{\varphi}_{i}\mathrm{d}\hat{\boldsymbol{x}} - {B}^e_{2,1} \int_{\hat{\Omega}} \frac{\partial\hat{\varphi}_{k}}{\partial \hat{y}} \hat{\varphi}_{i} \mathrm{d}\hat{\boldsymbol{x}}\right), \label{example_integral} \end{aligned}\]

which can be implemented in GHODDESS by

sum(cu(tri.orientation, k) * tri.det_b_inv * (
    tri.b[1, 1]
    * integrate_over_tri(basis.phi[k] * basis.phi_dx[p])
    - tri.b[1, 0]
    * integrate_over_tri(basis.phi[k] * basis.phi_dy[p]))
  for k in range(d))

Information about the triangles, such as the orientation or the determinant of the mapping from the reference triangle and its reciprocal, are stored in \(\texttt{tri}\). SymPy is then used to evaluate integrals and derivatives analytically.

Python to ExaSlang¶

To allow a mapping from the symbolic representation to ExaSlang layer 4 code, SymPy expressions are enriched with a few required abstractions such as field symbols that correspond to accesses to ExaSlang fields storing quantities defined on the computational domain. GHODDESS then creates an auxiliary knowledge file holding parameters that guide the generation process, as well as ExaSlang specifications.

As an example, we once again look at the implementation of the element integral for \(\xi\) from above. Depending on the chosen order, this expression will be evaluated differently. For linear order and the second degree of freedom, i.e., \(\texttt{d=3}\) and \(\texttt{p=2}\), the underlying sum over the integrals can be evaluated analytically as

\[\int_{\hat{\Omega}} ( -6 \sqrt{2} ) \mathrm{d}\hat{\boldsymbol{x}}+ \int_{\hat{\Omega}} (-12 + 36 \hat{x}) \mathrm{d}\hat{\boldsymbol{x}} + \int_{\hat{\Omega}} (-6 \sqrt{12} (1 - \hat{x} - 2 \hat{y}))\mathrm{d}\hat{\boldsymbol{x}} = -3 \sqrt{2}\]

using the basis formulation from above. Together with the remaining terms, this representation can then be mapped to ExaSlang. We now consider only the first half of the element integral and ignore the second half, edge integrals, and the right-hand side. The corresponding update kernel would look like shown below.

loop over cxiNewLower1 {
  cxiNewLower1 = ( -3.0 * 1.41421356237310
                    * bLower3 * cuLower0 * invDetBLower0 )
}

Frequently used sub-expressions, such as parts of the transformation matrix \(B\) and the reciprocal of its determinant, are stored in designated data fields. This kernel only updates triangles with lower orientation and in practice an equivalent kernel updating the remaining upper triangles is generated.

ExaSlang to code¶

The ExaStencils code generator is capable of parsing the emitted ExaSlang code, applying code transformations and certain optimizations, and, finally, outputting a C++ or CUDA code combined with MPI for distributed memory parallelism.

Following the example from the previous paragraph, we can now generate compilable C++ code similar to the one depicted below without optimizations to give developers insight.

for (int i1 = 0; i1 < 16; i1 += 1) {
 for (int i0 = 0; i0 < 16; i0 += 1) {
  fieldData_cxiNewLower1[((18*i1)+i0+19)] = - (
   4.2426406871193 * fieldData_bLower3[((18*i1)+i0+19)]
   * fieldData_cuLower0[((18*i1)+i0+19)]
   * fieldData_invDetBLower0[((18*i1)+i0+19)] );
 }
}

Enabling optimizations generates better-performing but less readable code, as shown below.

for (int i1 = 0; i1<16; i1 += 1) {
 double* const _fieldData_bLower3_p0 =
  &(fieldData_bLower3[(18*i1)]);
 /* similarly for _fieldData_cxiNewLower1_p0,
    _fieldData_cuLower0_p0, _fieldData_invDetBLower0_p0 */
 int _start = 0; int _end = 16;
 int _intermediate = std::max({_start,(_end-((_end-_start)%4))});
 if (_start<_intermediate) {
  __m256d _vec00 = _mm256_set1_pd(4.2426406871193);
  for (int i0 = _start; i0<_intermediate; i0 += 4) {
   __m256d _vec01 =
    _mm256_loadu_pd(&(_fieldData_bLower3_p0[(i0+19)]));
   __m256d _vec02 =
    _mm256_loadu_pd(&(_fieldData_cuLower0_p0[(i0+19)]));
   __m256d _vec03 =
    _mm256_loadu_pd(&(_fieldData_invDetBLower0_p0[(i0+19)]));
   __m256d _vec04;
    _vec04 = _mm256_xor_pd(_mm256_mul_pd(
      _mm256_mul_pd(_vec02, _vec03),
      _mm256_mul_pd(_vec00, _vec01)),
     _mm256_set1_pd(-0.0));
    _mm256_storeu_pd(&(_fieldData_cxiNewLower1_p0[(i0+19)]),
     _vec04);
  }
 }
 for (int i0 = _intermediate; i0<_end; i0 += 1) {
  _fieldData_cxiNewLower1_p0[(i0+19)] = - (
   4.2426406871193 * _fieldData_bLower3_p0[(i0+19)]
   * _fieldData_cuLower0_p0[(i0+19)]
   * _fieldData_invDetBLower0_p0[(i0+19)]);
 }
}

Block-structured grids¶

To combine the geometric flexibility of unstructured meshes and the performance benefits of unstructured ones, we use block-structured grids (BSGs) for our ocean domains. [ZGAK2019], [ZGAFKK2022] Next to unmasked BSGs, GHODDESS also supports masked ones to accurately mesh fine-scale geometric features and further exploit performance benefits associated with structured grids.

In this example, we use an unmasked BSG with one MPI partition of 58 blocks with 32 elements each. It can be found in GHODDESS-BSG/grids/BSG_bahamas_b1_f58 after checking out the submodule GHODDESS-BSG and is shown below on the left. On the right, we present the bathymetry and the locations of the four recording stations’ exact coordinates in meters \((38\,667,49\,333)\), \((56\,098,9\,613)\), \((41\,263,29\,776)\), and \((59\,594,41\,149)\).

../../_images/bahamas_grid_b1_f58_l2.svg

Fig. 3: Block-structured grid (left) and bathymetry with locations of four recording stations(right).

Example setup and parameter choice¶

In our example, the bottom friction was given via a standard quadratic friction law \(\tau_{\textrm{bf}} = C_f | \boldsymbol{u} |\) with coefficient \(C_f = 0.009\), and the (constant) Coriolis parameter was set to \(3.19 \times 10^{-5} s^{-1}\).

The tidal forcing was prescribed at the open boundary consisting of five tidal components (O1, K1, N2, M2, S2) given analytically by

\[\begin{split}\begin{aligned} \label{eq:tidal-forcing} \hat{\xi}(t) &= 0.075\cos\left(\frac{t}{25.82}+3.40\right) + 0.095\cos\left(\frac{t}{23.94}+3.60\right) + 0.100\cos\left(\frac{t}{12.66}+5.93\right) \nonumber\\ &\quad + 0.395\cos\left(\frac{t}{12.42}+0.00\right) + 0.060\cos\left(\frac{t}{12.00}+0.75\right), \end{aligned}\end{split}\]

with \(t\) (hours) denoting the time elapsed from the beginning of the simulation, and \(\hat{\xi}\) (meters) the specified tidal elevation at the open sea boundary. A so-called cold start initialization is performed: The fluid domain is assumed to be at rest initially (\(\xi_0 =0\), \({\boldsymbol{q}}_0 = 0\)), and the boundary condition is applied gradually over a period of two days using ramping (a coefficient multiplying the amplitude of the tidal forcing set to zero at the beginning of the simulation and linearly increasing to one at the end of day two).

The three-day simulations is run with a constant time step of 15 seconds for the piecewise constant DG discretization (\(p=0\)). Because of the CFL condition, a smaller time step size will be necessary for higher-order approximations or finder grids.

Now we look at the input parameters for this setup. The parameters are set the file src/config/bahamas.py in the \(\texttt{Config}\) class which interits from \(\texttt{ConfigMaster}\) and \(\texttt{ConfigMasterReal}\) in this case.

Below we describe the most important parameters to be set for running this example:

Time step size and number and approximation order:

config.analytical_test_case = False # to control error output, use physical right hand side and boundary conditions as specified below

config.default_dt = 15  # time step size in seconds
config.use_time_steps = False  # True: use a fixed number of time steps (timeSteps), False: use a maximum time (t_max)
config.t_max = 3 * 24 * 60 * 60

config.order = 0 # local approximation order: 0 constant, 1 linear, 2 quadratic, 3 cubic

Grid and bathymetry: Bight of Abaco with Moore’s Island:

config.minLevel = 2
config.maxLevel = config.minLevel  # 2*2**maxLevel is the number of triangular elements per dimension

config.grid_folder = "../GHODDESS-BSG/grids/BSG_bahamas_b1_f58"
config.n_blocks = 1 # number of MPI partitions
config.n_frags_per_block = 58  # number of blocks

Initial and boundary condition:

config.exact_eta = lambda pos, t: sp.S(0)
config.exact_u = lambda pos, t: sp.S(0)
config.exact_v = lambda pos, t: sp.S(0)

# use land and open sea boundary condition as specified in the mesh
config.available_boundary_conditions = [BoundaryCondition.Land, BoundaryCondition.OpenSea]


# parameters for open sea boundaries
config.nope = 1  # number of open sea boundaries forcing segments
config.nbfr = 5  # total number of forcing frequencies on open sea boundaries


# forcing frequencies
config.amig = [sp.S(0.000067597751162),
             sp.S(0.000072921165921),
             sp.S(0.000137879713787),
             sp.S(0.000140518917083),
             sp.S(0.000145444119418)]

# nodal factor
config.ff = [sp.S(1), sp.S(1), sp.S(1), sp.S(1), sp.S(1)]

# equilibrium argument in degrees for tidal forcing on open sea boundaries
config.face = [deg_to_rad(0), deg_to_rad(0), deg_to_rad(0), deg_to_rad(0), deg_to_rad(0)]


# amplitude of the harmonic forcing function at the open sea boundaries for frequency
config.emo = [sp.S(0.075),
            sp.S(0.095),
            sp.S(0.1),
            sp.S(0.395),
            sp.S(0.06)]

# phase (in degrees) of the harmonic forcing function at the open sea boundaries for frequency
config.efa = [deg_to_rad(sp.S(194.806)),
            deg_to_rad(sp.S(206.265)),
            deg_to_rad(sp.S(340)),
            deg_to_rad(sp.S(0)),
            deg_to_rad(sp.S(42.9718))]


# center of cpp projection in degrees LONG/LAT
config.slam_0 = deg_to_rad(sp.S(282.5))
config.sfea_0 = deg_to_rad(sp.S(24))

# ramp function option
# 0: no ramp function is used for forcing functions
# 1: a hyperbolic tangent ramp function is specified and applied to the surface elevation specified boundary conditions,
#    the tidal potential forcing function as well as the wind and atmospheric pressure forcing functions
config.n_ramp = 1
config.d_ramp = 2  # duration of ramp function (in days)

Right-hand side forcings:

# coriolis option parameter 0: spatially constant, 1: spatially variable
config.ncor = 0
# constant coriolis coefficient
config.cori = sp.S(0.0000319)

# friction option parameter 0: linear, 1: nonlinear
config.noli_bf = 1
# nonlinear bottom friction coefficient, can be dependent on node position
config.cf = sp.S(0.009)

# gravitational acceleration
config.default_g = sp.S(9.81)

Minimum-depth restriction to prevent elements from getting dry:

config.restrict_min_depth = True
config.min_depth_threshold = sp.Rational(1, 4)
config.min_depth_zero_velocities = True  # set velocity components to zero in case of min depth restriction

OpenMP parallelization:

config.omp_enabled = True
config.omp_num_threads = 8

vtk and recording station output:

config.print_interval = 3600 # in seconds
config.print_vtk = True

config.stations = [[38666.66, 49333.32],
                 [56097.79, 9612.94],
                 [41262.60, 29775.73],
                 [59594.66, 41149.62]]

config.print_stations_eta = True
config.print_stations_u = True
config.print_stations_v = True
if not config.original_formulation:
    config.print_stations_u_tilde = True
    config.print_stations_v_tilde = True

Running the code and visualizing results¶

Provided a suitable .platform file in the directory generator, code is run with the command python3 sweMain.py config.bahamas. It automatically generates layer 4 code and subsequently the optimized C++ code with OpenMP as specified. config.bahamas exchanged for other configs as required. The code is directly executed and outputs the time steps specified in \(\texttt{config.print_interval}\) and the time step size along with a \(L^2\) norm or an \(L^2\)-error if \(\texttt{config.analytical_test_case = True}\). Furthermore, it outputs a summary of kernel timings at the end.

The generated code can be found in the directory generated/generated. The output files from the simulation are in generated/data.

ParaView can, for example, be used to display the free-surface elevation \(\xi\) or the velocities \(u\) and \(v\) as well as the computational mesh. The video below depicts the approximate solution \(\xi\).

Fig. 4: Free surface elevation simulation (3 days) for tide driven flow in Bahamas domain.

Furthermore, a time series for the respective recording stations can be written and visualized, as shown below for the free-surface elevation \(\xi\) at the first station.

Fig. 5: Time series plot of free surface elevation at first recording station.

[FKAZGK2020] (1,2,3)

Faghih-Naini, S. Kuckuk, V. Aizinger, D. Zint, R. Grosso and H. Köstler. Quadrature-free discontinuous Galerkin method with code generation features for shallow water equations on automatically generated block-structured meshes. In: Advances in Water Resources 138 (2020) 103552. doi:10.1016/j.advwatres.2020.103552.

[ZGAK2019]

Zint, R. Grosso, V. Aizinger and H. Köstler. Generation of block structured grids on complex domains for high performance simulation. In: V.A. Garanzha, L. Kamenski, H. Si. (Eds.), Numerical Geometry, Grid Generation and Scientific Computing. Springer International Publishing, Cham, pp. 87–99. (2020) doi.org/10.1007/978-3-030-23436-2_6.

[ZGAFKK2022]

Zint, R. Grosso, V. Aizinger, S. Faghih-Naini, S. Kuckuk and H. Köstler. Automatic Generation of Load-Balancing-Aware Block-Structured Grids for Complex Ocean Domains. In: T. Robinson, D. Moxey, V. Z. Tomov. (Eds.): Proceedings of the 2022 SIAM International Meshing Roundtable. Virtual Conference, Zenodo, (2022) doi.org/10.5281/zenodo.6562440.