File spt.hxx#

class LaplaceSPT : public Laplacian#
#include <spt.hxx>

Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)

This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.

Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.

Param b:

[in] RHS values (Ax = b)

Param flags:

[in] Inversion settings (see boundary.h for values)

Param a:

[in] This is a 2D matrix which allows solution of A = Delp2 + a

Param data:

[out] Structure containing data needed for second half of inversion

Param ccoef:

[in] Optional coefficient for first-order derivative

Param d:

[in] Optional factor to multiply the Delp2 operator

Public Functions

LaplaceSPT(Options *opt = nullptr, const CELL_LOC = CELL_CENTRE, Mesh *mesh_in = nullptr, Solver *solver = nullptr)#
inline virtual void setCoefA(const Field2D &val) override#

Set coefficients for inversion. Re-builds matrices if necessary.

inline virtual void setCoefC(const Field2D &val) override#
inline virtual void setCoefD(const Field2D &val) override#
inline virtual void setCoefEx(const Field2D &val) override#
inline virtual void setCoefEz(const Field2D &val) override#
virtual FieldPerp solve(const FieldPerp &b) override#
virtual FieldPerp solve(const FieldPerp &b, const FieldPerp &x0) override#
virtual Field3D solve(const Field3D &b) override#

Extracts perpendicular slices from 3D fields and inverts separately.

In parallel (localmesh->NXPE > 1) this tries to overlap computation and communication. This is done at the expense of more memory useage. Setting low_mem in the config file uses less memory, and less communication overlap

virtual Field3D solve(const Field3D &b, const Field3D &x0) override#

Performs the laplacian inversion y-slice by y-slice

Parameters:
  • b[in] All the y-slices of b_slice, which is the right hand side of the equation A*x_slice = b_slice

  • x0[in] All the y-slices of the variable eventually used to set BC

Returns:

x All the y-slices of x_slice in the equation A*x_slice = b_slice

void setCoefA(const Field2D &val) = 0

Set coefficients for inversion. Re-builds matrices if necessary.

inline void setCoefA(const Field3D &val)#
inline void setCoefA(BoutReal r)#
void setCoefC(const Field2D &val) = 0
inline void setCoefC(const Field3D &val)#
inline void setCoefC(BoutReal r)#
void setCoefD(const Field2D &val) = 0
inline void setCoefD(const Field3D &val)#
inline void setCoefD(BoutReal r)#
void setCoefEx(const Field2D &val) = 0
inline void setCoefEx(const Field3D &val)#
inline void setCoefEx(BoutReal r)#
void setCoefEz(const Field2D &val) = 0
inline void setCoefEz(const Field3D &val)#
inline void setCoefEz(BoutReal r)#
FieldPerp solve(const FieldPerp &b) = 0
Field3D solve(const Field3D &b)
Field2D solve(const Field2D &b)#
inline FieldPerp solve(const FieldPerp &b, const FieldPerp &x0)
Field3D solve(const Field3D &b, const Field3D &x0)

Performs the laplacian inversion y-slice by y-slice

Parameters:
  • b[in] All the y-slices of b_slice, which is the right hand side of the equation A*x_slice = b_slice

  • x0[in] All the y-slices of the variable eventually used to set BC

Returns:

x All the y-slices of x_slice in the equation A*x_slice = b_slice

Field2D solve(const Field2D &b, const Field2D &x0)#

Private Functions

void tridagForward(dcomplex *a, dcomplex *b, dcomplex *c, dcomplex *r, dcomplex *u, int n, dcomplex *gam, dcomplex &bet, dcomplex &um, bool start = false)#

This is the first half of the Thomas algorithm for parallel calculations.

Two complex quantities have to be propagated between processors: bet and u[-1]. This routine takes bet and um from the last processor (if start == false), and returns the values to be passed to the next processor in the same variables.

Parameters:
  • a[in] Vector of matrix coefficients (Left of diagonal)

  • b[in] Vector of matrix coefficients (Diagonal)

  • c[in] Vector of matrix coefficients (Right of diagonal)

  • r[in] RHS vector

  • u[in] Result vector (Au = r)

  • n[in] Size of the matrix

  • gam[out] Intermediate values used for backsolve stage

  • bet[inout]

  • um[inout]

  • start[in]

void tridagBack(dcomplex *u, int n, dcomplex *gam, dcomplex &gp, dcomplex &up)#

Second (backsolve) part of the Thomas algorithm.

Parameters:
  • u[inout] Result to be solved (Au = r)

  • n[in] Size of the problem

  • gam[in] Intermediate values produced by the forward part

  • gp[inout] gam from the processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1

  • up[inout] u from processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1

int start(const FieldPerp &b, SPT_data &data)#

Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)

This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.

Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.

Parameters:
  • b[in] RHS values (Ax = b)

  • data[out] Structure containing data needed for second half of inversion

int next(SPT_data &data)#

Shifts the parallelised Thomas algorithm along one processor.

Returns non-zero when the calculation is complete.

Parameters:

data[inout] Structure which keeps track of the calculation

void finish(SPT_data &data, FieldPerp &x)#

Finishes the parallelised Thomas algorithm

Parameters:
  • data[inout] Structure keeping track of calculation

  • x[out] The result

Private Members

Field2D Acoef#
Field2D Ccoef#
Field2D Dcoef#
int ys#
int ye#
SPT_data slicedata#
Array<SPT_data> alldata#
Array<dcomplex> dc1d#

1D in Z for taking FFTs

Private Static Attributes

static constexpr int SPT_DATA = 1123#

‘magic’ number for SPT MPI messages

struct SPT_data#

Data structure for SPT algorithm.

Public Functions

void allocate(int mm, int nx)#

Public Members

int jy = 0#

Y index.

Matrix<dcomplex> bk#

b vector in Fourier space

Matrix<dcomplex> xk#
Matrix<dcomplex> gam#
Matrix<dcomplex> avec#
Matrix<dcomplex> bvec#
Matrix<dcomplex> cvec#

Diagonal bands of matrix.

int proc = 0#
int dir = 1#
comm_handle recv_handle = nullptr#
int comm_tag = SPT_DATA#
Array<BoutReal> buffer#