File spt.hxx

class LaplaceSPT : public Laplacian
#include <spt.hxx>

Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)

This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.

Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.

Parameters
  • b[in] RHS values (Ax = b)

  • flags[in] Inversion settings (see boundary.h for values)

  • a[in] This is a 2D matrix which allows solution of A = Delp2 + a

  • data[out] Structure containing data needed for second half of inversion

  • ccoef[in] Optional coefficient for first-order derivative

  • d[in] Optional factor to multiply the Delp2 operator

Public Functions

LaplaceSPT(Options *opt = nullptr, const CELL_LOC = CELL_CENTRE, Mesh *mesh_in = nullptr, Solver *solver = nullptr, Datafile *dump = nullptr)
~LaplaceSPT()
inline virtual void setCoefA(const Field2D &val) override

Set coefficients for inversion. Re-builds matrices if necessary.

inline virtual void setCoefC(const Field2D &val) override
inline virtual void setCoefD(const Field2D &val) override
inline virtual void setCoefEx(const Field2D &val) override
inline virtual void setCoefEz(const Field2D &val) override
virtual FieldPerp solve(const FieldPerp &b) override
virtual FieldPerp solve(const FieldPerp &b, const FieldPerp &x0) override
virtual Field3D solve(const Field3D &b) override

Extracts perpendicular slices from 3D fields and inverts separately.

In parallel (localmesh->NXPE > 1) this tries to overlap computation and communication. This is done at the expense of more memory useage. Setting low_mem in the config file uses less memory, and less communication overlap

virtual Field3D solve(const Field3D &b, const Field3D &x0) override

Performs the laplacian inversion y-slice by y-slice

Parameters
  • b[in] All the y-slices of b_slice, which is the right hand side of the equation A*x_slice = b_slice

  • x0[in] All the y-slices of the variable eventually used to set BC

Returns

x All the y-slices of x_slice in the equation A*x_slice = b_slice

Private Types

enum [anonymous]

Values:

enumerator SPT_DATA

Private Functions

void tridagForward(dcomplex *a, dcomplex *b, dcomplex *c, dcomplex *r, dcomplex *u, int n, dcomplex *gam, dcomplex &bet, dcomplex &um, bool start = false)

This is the first half of the Thomas algorithm for parallel calculations.

Two complex quantities have to be propagated between processors: bet and u[-1]. This routine takes bet and um from the last processor (if start == false), and returns the values to be passed to the next processor in the same variables.

Parameters
  • a[in] Vector of matrix coefficients (Left of diagonal)

  • b[in] Vector of matrix coefficients (Diagonal)

  • c[in] Vector of matrix coefficients (Right of diagonal)

  • r[in] RHS vector

  • u[in] Result vector (Au = r)

  • n[in] Size of the matrix

  • gam[out] Intermediate values used for backsolve stage

  • bet[inout]

  • um[inout]

  • start[in]

void tridagBack(dcomplex *u, int n, dcomplex *gam, dcomplex &gp, dcomplex &up)

Second (backsolve) part of the Thomas algorithm.

Parameters
  • u[inout] Result to be solved (Au = r)

  • n[in] Size of the problem

  • gam[in] Intermediate values produced by the forward part

  • gp[inout] gam from the processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1

  • up[inout] u from processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1

int start(const FieldPerp &b, SPT_data &data)

Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)

This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.

Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.

Parameters
  • b[in] RHS values (Ax = b)

  • data[out] Structure containing data needed for second half of inversion

int next(SPT_data &data)

Shifts the parallelised Thomas algorithm along one processor.

Returns non-zero when the calculation is complete.

Parameters

data[inout] Structure which keeps track of the calculation

void finish(SPT_data &data, FieldPerp &x)

Finishes the parallelised Thomas algorithm

Parameters
  • data[inout] Structure keeping track of calculation

  • x[out] The result

Private Members

Field2D Acoef
Field2D Ccoef
Field2D Dcoef
int ys
int ye
SPT_data slicedata
SPT_data *alldata
Array<dcomplex> dc1d

1D in Z for taking FFTs

struct SPT_data

Data structure for SPT algorithm.

Public Functions

inline SPT_data()
void allocate(int mm, int nx)
inline ~SPT_data()

Public Members

int jy

Y index.

Matrix<dcomplex> bk

b vector in Fourier space

Matrix<dcomplex> xk
Matrix<dcomplex> gam
Matrix<dcomplex> avec
Matrix<dcomplex> bvec
Matrix<dcomplex> cvec

Diagonal bands of matrix.

int proc
int dir
comm_handle recv_handle
int comm_tag
Array<BoutReal> buffer