File spt.hxx

class LaplaceSPT : public Laplacian
#include <spt.hxx>

Simple parallelisation of the (serial) Thomas tridiagonal solver algorithm

This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.

Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works well as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and efficiency falls sharply.

Parameters
  • b: RHS values (Ax = b)
  • flags: Inversion settings (see boundary.h for values)
  • a: This is a 2D matrix which allows solution of A = Delp2 + a
  • data: Structure containing data needed for second half of inversion
  • ccoef: Optional coefficient for first-order derivative
  • d: Optional factor to multiply the Delp2 operator

Public Functions

LaplaceSPT(Options *opt = nullptr, const CELL_LOC loc = CELL_CENTRE, Mesh *mesh_in = nullptr)
~LaplaceSPT()
void setCoefA(const Field2D &val)

Set coefficients for inversion. Re-builds matrices if necessary.

void setCoefC(const Field2D &val)
void setCoefD(const Field2D &val)
void setCoefEx(const Field2D &val)
void setCoefEz(const Field2D &val)
FieldPerp solve(const FieldPerp &b)
FieldPerp solve(const FieldPerp &b, const FieldPerp &x0)
Field3D solve(const Field3D &b)

Extracts perpendicular slices from 3D fields and inverts separately.

In parallel (localmesh->NXPE > 1) this tries to overlap computation and communication. This is done at the expense of more memory usage. Setting low_mem in the config file uses less memory, at the cost of less overlap between computation and communication.

Field3D solve(const Field3D &b, const Field3D &x0)

Performs the Laplacian inversion y-slice by y-slice.

Return
x: All the y-slices of x_slice in the equation A*x_slice = b_slice
Parameters
  • b: All the y-slices of b_slice, the right-hand side of the equation A*x_slice = b_slice
  • x0: All the y-slices of the variable used to set the boundary conditions
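
Taken together, these public functions follow the generic Laplacian interface, so a LaplaceSPT instance is normally obtained through the Laplacian factory and driven as in the rough sketch below. This is an illustration rather than code from this file: the factory call Laplacian::create, the option section name "laplace" (with type = spt selected in the input file), the header path, the field b and the coefficient values are assumptions made for the example:

  #include <bout/invert_laplace.hxx>   // generic Laplacian interface; header path may differ between versions

  // ... inside a physics model, with a Field3D b already defined ...
  auto lap = Laplacian::create(&Options::root()["laplace"]);  // "type = spt" chosen in the input file

  Field2D A = 0.0, C = 1.0, D = 1.0;   // coefficients of the inverted operator (see the parameter list above)
  lap->setCoefA(A);
  lap->setCoefC(C);
  lap->setCoefD(D);

  Field3D x = lap->solve(b);           // inverts each XZ slice of b, y-slice by y-slice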

Private Types

enum [anonymous]

Values:

SPT_DATA = 1123

Private Functions

void tridagForward(dcomplex *a, dcomplex *b, dcomplex *c, dcomplex *r, dcomplex *u, int n, dcomplex *gam, dcomplex &bet, dcomplex &um, bool start = false)

This is the first half of the Thomas algorithm for parallel calculations.

Two complex quantities have to be propagated between processors: bet and u[-1] (the value of u at the last point of the previous processor). This routine takes bet and um from the previous processor (if start == false), and returns the values to be passed to the next processor in the same variables.

Parameters
  • a: Vector of matrix coefficients (Left of diagonal)
  • b: Vector of matrix coefficients (Diagonal)
  • c: Vector of matrix coefficients (Right of diagonal)
  • r: RHS vector
  • u: Result vector (Au = r)
  • n: Size of the matrix
  • gam: Intermediate values used for backsolve stage
  • bet: Pivot value carried over from the previous processor; on exit, the value to pass to the next processor
  • um: Value of u at the last point of the previous processor; on exit, u at the last point of this processor
  • start: True if this is the first processor in the chain, so bet and um are initialised rather than received
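
For reference, the sketch below is a self-contained version of the forward sweep described above. It is an illustration rather than the code in spt.cxx, and it assumes the super-diagonal coefficient c does not change across the processor boundary, so the local c[0] can stand in for the previous processor's last c value:

  #include <complex>
  #include <stdexcept>

  using dcomplex = std::complex<double>;

  // Forward (elimination) stage of the Thomas algorithm for one processor's
  // chunk of the tridiagonal system. `bet` and `um` carry the state across
  // processor boundaries.
  void forwardSweep(const dcomplex* a, const dcomplex* b, const dcomplex* c,
                    const dcomplex* r, dcomplex* u, int n, dcomplex* gam,
                    dcomplex& bet, dcomplex& um, bool start = false) {
    if (start) {
      // First processor: initialise the recurrence
      bet = b[0];
      u[0] = r[0] / bet;
    } else {
      // Continue from the previous processor using the received bet and um
      gam[0] = c[0] / bet; // assumes c is the same on both sides of the boundary
      bet = b[0] - a[0] * gam[0];
      u[0] = (r[0] - a[0] * um) / bet;
    }

    for (int j = 1; j < n; j++) {
      gam[j] = c[j - 1] / bet;
      bet = b[j] - a[j] * gam[j];
      if (bet == dcomplex(0.0)) {
        throw std::runtime_error("forwardSweep: zero pivot");
      }
      u[j] = (r[j] - a[j] * u[j - 1]) / bet;
    }

    // Values to pass on to the next processor
    um = u[n - 1];
  }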

void tridagBack(dcomplex *u, int n, dcomplex *gam, dcomplex &gp, dcomplex &up)

Second (backsolve) part of the Thomas algorithm.

Parameters
  • u: Result to be solved (Au = r)
  • n: Size of the problem
  • gam: Intermediate values produced by the forward part
  • gp: gam from the processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1
  • up: u from processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1
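
A matching self-contained sketch of the back-substitution for an interior processor's chunk, again an illustration rather than the code in spt.cxx:

  #include <complex>

  using dcomplex = std::complex<double>;

  // Back-substitution for one chunk. `gp` and `up` arrive from the chunk to
  // the right (already back-solved) and are overwritten with this chunk's
  // boundary values so they can be passed on to the left.
  void backSweep(dcomplex* u, int n, const dcomplex* gam, dcomplex& gp,
                 dcomplex& up) {
    // Correct the last local point using the neighbour's gam and u
    u[n - 1] -= gp * up;

    // Standard Thomas back-substitution within this chunk
    for (int j = n - 2; j >= 0; j--) {
      u[j] -= gam[j + 1] * u[j + 1];
    }

    // Hand this chunk's first gam and u to the processor on the left
    gp = gam[0];
    up = u[0];
  }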

int start(const FieldPerp &b, SPT_data &data)

Starts the parallel inversion of a single XZ slice (FieldPerp).

Parameters
  • b: RHS values (Ax = b)
  • data: Structure containing data needed for second half of inversion

int next(SPT_data &data)

Shifts the parallelised Thomas algorithm along one processor.

Returns non-zero when the calculation is complete.

Parameters
  • data: Structure which keeps track of the calculation

void finish(SPT_data &data, FieldPerp &x)

Finishes the parallelised Thomas algorithm.

Parameters
  • data: Structure keeping track of calculation
  • x: The result
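
start(), next() and finish() form the pipeline that the Field3D solve uses to keep several y-slices in flight at once. The sketch below shows one possible driver loop, not the actual implementation: ys, ye and alldata are the members listed under Private Members below, b is an input Field3D, sliceXZ() is assumed to extract a single XZ slice from a Field3D, and copying each FieldPerp result back into the output Field3D is left as a comment:

  #include <vector>  // (at the top of the file)

  // Start one inversion per y-slice
  for (int jy = ys; jy <= ye; jy++) {
    start(sliceXZ(b, jy), alldata[jy - ys]);
  }

  // Advance every in-flight slice; next() returns non-zero once a slice is complete
  std::vector<bool> done(ye - ys + 1, false);
  bool running = true;
  while (running) {
    running = false;
    for (int jy = ys; jy <= ye; jy++) {
      if (!done[jy - ys]) {
        done[jy - ys] = (next(alldata[jy - ys]) != 0);
        if (!done[jy - ys]) {
          running = true;
        }
      }
    }
  }

  // Collect the results, reusing one FieldPerp as scratch storage
  FieldPerp xperp = sliceXZ(b, ys);
  for (int jy = ys; jy <= ye; jy++) {
    finish(alldata[jy - ys], xperp);
    // ... copy xperp into y-slice jy of the output Field3D ...
  }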

Private Members

Field2D Acoef
Field2D Ccoef
Field2D Dcoef
int ys
int ye
SPT_data slicedata
SPT_data *alldata
Array<dcomplex> dc1d

1D in Z for taking FFTs

struct SPT_data

Data structure for SPT algorithm.

Public Functions

SPT_data()
void allocate(int mm, int nx)
~SPT_data()

Public Members

int jy

Y index.

Matrix<dcomplex> bk

b vector in Fourier space

Matrix<dcomplex> xk
Matrix<dcomplex> gam
Matrix<dcomplex> avec
Matrix<dcomplex> bvec
Matrix<dcomplex> cvec

Diagonal bands of matrix.

int proc
int dir
comm_handle recv_handle
int comm_tag
Array<BoutReal> buffer