File spt.hxx¶
-
class
LaplaceSPT
: public Laplacian¶ - #include <spt.hxx>
Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)
This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.
Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.
- Parameters
b
: RHS values (Ax = b)flags
: Inversion settings (see boundary.h for values)a
: This is a 2D matrix which allows solution of A = Delp2 + adata
: Structure containing data needed for second half of inversionccoef
: Optional coefficient for first-order derivatived
: Optional factor to multiply the Delp2 operator
Public Functions
-
LaplaceSPT
(Options *opt = nullptr, const CELL_LOC loc = CELL_CENTRE, Mesh *mesh_in = nullptr)¶
-
~LaplaceSPT
()¶
-
Field3D
solve
(const Field3D &b)¶ Extracts perpendicular slices from 3D fields and inverts separately.
In parallel (localmesh->NXPE > 1) this tries to overlap computation and communication. This is done at the expense of more memory useage. Setting low_mem in the config file uses less memory, and less communication overlap
-
Field3D
solve
(const Field3D &b, const Field3D &x0)¶ Performs the laplacian inversion y-slice by y-slice
- Return
- x All the y-slices of x_slice in the equation A*x_slice = b_slice
- Parameters
b
: All the y-slices of b_slice, which is the right hand side of the equation A*x_slice = b_slicex0
: All the y-slices of the variable eventually used to set BC
Private Functions
-
void
tridagForward
(dcomplex *a, dcomplex *b, dcomplex *c, dcomplex *r, dcomplex *u, int n, dcomplex *gam, dcomplex &bet, dcomplex &um, bool start = false)¶ This is the first half of the Thomas algorithm for parallel calculations.
Two complex quantities have to be propagated between processors: bet and u[-1]. This routine takes bet and um from the last processor (if start == false), and returns the values to be passed to the next processor in the same variables.
- Parameters
a
: Vector of matrix coefficients (Left of diagonal)b
: Vector of matrix coefficients (Diagonal)c
: Vector of matrix coefficients (Right of diagonal)r
: RHS vectoru
: Result vector (Au = r)n
: Size of the matrixgam
: Intermediate values used for backsolve stagebet
:um
:start
:
-
void
tridagBack
(dcomplex *u, int n, dcomplex *gam, dcomplex &gp, dcomplex &up)¶ Second (backsolve) part of the Thomas algorithm.
- Parameters
u
: Result to be solved (Au = r)n
: Size of the problemgam
: Intermediate values produced by the forward partgp
: gam from the processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1up
: u from processor localmesh->PE_XIND + 1, and returned to localmesh->PE_XIND - 1
-
int
start
(const FieldPerp &b, SPT_data &data)¶ Simple parallelisation of the Thomas tridiagonal solver algorithm (serial code)
This is a reference code which performs the same operations as the serial code. To invert a single XZ slice (FieldPerp object), data must pass from the innermost processor (localmesh->PE_XIND = 0) to the outermost (localmesh->PE_XIND = localmesh->NXPE-1) and back again.
Some parallelism is achieved by running several inversions simultaneously, so while processor #1 is inverting Y=0, processor #0 is starting on Y=1. This works ok as long as the number of slices to be inverted is greater than the number of X processors (MYSUB > localmesh->NXPE). If MYSUB < localmesh->NXPE then not all processors can be busy at once, and so efficiency will fall sharply.
- Parameters
b
: RHS values (Ax = b)data
: Structure containing data needed for second half of inversion
-
struct
SPT_data
¶ Data structure for SPT algorithm.