File region.hxx

Defines

MAXREGIONBLOCKSIZE

Flexible iterator for Field2D/Field3D

The Region class keeps hold of a set of indices which are used to index a Field. Here, a Field refers to either a Field2D or a Field3D. These indices can either be used directly, or blocks of contiguous indices may be used instead, which allows OpenMP parallelisation.

The BOUT_FOR helper macro is provided as a replacement for for-loops.

Separate index classes are available for Field2Ds (Ind2D) and Field3Ds (Ind3D). Note that while an Ind3D can be used to index a Field2D, an Ind2D cannot be used to index a Field3D. This is because an Ind2D essentially doesn’t keep track of the z-dimension. The MAXREGIONBLOCKSIZE value can be tuned to try to optimise performance on specific hardware. It determines what the largest contiguous block size can be. As we hope the compiler will vectorise the access to these contiguous blocks, the optimal MAXREGIONBLOCKSIZE is likely related to the vector size etc.

BOUT_FOR_SERIAL(index, region)

Helper macros for iterating over a Region making use of the contiguous blocks of indices

These macros all have the same basic form: an outer loop over blocks of contiguous indices, and an inner loop over the indices themselves. This allows the outer loop to be parallelised with OpenMP while the inner loop can be vectorised with the CPU’s native SIMD instructions.

Parameters
  • index: The name of the index variable to use in the loop
  • region: An already existing Region

Alternative forms are also provided for loops that must be done serially, as well as for more control over the OpenMP directives used.

The different macros:

  • BOUT_FOR: OpenMP-aware version that allows speedup with both OpenMP and native SIMD vectorisation. This should be the preferred form for most loops
  • BOUT_FOR_SERIAL: for use with inherently serial loops. If BOUT++ was not compiled with OpenMP, BOUT_FOR falls back to using this form
  • BOUT_FOR_INNER: for use on loops inside OpenMP parallel regions, e.g. in order to declare thread private variables outside of the loop
  • BOUT_FOR_OMP: the most generic form, that takes arbitrary OpenMP directives as an extra argument

Example

The following for-loop:

for (auto index = begin(region); index < end(region); ++index) {
   A[index] = B[index] + C[index];
}

can be converted to a block region loop like so:

BOUT_FOR(index, region) {
   A[index] = B[index] + C[index];
}

BOUT_FOR_OMP(index, region, omp_pragmas)
BOUT_FOR(index, region)
BOUT_FOR_INNER(index, region)

Typedefs

using Ind3D = SpecificInd<IND_TYPE::IND_3D>

Define aliases for global indices in 3D and 2D.

using Ind2D = SpecificInd<IND_TYPE::IND_2D>
using IndPerp = SpecificInd<IND_TYPE::IND_PERP>

Enums

enum IND_TYPE

Values:

IND_3D = 0
IND_2D = 1
IND_PERP = 2

Functions

template<IND_TYPE N>
bool operator==(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)

Relational operators.

template<IND_TYPE N>
bool operator!=(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)
template<IND_TYPE N>
bool operator<(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)
template<IND_TYPE N>
bool operator>(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)
template<IND_TYPE N>
bool operator>=(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)
template<IND_TYPE N>
bool operator<=(const SpecificInd<N> &lhs, const SpecificInd<N> &rhs)
template<IND_TYPE N>
SpecificInd<N> operator+(SpecificInd<N> lhs, const SpecificInd<N> &rhs)

Arithmetic operators with integers.

template<IND_TYPE N>
SpecificInd<N> operator+(SpecificInd<N> lhs, int n)
template<IND_TYPE N>
SpecificInd<N> operator+(int n, SpecificInd<N> rhs)
template<IND_TYPE N>
SpecificInd<N> operator-(SpecificInd<N> lhs, int n)
template<IND_TYPE N>
SpecificInd<N> operator-(SpecificInd<N> lhs, const SpecificInd<N> &rhs)
const std::string toString(const Ind3D &i)

Get string representation of Ind3D.

Get string representation of IndPerp.

Get string representation of Ind2D.

std::ostream &operator<<(std::ostream &out, const RegionStats &stats)

Provide an easy way to report a Region’s statistics.

template<typename T>
Region<T> sort(Region<T> &region)

Return a new region with sorted indices.

template<typename T>
Region<T> unique(Region<T> &region)

Return a new region with unique indices.

template<typename T>
Region<T> mask(const Region<T> &region, const Region<T> &mask)

Return a masked version of a region.

template<typename T>
Region<T> operator+(const Region<T> &lhs, const Region<T> &rhs)

Return a new region with combined indices from two Regions This doesn’t attempt to avoid duplicate elements or enforce any sorting etc. but could be done if desired.

  • Addition is currently simple and just extends. Probably mostly ok but we could seek to remove duplicate points. Note we do want to allow duplicate points (one reason we use vector and not set) but what if we add a region that has some duplicates? We could retain them but common usage would probably not want the duplicates.

template<typename T>
Region<T> offset(const Region<T> &region, int offset)

Returns a new region based on input but with indices offset by a constant

template<typename T>
unsigned int size(const Region<T> &region)

Return the number of indices in a Region.

template<IND_TYPE N>
class SpecificInd
#include <region.hxx>

Indices base class for Fields Regions are dereferenced into these

Provides methods for offsetting by fixed amounts in x, y, z, as well as a generic method for offsetting by any amount in multiple directions.

Assumes that the offset is less than the grid size in that direction. This assumption is checked for at CHECK=3. This assumption implies that a FieldPerp cannot be offset in y, and a Field2D cannot be offset in z. A stronger, more expensive check that the resulting offset index doesn’t go out of bounds can be enabled at CHECK=4.

Also provides helper methods for converting Ind2D/Ind3D/IndPerp to x, y, z indices

Examples

Field3D field, result;
auto index = std::begin(region);

result = field[index->yp()] - field[index->ym()];

Public Functions

SpecificInd()
SpecificInd(int i, int ny, int nz)
SpecificInd(int i)
SpecificInd &operator++()

Pre-increment operator.

SpecificInd operator++(int)

Post-increment operator.

SpecificInd &operator--()

Pre-decrement operator.

SpecificInd operator--(int)

Post-decrement operator.

SpecificInd &operator+=(SpecificInd n)

In-place addition.

SpecificInd &operator+=(int n)
SpecificInd &operator-=(SpecificInd n)

In-place subtraction.

SpecificInd &operator-=(int n)
SpecificInd operator%(int n)

Modulus operator.

int x() const

Convenience functions for converting to (x, y, z)

int y() const
int z() const
template<int dd, DIRECTION dir>
const SpecificInd plus() const

Templated routine to return index.?p(offset), where ? is one of {x,y,z} and is determined by the dir template argument. The offset corresponds to the dd template argument.

template<int dd, DIRECTION dir>
const SpecificInd minus() const

Templated routine to return index.?m(offset), where ? is one of {x,y,z} and is determined by the dir template argument. The offset corresponds to the dd template argument.

const SpecificInd xp(int dx = 1) const
const SpecificInd xm(int dx = 1) const

The index one point -1 in x.

const SpecificInd yp(int dy = 1) const

The index one point +1 in y.

const SpecificInd ym(int dy = 1) const

The index one point -1 in y.

const SpecificInd zp(int dz = 1) const

The index one point +1 in z. Wraps around zend to zstart An alternative, non-branching calculation is : ind + dz - nz * ((ind + dz) / nz - ind / nz) but this appears no faster (and perhaps slower).

const SpecificInd zm(int dz = 1) const

The index one point -1 in z. Wraps around zstart to zend An alternative, non-branching calculation is : ind - dz + nz * ( (nz + ind) / nz - (nz + ind - dz) / nz) but this appears no faster (and perhaps slower).

const SpecificInd xpp() const
const SpecificInd xmm() const
const SpecificInd ypp() const
const SpecificInd ymm() const
const SpecificInd zpp() const
const SpecificInd zmm() const
const SpecificInd offset(int dx, int dy, int dz) const

Generic offset of index in multiple directions simultaneously.

Public Members

int ind = -1

Private Members

int ny = -1
int nz = -1
struct RegionStats
#include <region.hxx>

Structure to hold various derived “statistics” from a particular region.

Public Members

int numBlocks = 0

How many blocks.

int minBlockSize = 0

Size of smallest block.

int numMinBlocks = 0

Number of blocks with min size.

int maxBlockSize = 0

Size of largest block.

int numMaxBlocks = 0

Number of blocks with max size.

int numSmallBlocks = 0

Number of “small” blocks, for definition see Region::getStats.

BoutReal maxImbalance = 0

Ratio of largest block to smallest.

template<typename T = Ind3D>
class Region
#include <region.hxx>

Specifies a set of indices which can be iterated over and begin() and end() methods for range-based for loops.

Region is templated on either Ind2D or Ind3D for Field2Ds or Field3Ds, respectively. Trying to create a Region using any other type is a compile time error.

The set of indices is also broken down into sets of contiguous blocks of at most MAXREGIONBLOCKSIZE indices. This allows loops to be parallelised with OpenMP. Iterating using a “block region” may be more efficient, although it requires a bit more set up. The helper macro BOUT_FOR is provided to simplify things.

Example

The indices that form a region can be defined manually:

Region<Ind3D>::RegionIndices indices {0, 2, 4, 8, 3};
Region<Ind3D> region(indices);

then iterated over using begin() and end()

Field3D f(0.0);
for (auto i = region.begin(); i < region.end(); i++ ) {
  f[i] = 1.0;
}

For the region constructed above the following would display 0, 2, 4, 8, 3

for (auto i = region.begin(); i < region.end(); i++ ) {
  output << i.ind << ",";
}

or the more convenient region for loop:

for (const auto &i : r) {
  f[i] = a[i] + b[i];
}

For performance the BOUT_FOR macro should allow OpenMP parallelisation and hardware vectorisation.

BOUT_FOR(i, region) {
  f[i] = a[i] + b[i];
}

If you wish to vectorise but can’t use OpenMP then there is a serial verion of the macro:

BoutReal max=0.;
BOUT_FOR_SERIAL(i, region) {
  max = f[i] > max ? f[i] : max;
}

Public Types

template<>
using data_type = T
template<>
using RegionIndices = std::vector<T>

Indices to iterate over.

template<>
using ContiguousBlock = std::pair<T, T>

Start and end of contiguous region. This describes a range [block.first,block.second)

template<>
using ContiguousBlocks = std::vector<ContiguousBlock>

Collection of contiguous regions.

Public Functions

Region()
Region(int xstart, int xend, int ystart, int yend, int zstart, int zend, int ny, int nz, int maxregionblocksize = 64)
Region(RegionIndices &indices, int maxregionblocksize = 64)
Region(ContiguousBlocks &blocks)
~Region()

Destructor.

RegionIndices::iterator begin()

Expose the iterator over indices for use in range-based for-loops or with STL algorithms, etc.

Note that if the indices are altered using these iterators, the blocks may become out of sync and will need to manually updated

RegionIndices::const_iterator begin() const
RegionIndices::const_iterator cbegin() const
RegionIndices::iterator end()
RegionIndices::const_iterator end() const
RegionIndices::const_iterator cend() const
const ContiguousBlocks &getBlocks() const
const RegionIndices &getIndices() const
void setIndices(RegionIndices &indicesIn, int maxregionblocksize = 64)

Set the indices and ensure blocks updated.

void setBlocks(ContiguousBlocks &blocksIn)

Set the blocks and ensure indices updated.

Region<T> asSorted()

Return a new Region that has the same indices as this one but ensures the indices are sorted.

Region<T> &sort()

Sort this Region in place.

Region<T> asUnique()

Return a new Region that has the same indices as this one but ensures the indices are sorted and unique (i.e. not duplicate indices). Note this sorts the input.

Region<T> &unique()

Make this Region unique in-place.

Region<T> mask(const Region<T> &maskRegion)

Return a new region equivalent to *this but with indices contained in mask Region removed

Region<T> &operator+=(const Region<T> &rhs)

Accumulate operator.

Region<T> &offset(int offset)

Offset all indices by fixed value.

Region<T> &periodicShift(int shift, int period)

Shift all indices by fixed value but wrap around on a given period. This is intended to act in a similar way as numpy’s roll. It should be helpful to calculate offset arrays for periodic directions (e.g. z). For example for shift = 1, period = mesh->LocalNz we would find the zplus indices. For shift = mesh->LocalNy*mesh->LocalNz, period = mesh->LocalNx*mesh->LocalNy*mesh->LocalNz we find xplus indices.

unsigned int size() const

Number of indices (possibly repeated)

RegionStats getStats() const

Returns a RegionStats struct desribing the region.

Private Functions

RegionIndices createRegionIndices(int xstart, int xend, int ystart, int yend, int zstart, int zend, int ny, int nz)

Helper function to create a RegionIndices, given the start and end points in x, y, z, and the total y, z lengths

ContiguousBlocks getContiguousBlocks(int maxregionblocksize) const

Returns a vector of all contiguous blocks contained in the passed region. Limits the maximum size of any contiguous block to maxBlockSize. A contiguous block is described by the inclusive start and the exclusive end of the contiguous block.

RegionIndices getRegionIndices()

Constructs the vector of indices from the stored blocks information.

Private Members

RegionIndices indices
ContiguousBlocks blocks
int ny = -1
int nz = -1