API Reference

This section describes all available functions of this package.

Public API

julia

getAxis(desc, c)

Given an Axis description and a cube, returns the corresponding axis of the cube. The Axis description can be:

the name as a string or symbol.
an Axis object

source

YAXArrays.Cubes Module

The functions provided by YAXArrays are supposed to work on different types of cubes. This module defines the interface for all Data types that

source

YAXArrays.Cubes.YAXArray Type

julia

YAXArray{T,N}

An array labelled with named axes that have values associated with them. It can wrap normal arrays or, more typically, DiskArrays.

Fields

axes: Tuple of Dimensions containing the Axes of the Cube
data: length(axes)-dimensional array which holds the data, this can be a lazy DiskArray
properties: Metadata properties describing the content of the data
chunks: Representation of the chunking of the data
cleaner: Cleaner objects to track which objects to tidy up when the YAXArray goes out of scope
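As a minimal sketch of how these fields fit together, the following builds a small in-memory YAXArray and reads the fields back. It assumes the constructor form `YAXArray(axes, data, properties)` with DimensionalData `Dim` axes; the axis names, sizes and the `"units"` entry are made up for illustration.

```julia
using YAXArrays
using DimensionalData: Dim

# Illustrative axes: names and values are arbitrary.
axlist = (
    Dim{:time}(1:10),
    Dim{:lon}(range(0.0, 9.0, length = 10)),
)

data = rand(10, 10)                        # plain in-memory array
props = Dict{String,Any}("units" => "K")  # hypothetical metadata

a = YAXArray(axlist, data, props)

a.axes        # tuple of dimensions, one per array dimension
a.data        # the wrapped array
a.properties  # the metadata dictionary
```

The same constructor accepts a lazy DiskArray in place of `data`, which is the more typical case for large cubes.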

source

YAXArrays.Cubes.caxes Function

Returns the axes of a Cube

source

YAXArrays.Cubes.caxes Method

julia

caxes

Embeds Cube inside a new Cube

source

YAXArrays.Cubes.concatenatecubes Method

julia

concatenatecubes(cubelist, cataxis::CategoricalAxis)

Concatenates a vector of data cubes that have identical axes into a single new cube along the new axis cataxis.

source

YAXArrays.Cubes.readcubedata Method

julia

readcubedata(cube)

Given any array implementing the YAXArray interface, returns an in-memory YAXArray from it.

source

YAXArrays.Cubes.setchunks Method

julia

setchunks(c::YAXArray,chunks)

Resets the chunks of a YAXArray and returns a new YAXArray. Note that this will not change the chunking of the underlying data itself; it will just make the data "look" as if it had a different chunking. If you need a persistent on-disk representation of this chunking, use savecube on the resulting array. The chunks argument can take one of the following forms:

a DiskArrays.GridChunks object
a tuple specifying the chunk size along each dimension
an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes
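The tuple form can be sketched as follows. The example cube and its axis names are made up for illustration, and the function is called by the fully qualified name given in the heading above so that the sketch does not depend on what the package exports.

```julia
using YAXArrays
using DimensionalData: Dim

a = YAXArray((Dim{:time}(1:20), Dim{:lon}(1:10)), rand(20, 10))

# Tuple form: one chunk size per dimension, in axis order.
b = YAXArrays.Cubes.setchunks(a, (5, 10))

# Only the chunk description changes; shape and values stay the same.
size(b) == size(a)
```

Because the underlying data is untouched, this is cheap to call; the new chunking only takes physical effect once the result is written out.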

source

YAXArrays.Cubes.subsetcube Function

This function calculates a subset of a cube's data

source

YAXArrays.DAT.InDims Type

julia

InDims(axisdesc...;...)

Creates a description of an Input Data Cube for cube operations. Takes a single or multiple axis descriptions as first arguments. Alternatively a MovingWindow struct can be passed to include neighbour slices of one or more axes in the computation. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.

Keyword arguments

artype: how the array shall be represented in the inner function. Defaults to Array, alternatives are DataFrame or AsAxisArray
filter: defines a filter to skip the computation, e.g. when all values are missing. Defaults to AllMissing(), possible values are AnyMissing(), AnyOcean(), StdZero(), NValid(n) (for at least n non-missing elements). It is also possible to provide a custom one-argument function that takes the array and returns true if the computation shall be skipped and false otherwise.
window_oob_value: if one of the input dimensions is a MovingWindow, this value will be used to fill out-of-bounds areas

source

YAXArrays.DAT.MovingWindow Type

julia

MovingWindow(desc, pre, after)

Constructs a MovingWindow object to be passed to an InDims constructor to define that the axis in desc shall participate in the inner function (i.e. shall be looped over), but inside the inner function pre values before and after values after the center value will be passed as well.

For example passing MovingWindow("Time", 2, 0) will loop over the time axis and always pass the current time step plus the 2 previous steps. So in the inner function the array will have an additional dimension of size 3.
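The windowing described above can be illustrated without YAXArrays at all. The sketch below uses a hypothetical helper `moving_windows`, which is not part of the package, to collect the values that would accompany each step of a one-dimensional series. Filling out-of-bounds positions with `missing` is an assumption made for this illustration; in YAXArrays the fill value is controlled by the `window_oob_value` keyword of InDims.

```julia
# Hypothetical helper, not part of YAXArrays: for every position i of a
# vector, collect `pre` values before it, the value itself, and `after`
# values after it. Positions outside the vector are filled with `oob`.
function moving_windows(x::Vector, pre::Integer, after::Integer; oob = missing)
    n = length(x)
    map(1:n) do i
        [1 <= j <= n ? x[j] : oob for j in (i - pre):(i + after)]
    end
end

# Equivalent in spirit to MovingWindow("Time", 2, 0): current step plus 2 previous.
w = moving_windows([10, 20, 30, 40], 2, 0)

length(w[1])  # every window has pre + after + 1 = 3 entries
w[4]          # the last step sees [20, 30, 40]
```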

source

YAXArrays.DAT.OutDims Method

julia

OutDims(axisdesc;...)

Creates a description of an Output Data Cube for cube operations. Takes a single or a Vector/Tuple of axes as first argument. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.

axisdesc: List of input axis names
backend: specifies the dataset backend to write data to, must be either :auto or a key in YAXArrayBase.backendlist
update: specifies whether the function operates in place or returns an output
artype: specifies the Array type inside the inner function that is mapped over
chunksize: A Dict specifying the chunksizes for the output dimensions of the cube, or :input to copy chunksizes from input cube axes or :max to not chunk the inner dimensions
outtype: force the output type to a specific type, defaults to Any which means that the element type of the first input cube is used

source

YAXArrays.DAT.CubeTable Method

julia

CubeTable()

Function to turn a DataCube object into an iterable table. Takes a list of cubes as arguments, each specified as a name=cube expression. For example CubeTable(data=cube1,country=cube2) would generate a Table with the entries data and country, where data contains the values of cube1 and country the values of cube2. The cubes are matched and broadcast along their axes like in mapCube.

source

YAXArrays.DAT.cubefittable Method

julia

cubefittable(tab,o,fitsym;post=getpostfunction(o),kwargs...)

Executes fittable on the CubeTable tab with the (Weighted-)OnlineStat o, looping through the values specified by fitsym. Finally, writes the results from the TableAggregator to an output data cube.

source

YAXArrays.DAT.fittable Method

julia

fittable(tab,o,fitsym;by=(),weight=nothing)

Loops through an iterable table tab, fitting an OnlineStat o with the values specified through fitsym. Optionally one can specify a field (or tuple) to group by. Any groupby specifier can either be a symbol denoting the entry to group by or an anonymous function calculating the group from a table row.

For example the following would calculate a weighted mean over a cube weighted by grid cell area and grouped by country and month:

julia

fittable(iter,WeightedMean,:tair,weight=(i->abs(cosd(i.lat))),by=(i->month(i.time),:country))

source

YAXArrays.DAT.mapCube Method

julia

mapCube(fun, cube, addargs...;kwargs...)

Map a given function fun over slices of all cubes of the dataset ds. Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.

For Datasets, only one output cube can be specified. In contrast to the mapCube function for cubes, additional arguments for the inner function should be set as keyword arguments.

For the specific keyword arguments see the docstring of the mapCube function for cubes.

source

YAXArrays.DAT.mapCube Method

julia

mapCube(fun, cube, addargs...;kwargs...)

Map a given function fun over slices of the data cube cube. The additional arguments addargs will be forwarded to the inner function fun. Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.

Keyword arguments

max_cache=YAXDefaults.max_cache maximum size of blocks that are read into memory, either as a Float64 in bits, e.g. max_cache=5.0e8, or as a String, e.g. max_cache="10MB" or max_cache="1GB". Defaults to approx 10Mb.
indims::InDims List of input cube descriptors of type InDims for each input data cube.
outdims::OutDims List of output cube descriptors of type OutDims for each output cube.
inplace does the function write to an output array in place or return a single value? Defaults to true
ispar boolean to determine if parallelisation should be applied, defaults to true if workers are available.
showprog boolean indicating if a ProgressMeter shall be shown
include_loopvars boolean to indicate if the variables looped over should be added as function arguments
nthreads number of threads for the computation, defaults to Threads.nthreads for every worker.
loopchunksize determines the chunk sizes of variables which are looped over, a dict
kwargs additional keyword arguments are passed to the inner function

The first argument is always the function to be applied, the second is the input cube or a tuple of input cubes if needed.
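To make the calling convention concrete, here is a minimal sketch that reduces over the time axis of a small in-memory cube. The cube, its axis names and the helper name `slice_mean!` are made up for illustration. The sketch relies on the default inplace=true, where the inner function receives the output slice first and the input slice second.

```julia
using YAXArrays
using DimensionalData: Dim
using Statistics: mean

a = YAXArray((Dim{:time}(1:20), Dim{:lon}(1:10)), rand(20, 10))

# Inner function in the in-place style: xin is one time series,
# xout is the (zero-dimensional) output slot for that series.
function slice_mean!(xout, xin)
    xout .= mean(xin)
end

r = mapCube(
    slice_mean!, a;
    indims = InDims("time"),  # the inner function consumes the time axis
    outdims = OutDims(),      # and produces a single value per slice
    showprog = false,
)

size(r)  # the time axis is gone; one value remains per lon entry
```

Axes that are not named in InDims (here lon) are the ones looped over, so the result keeps exactly those axes.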

source

YAXArrays.Datasets.Dataset Type

Dataset object which stores an OrderedDict of YAXArrays with Symbol keys, a dictionary of CubeAxes, and a dictionary of general properties. A Dataset can hold cubes with differing axes, but it will share the common axes between the subcubes.

source

YAXArrays.Datasets.Dataset Method

julia

Dataset(; properties = Dict{String,Any}(), cubes...)

Construct a YAXArray Dataset with global attributes properties and a list of named YAXArrays cubes...
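A minimal sketch of this constructor, assuming two small in-memory cubes that share their axes. The variable names `temp` and `prec`, the axis names and the `"title"` attribute are made up for illustration.

```julia
using YAXArrays
using DimensionalData: Dim

axlist = (Dim{:time}(1:5), Dim{:lon}(1:3))
temp = YAXArray(axlist, rand(5, 3))
prec = YAXArray(axlist, rand(5, 3))

# Each keyword other than `properties` becomes a named cube in the Dataset.
ds = Dataset(;
    properties = Dict{String,Any}("title" => "demo"),
    temp = temp,
    prec = prec,
)

collect(keys(ds.cubes))  # the Symbol keys of the stored cubes
ds.properties            # the global attributes
```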

source

YAXArrays.Datasets.Cube Method

julia

Cube(ds::Dataset; joinname="Variables")

Construct a single YAXArray from the dataset ds by concatenating the cubes in the dataset on the joinname dimension.

source

YAXArrays.Datasets.open_dataset Method

julia

open_dataset(g; skip_keys=(), driver=:all)

Open the dataset at g with the given driver. The default driver searches for available drivers and tries to detect the usable driver from the filename extension.

Keyword arguments

skip_keys are passed as symbols, i.e., skip_keys = (:a, :b)
driver=:all, common options are :netcdf or :zarr.

Example:

julia

ds = open_dataset(f, driver=:zarr, skip_keys = (:c,))

source

YAXArrays.Datasets.open_mfdataset Method

julia

open_mfdataset(files::DD.DimVector{<:AbstractString}; kwargs...)

Opens and concatenates a list of dataset paths along the dimension specified in files. This method can be used when the generic glob-based version of open_mfdataset fails or is too slow. For example, to concatenate a list of annual NetCDF files along the time dimension, one can use:

julia

files = ["1990.nc","1991.nc","1992.nc"]
open_mfdataset(DD.DimArray(files, YAX.time()))

Alternatively, if the dimension to concatenate along does not exist yet, the dimension provided in the input arg is used:

julia

files = ["a.nc", "b.nc", "c.nc"]
open_mfdataset(DD.DimArray(files, DD.Dim{:NewDim}(["a","b","c"])))

source

YAXArrays.Datasets.savecube Method

julia

savecube(cube,name::String)

Save a YAXArray to the path.

Extended Help

The keyword arguments are:

name:
datasetaxis="Variables" special treatment of a categorical axis that gets written into separate zarr arrays
max_cache: The number of bits that are used as cache for the data handling.
backend: The backend that is used to save the data. Falls back to searching the backend according to the extension of the path.
driver: The same setting as backend.
overwrite::Bool=false overwrite cube if it already exists

source

YAXArrays.Datasets.savedataset Method

julia

savedataset(ds::Dataset; path= "", persist=nothing, overwrite=false, append=false, skeleton=false, backend=:all, driver=backend, max_cache=5e8, writefac=4.0)

Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.

Warning

overwrite=true deletes ALL your data and creates a new file.

source

YAXArrays.Datasets.to_dataset Method

julia

to_dataset(c;datasetaxis = "Variables", layername = "layer")

Convert a Data Cube into a Dataset. It is possible to treat one of the Cube's axes as a datasetaxis i.e. the cube will be split into different parts that become variables in the Dataset. If no such axis is specified or found, there will only be a single variable in the dataset with the name layername.

source

Internal API

YAXArrays.YAXDefaults Constant

Default configuration for YAXArrays, has the following fields:

workdir[]::String = "./" The default location for temporary cubes.
recal[]::Bool = false set to true if you want @loadOrGenerate to always recalculate the results.
chunksize[]::Any = :input Set the default output chunksize.
max_cache[]::Float64 = 1e8 The maximum cache used by mapCube.
cubedir[]::"" the default location for Cube() without an argument.
subsetextensions::Array{Any} = [] List of registered functions, that convert subsetting input into dimension boundaries.

source

YAXArrays.findAxis Method

julia

findAxis(desc, c)

Internal function

Extended Help

Given an Axis description and a cube return the index of the Axis.

The Axis description can be:

the name as a string or symbol.
an Axis object

source

YAXArrays.getOutAxis Method

julia

getOutAxis

source

YAXArrays.get_descriptor Method

julia

get_descriptor(a)

Get the descriptor of an Axis. This is used to dispatch on the descriptor.

source

YAXArrays.match_axis Method

julia

match_axis

Internal function

Extended Help

Match the Axis based on the AxisDescriptor.
This is used to find different axes and to make certain axis description the same.
For example, to disregard differences of capitalisation.

source

YAXArrays.Cubes.CleanMe Type

julia

mutable struct CleanMe

Struct which describes data paths and their persistency. Non-persistent paths/files are removed at the finalize step.

source

YAXArrays.Cubes.clean Method

julia

clean(c::CleanMe)

Finalizer function for the CleanMe struct. The main process removes all directories/files which are not persistent.

source

YAXArrays.Cubes.copydata Method

julia

copydata(outar, inar, copybuf)

Internal function which copies the data from the input inar into the output outar at the copybuf positions.

source

YAXArrays.Cubes.optifunc Method

julia

optifunc(s, maxbuf, incs, outcs, insize, outsize, writefac)

Internal

This function is going to be minimized to detect the best possible chunk setting for the rechunking of the data.

source

YAXArrays.DAT.DATConfig Type

Configuration object of a DAT process. This holds all necessary information to perform the calculations. It contains the following fields:

incubes::NTuple{NIN, YAXArrays.DAT.InputCube} where NIN: The input data cubes
outcubes::NTuple{NOUT, YAXArrays.DAT.OutputCube} where NOUT: The output data cubes
allInAxes::Vector: List of all axes of the input cubes
LoopAxes::Vector: List of axes that are looped through
ispar::Bool: Flag whether the computation is parallelized
loopcachesize::Vector{Int64}:
allow_irregular_chunks::Bool:
max_cache::Any: Maximal size of the in memory cache
fu::Any: Inner function which is computed
inplace::Bool: Flag whether the computation happens in place
include_loopvars::Bool:
ntr::Any:
do_gc::Bool: Flag if GC should be called explicitly. Probably necessary for many runs in Julia 1.9
addargs::Any: Additional arguments for the inner function
kwargs::Any: Additional keyword arguments for the inner function

source

YAXArrays.DAT.InputCube Type

Internal representation of an input cube for DAT operations

cube: The input data
desc: The input description given by the user/registration
axesSmall: List of axes that were actually selected through the description
icolon
colonperm
loopinds: Indices of loop axes that this cube does not contain, i.e. broadcasts
cachesize: Number of elements to keep in cache along each axis
window
iwindow
windowloopinds
iall

source

YAXArrays.DAT.OutputCube Type

Internal representation of an output cube for DAT operations

Fields

cube: The actual output cube, once it is generated
cube_unpermuted: The unpermuted output cube
desc: The description of the output axes as given by users or registration
axesSmall: The list of output axes determined through the description
allAxes: List of all the axes of the cube
loopinds: Index of the loop axes that are broadcasted for this output cube
innerchunks
outtype: Elementtype of the outputcube

source

YAXArrays.DAT.YAXColumn Type

julia

YAXColumn

A struct representing a single column of a YAXArray partitioned Table

Fields

inarBC
inds

source

YAXArrays.DAT.cmpcachmisses Method

Function that compares two cache miss specifiers by their importance

source

YAXArrays.DAT.getFrontPerm Method

Calculate an axis permutation that brings the wanted dimensions to the front
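The idea can be sketched in plain Julia. The helper `front_perm` below is hypothetical and is not the YAXArrays implementation; it only shows what "bringing the wanted dimensions to the front" means for an ordinary array.

```julia
# Hypothetical helper, not the YAXArrays implementation: build a permutation
# of 1:nd that lists the wanted dimensions first and keeps the remaining
# dimensions in their original order.
function front_perm(nd::Integer, wanted)
    rest = [d for d in 1:nd if d ∉ wanted]
    return (wanted..., rest...)
end

x = rand(2, 3, 4)
p = front_perm(ndims(x), (3, 1))  # dimensions 3 and 1 first, then 2
y = permutedims(x, p)

size(y)  # sizes reordered according to p
```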

source

YAXArrays.DAT.getLoopCacheSize Method

Calculate the optimal cache size for a DAT operation

source

YAXArrays.DAT.getOuttype Method

julia

getOuttype(outtype, cdata)

Internal function

Get the element type for the output cube

source

YAXArrays.DAT.getloopchunks Method

julia

getloopchunks(dc::DATConfig)

Internal function

Returns the chunks that can be looped over together for all dimensions.
The computation of the size of the chunks is handled by DiskArrays.approx_chunksize

source

YAXArrays.DAT.permuteloopaxes Method

julia

permuteloopaxes(dc)

Internal function

Permute the dimensions of the cube, so that the axes that are looped through are in the first positions. This is necessary for a faster looping through the data.

source

YAXArrays.Cubes.setchunks Method

julia

setchunks(c::Dataset,chunks)

Resets the chunks of all or a subset of the YAXArrays in the dataset and returns a new Dataset. Note that this will not change the chunking of the underlying data itself; it will just make the data "look" as if it had a different chunking. If you need a persistent on-disk representation of this chunking, use savedataset on the resulting dataset. The chunks argument can take one of the following forms:

a NamedTuple or AbstractDict mapping from variable name to a description of the desired variable chunks
a NamedTuple or AbstractDict mapping from dimension name to a description of the desired variable chunks
a description of the desired variable chunks applied to all members of the Dataset

where a description of the desired variable chunks can take one of the following forms:

a DiskArrays.GridChunks object
a tuple specifying the chunk size along each dimension
an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes

source

YAXArrays.Datasets.collectfromhandle Method

Extracts a YAXArray from a dataset handle that was just created from an arrayinfo

source

YAXArrays.Datasets.createdataset Method

julia

createdataset(DS::Type,axlist; kwargs...)

Creates a new dataset with axes specified in axlist. Each axis must be a subtype of CubeAxis. A new empty Zarr array will be created and can serve as a sink for mapCube operations.

Keyword arguments

path="" location where the new cube is stored
T=Union{Float32,Missing} data type of the target cube
chunksize = ntuple(i->length(axlist[i]),length(axlist)) chunk sizes of the array
chunkoffset = ntuple(i->0,length(axlist)) offsets of the chunks
persist::Bool=true shall the disk data be garbage-collected when the cube goes out of scope?
overwrite::Bool=false overwrite cube if it already exists
properties=Dict{String,Any}() additional cube properties
globalproperties=Dict{String,Any}() global attributes to be added to the dataset
fillvalue= T>:Missing ? defaultfillval(Base.nonmissingtype(T)) : nothing fill value
datasetaxis="Variables" special treatment of a categorical axis that gets written into separate zarr arrays
layername="layer" Fallback name of the variable stored in the dataset if no datasetaxis is found

source

YAXArrays.Datasets.getarrayinfo Method

Extract necessary information to create a YAXArrayBase dataset from a name and YAXArray pair

source

YAXArrays.Datasets.testrange Method

Test if data in x can be approximated by a step range
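One way such a check could work is sketched below in plain Julia. The function `approx_step_range` is hypothetical and is not the YAXArrays implementation: it builds the evenly spaced range between the first and last value and compares it element by element with the input, using the default tolerance of `isapprox`.

```julia
# Hypothetical sketch, not the YAXArrays implementation: return an evenly
# spaced range if it reproduces x up to floating-point tolerance,
# otherwise return x unchanged.
function approx_step_range(x::AbstractVector{<:Real})
    length(x) < 2 && return x
    r = range(first(x), last(x); length = length(x))
    return all(isapprox.(x, r)) ? r : x
end

regular = approx_step_range([0.0, 0.5, 1.0, 1.5])    # evenly spaced input
irregular = approx_step_range([0.0, 0.5, 1.0, 4.0])  # not evenly spaced

regular isa AbstractRange    # true
irregular isa AbstractRange  # false
```

Replacing a stored coordinate vector by a range in this way keeps axis values compact and makes lookups along the axis cheaper.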

source
