Skip to content

API Reference

This section describes all available functions of this package.

Public API

YAXArrays.getAxis Method
julia
getAxis(desc, c)

Given an Axis description and a cube, returns the corresponding axis of the cube. The Axis description can be:

  • the name as a string or symbol.

  • an Axis object

source

YAXArrays.Cubes Module

The functions provided by YAXArrays are supposed to work on different types of cubes. This module defines the interface for all Data types that

source

YAXArrays.Cubes.YAXArray Type
julia
YAXArray{T,N}

An array labelled with named axes that have values associated with them. It can wrap normal arrays or, more typically DiskArrays.

Fields

  • axes: Tuple of Dimensions containing the Axes of the Cube

  • data: length(axes)-dimensional array which holds the data, this can be a lazy DiskArray

  • properties: Metadata properties describing the content of the data

  • chunks: Representation of the chunking of the data

  • cleaner: Cleaner objects to track which objects to tidy up when the YAXArray goes out of scope

source

YAXArrays.Cubes.caxes Function

Returns the axes of a Cube

source

YAXArrays.Cubes.caxes Method
julia
caxes

Embeds Cube inside a new Cube

source

YAXArrays.Cubes.concatenatecubes Method
julia
function concatenateCubes(cubelist, cataxis::CategoricalAxis)

Concatenates a vector of datacubes that have identical axes to a new single cube along the new axis cataxis

source

YAXArrays.Cubes.readcubedata Method
julia
readcubedata(cube)

Given any array implementing the YAXArray interface it returns an in-memory YAXArray from it.

source

YAXArrays.Cubes.setchunks Method
julia
setchunks(c::YAXArray,chunks)

Resets the chunks of a YAXArray and returns a new YAXArray. Note that this will not change the chunking of the underlying data itself, it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savecube on the resulting array. The chunks argument can take one of the following forms:

  • a DiskArrays.GridChunks object

  • a tuple specifying the chunk size along each dimension

  • an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes

source

YAXArrays.Cubes.subsetcube Function

This function calculates a subset of a cube's data

source

YAXArrays.DAT.InDims Type
julia
InDims(axisdesc...;...)

Creates a description of an Input Data Cube for cube operations. Takes a single or multiple axis descriptions as first arguments. Alternatively a MovingWindow(@ref) struct can be passed to include neighbour slices of one or more axes in the computation. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.

Keyword arguments

  • artype how shall the array be represented in the inner function. Defaults to Array, alternatives are DataFrame or AsAxisArray

  • filter define some filter to skip the computation, e.g. when all values are missing. Defaults to AllMissing(), possible values are AnyMissing(), AnyOcean(), StdZero(), NValid(n) (for at least n non-missing elements). It is also possible to provide a custom one-argument function that takes the array and returns true if the compuation shall be skipped and false otherwise.

  • window_oob_value if one of the input dimensions is a MowingWindow, this value will be used to fill out-of-bounds areas

source

YAXArrays.DAT.MovingWindow Type
julia
MovingWindow(desc, pre, after)

Constructs a MovingWindow object to be passed to an InDims constructor to define that the axis in desc shall participate in the inner function (i.e. shall be looped over), but inside the inner function pre values before and after values after the center value will be passed as well.

For example passing MovingWindow("Time", 2, 0) will loop over the time axis and always pass the current time step plus the 2 previous steps. So in the inner function the array will have an additional dimension of size 3.

source

YAXArrays.DAT.OutDims Method
julia
OutDims(axisdesc;...)

Creates a description of an Output Data Cube for cube operations. Takes a single or a Vector/Tuple of axes as first argument. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.

  • axisdesc: List of input axis names

  • backend : specifies the dataset backend to write data to, must be either :auto or a key in YAXArrayBase.backendlist

  • update : specifies wether the function operates inplace or if an output is returned

  • artype : specifies the Array type inside the inner function that is mapped over

  • chunksize: A Dict specifying the chunksizes for the output dimensions of the cube, or :input to copy chunksizes from input cube axes or :max to not chunk the inner dimensions

  • outtype: force the output type to a specific type, defaults to Any which means that the element type of the first input cube is used

source

YAXArrays.DAT.CubeTable Method
julia
CubeTable()

Function to turn a DataCube object into an iterable table. Takes a list of as arguments, specified as a name=cube expression. For example CubeTable(data=cube1,country=cube2) would generate a Table with the entries data and country, where data contains the values of cube1 and country the values of cube2. The cubes are matched and broadcasted along their axes like in mapCube.

source

YAXArrays.DAT.cubefittable Method
julia
cubefittable(tab,o,fitsym;post=getpostfunction(o),kwargs...)

Executes fittable on the CubeTable tab with the (Weighted-)OnlineStat o, looping through the values specified by fitsym. Finally, writes the results from the TableAggregator to an output data cube.

source

YAXArrays.DAT.fittable Method
julia
fittable(tab,o,fitsym;by=(),weight=nothing)

Loops through an iterable table tab and thereby fitting an OnlineStat o with the values specified through fitsym. Optionally one can specify a field (or tuple) to group by. Any groupby specifier can either be a symbol denoting the entry to group by or an anynymous function calculating the group from a table row.

For example the following would caluclate a weighted mean over a cube weighted by grid cell area and grouped by country and month:

julia
fittable(iter,WeightedMean,:tair,weight=(i->abs(cosd(i.lat))),by=(i->month(i.time),:country))

source

YAXArrays.DAT.mapCube Method
julia
mapCube(fun, cube, addargs...;kwargs...)

Map a given function fun over slices of all cubes of the dataset ds. Use InDims to discribe the input dimensions and OutDims to describe the output dimensions of the function.

For Datasets, only one output cube can be specified. In contrast to the mapCube function for cubes, additional arguments for the inner function should be set as keyword arguments.

For the specific keyword arguments see the docstring of the mapCube function for cubes.

source

YAXArrays.DAT.mapCube Method
julia
mapCube(fun, cube, addargs...;kwargs...)

Map a given function fun over slices of the data cube cube. The additional arguments addargs will be forwarded to the inner function fun. Use InDims to discribe the input dimensions and OutDims to describe the output dimensions of the function.

Keyword arguments

  • max_cache=YAXDefaults.max_cache Float64 maximum size of blocks that are read into memory in bits e.g. max_cache=5.0e8. Or String. e.g. max_cache="10MB" or max_cache=1GB defaults to approx 10Mb.

  • indims::InDims List of input cube descriptors of type InDims for each input data cube.

  • outdims::OutDims List of output cube descriptors of type OutDims for each output cube.

  • inplace does the function write to an output array inplace or return a single value> defaults to true

  • ispar boolean to determine if parallelisation should be applied, defaults to true if workers are available.

  • showprog boolean indicating if a ProgressMeter shall be shown

  • include_loopvars boolean to indicate if the varoables looped over should be added as function arguments

  • nthreads number of threads for the computation, defaults to Threads.nthreads for every worker.

  • loopchunksize determines the chunk sizes of variables which are looped over, a dict

  • kwargs additional keyword arguments are passed to the inner function

The first argument is always the function to be applied, the second is the input cube or a tuple of input cubes if needed.

source

YAXArrays.Datasets.Dataset Type

Dataset object which stores an OrderedDict of YAXArrays with Symbol keys. A dictionary of CubeAxes and a Dictionary of general properties. A dictionary can hold cubes with differing axes. But it will share the common axes between the subcubes.

source

YAXArrays.Datasets.Dataset Method
julia
Dataset(; properties = Dict{String,Any}, cubes...)

Construct a YAXArray Dataset with global attributes properties a and a list of named YAXArrays cubes...

source

YAXArrays.Datasets.Cube Method
julia
Cube(ds::Dataset; joinname="Variables")

Construct a single YAXArray from the dataset ds by concatenating the cubes in the datset on the joinname dimension.

source

YAXArrays.Datasets.open_dataset Method
julia
open_dataset(g; skip_keys=(), driver=:all)

Open the dataset at g with the given driver. The default driver will search for available drivers and tries to detect the useable driver from the filename extension.

Keyword arguments

  • skip_keys are passed as symbols, i.e., skip_keys = (:a, :b)

  • driver=:all, common options are :netcdf or :zarr.

Example:

julia
ds = open_dataset(f, driver=:zarr, skip_keys = (:c,))

source

YAXArrays.Datasets.open_mfdataset Method
julia
open_mfdataset(files::DD.DimVector{<:AbstractString}; kwargs...)

Opens and concatenates a list of dataset paths along the dimension specified in files. This method can be used when the generic glob-based version of open_mfdataset fails or is too slow. For example, to concatenate a list of annual NetCDF files along the time dimension, one can use:

julia
files = ["1990.nc","1991.nc","1992.nc"]
open_mfdataset(DD.DimArray(files, YAX.time()))

alternatively, if the dimension to concatenate along does not exist yet, the dimension provided in the input arg is used:

julia
files = ["a.nc", "b.nc", "c.nc"]
open_mfdataset(DD.DimArray(files, DD.Dim{:NewDim}(["a","b","c"])))

source

YAXArrays.Datasets.savecube Method
julia
savecube(cube,name::String)

Save a YAXArray to the path.

Extended Help

The keyword arguments are:

  • name:

  • datasetaxis="Variables" special treatment of a categorical axis that gets written into separate zarr arrays

  • max_cache: The number of bits that are used as cache for the data handling.

  • backend: The backend, that is used to save the data. Falls back to searching the backend according to the extension of the path.

  • driver: The same setting as backend.

  • overwrite::Bool=false overwrite cube if it already exists

source

YAXArrays.Datasets.savedataset Method
julia
savedataset(ds::Dataset; path= "", persist=nothing, overwrite=false, append=false, skeleton=false, backend=:all, driver=backend, max_cache=5e8, writefac=4.0)

Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.

Warning

overwrite=true, deletes ALL your data and it will create a new file.

source

YAXArrays.Datasets.to_dataset Method
julia
to_dataset(c;datasetaxis = "Variables", layername = "layer")

Convert a Data Cube into a Dataset. It is possible to treat one of the Cube's axes as a datasetaxis i.e. the cube will be split into different parts that become variables in the Dataset. If no such axis is specified or found, there will only be a single variable in the dataset with the name layername.

source

Internal API

YAXArrays.YAXDefaults Constant

Default configuration for YAXArrays, has the following fields:

  • workdir[]::String = "./" The default location for temporary cubes.

  • recal[]::Bool = false set to true if you want @loadOrGenerate to always recalculate the results.

  • chunksize[]::Any = :input Set the default output chunksize.

  • max_cache[]::Float64 = 1e8 The maximum cache used by mapCube.

  • cubedir[]::"" the default location for Cube() without an argument.

  • subsetextensions::Array{Any} = [] List of registered functions, that convert subsetting input into dimension boundaries.

source

YAXArrays.findAxis Method
julia
findAxis(desc, c)

Internal function

Extended Help

Given an Axis description and a cube return the index of the Axis.

The Axis description can be:

  • the name as a string or symbol.

  • an Axis object

source

YAXArrays.getOutAxis Method
julia
getOutAxis

source

YAXArrays.get_descriptor Method
julia
get_descriptor(a)

Get the descriptor of an Axis. This is used to dispatch on the descriptor.

source

YAXArrays.match_axis Method
julia
match_axis

Internal function

Extended Help

Match the Axis based on the AxisDescriptor.
This is used to find different axes and to make certain axis description the same.
For example to disregard differences of captialisation.

source

YAXArrays.Cubes.CleanMe Type
julia
mutable struct CleanMe

Struct which describes data paths and their persistency. Non-persistend paths/files are removed at finalize step

source

YAXArrays.Cubes.clean Method
julia
clean(c::CleanMe)

finalizer function for CleanMe struct. The main process removes all directories/files which are not persistent.

source

YAXArrays.Cubes.copydata Method
julia
copydata(outar, inar, copybuf)

Internal function which copies the data from the input inar into the output outar at the copybuf positions.

source

YAXArrays.Cubes.optifunc Method
julia
optifunc(s, maxbuf, incs, outcs, insize, outsize, writefac)

Internal

This function is going to be minimized to detect the best possible chunk setting for the rechunking of the data.

source

YAXArrays.DAT.DATConfig Type

Configuration object of a DAT process. This holds all necessary information to perform the calculations. It contains the following fields:

  • incubes::NTuple{NIN, YAXArrays.DAT.InputCube} where NIN: The input data cubes

  • outcubes::NTuple{NOUT, YAXArrays.DAT.OutputCube} where NOUT: The output data cubes

  • allInAxes::Vector: List of all axes of the input cubes

  • LoopAxes::Vector: List of axes that are looped through

  • ispar::Bool: Flag whether the computation is parallelized

  • loopcachesize::Vector{Int64}:

  • allow_irregular_chunks::Bool:

  • max_cache::Any: Maximal size of the in memory cache

  • fu::Any: Inner function which is computed

  • inplace::Bool: Flag whether the computation happens in place

  • include_loopvars::Bool:

  • ntr::Any:

  • do_gc::Bool: Flag if GC should be called explicitly. Probably necessary for many runs in Julia 1.9

  • addargs::Any: Additional arguments for the inner function

  • kwargs::Any: Additional keyword arguments for the inner function

source

YAXArrays.DAT.InputCube Type

Internal representation of an input cube for DAT operations

  • cube: The input data

  • desc: The input description given by the user/registration

  • axesSmall: List of axes that were actually selected through the description

  • icolon

  • colonperm

  • loopinds: Indices of loop axes that this cube does not contain, i.e. broadcasts

  • cachesize: Number of elements to keep in cache along each axis

  • window

  • iwindow

  • windowloopinds

  • iall

source

YAXArrays.DAT.OutputCube Type

Internal representation of an output cube for DAT operations

Fields

  • cube: The actual outcube cube, once it is generated

  • cube_unpermuted: The unpermuted output cube

  • desc: The description of the output axes as given by users or registration

  • axesSmall: The list of output axes determined through the description

  • allAxes: List of all the axes of the cube

  • loopinds: Index of the loop axes that are broadcasted for this output cube

  • innerchunks

  • outtype: Elementtype of the outputcube

source

YAXArrays.DAT.YAXColumn Type
julia
YAXColumn

A struct representing a single column of a YAXArray partitioned Table # Fields

  • inarBC

  • inds

source

YAXArrays.DAT.cmpcachmisses Method

Function that compares two cache miss specifiers by their importance

source

YAXArrays.DAT.getFrontPerm Method

Calculate an axis permutation that brings the wanted dimensions to the front

source

YAXArrays.DAT.getLoopCacheSize Method

Calculate optimal Cache size to DAT operation

source

YAXArrays.DAT.getOuttype Method
julia
getOuttype(outtype, cdata)

Internal function

Get the element type for the output cube

source

YAXArrays.DAT.getloopchunks Method
julia
getloopchunks(dc::DATConfig)

Internal function

Returns the chunks that can be looped over toghether for all dimensions.
This computation of the size of the chunks is handled by [`DiskArrays.approx_chunksize`](@ref)

source

YAXArrays.DAT.permuteloopaxes Method
julia
permuteloopaxes(dc)

Internal function

Permute the dimensions of the cube, so that the axes that are looped through are in the first positions. This is necessary for a faster looping through the data.

source

YAXArrays.Cubes.setchunks Method
julia
setchunks(c::Dataset,chunks)

Resets the chunks of all or a subset YAXArrays in the dataset and returns a new Dataset. Note that this will not change the chunking of the underlying data itself, it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savedataset on the resulting array. The chunks argument can take one of the following forms:

  • a NamedTuple or AbstractDict mapping from variable name to a description of the desired variable chunks

  • a NamedTuple or AbstractDict mapping from dimension name to a description of the desired variable chunks

  • a description of the desired variable chunks applied to all members of the Dataset

where a description of the desired variable chunks can take one of the following forms:

  • a DiskArrays.GridChunks object

  • a tuple specifying the chunk size along each dimension

  • an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes

source

YAXArrays.Datasets.collectfromhandle Method

Extracts a YAXArray from a dataset handle that was just created from a arrayinfo

source

YAXArrays.Datasets.createdataset Method
julia
function createdataset(DS::Type,axlist; kwargs...)

Creates a new dataset with axes specified in axlist. Each axis must be a subtype of CubeAxis. A new empty Zarr array will be created and can serve as a sink for mapCube operations.

Keyword arguments

  • path="" location where the new cube is stored

  • T=Union{Float32,Missing} data type of the target cube

  • chunksize = ntuple(i->length(axlist[i]),length(axlist)) chunk sizes of the array

  • chunkoffset = ntuple(i->0,length(axlist)) offsets of the chunks

  • persist::Bool=true shall the disk data be garbage-collected when the cube goes out of scope?

  • overwrite::Bool=false overwrite cube if it already exists

  • properties=Dict{String,Any}() additional cube properties

  • globalproperties=Dict{String,Any} global attributes to be added to the dataset

  • fillvalue= T>:Missing ? defaultfillval(Base.nonmissingtype(T)) : nothing fill value

  • datasetaxis="Variables" special treatment of a categorical axis that gets written into separate zarr arrays

  • layername="layer" Fallback name of the variable stored in the dataset if no datasetaxis is found

source

YAXArrays.Datasets.getarrayinfo Method

Extract necessary information to create a YAXArrayBase dataset from a name and YAXArray pair

source

YAXArrays.Datasets.testrange Method

Test if data in x can be approximated by a step range

source