API Reference
This section describes all available functions of this package.
Public API
YAXArrays.getAxis Method
getAxis(desc, c)
Given an Axis description and a cube, returns the corresponding axis of the cube. The Axis description can be:
the name as a string or symbol.
an Axis object
YAXArrays.Cubes Module
The functions provided by YAXArrays are supposed to work on different types of cubes. This module defines the interface for all Data types that
YAXArrays.Cubes.YAXArray Type
YAXArray{T,N}
An array labelled with named axes that have values associated with them. It can wrap normal arrays or, more typically, DiskArrays.
Fields
axes: Tuple of Dimensions containing the axes of the cube
data: length(axes)-dimensional array which holds the data; this can be a lazy DiskArray
properties: Metadata properties describing the content of the data
chunks: Representation of the chunking of the data
cleaner: Cleaner objects to track which objects to tidy up when the YAXArray goes out of scope
YAXArrays.Cubes.concatenatecubes Method
concatenatecubes(cubelist, cataxis::CategoricalAxis)
Concatenates a vector of datacubes that have identical axes into a new single cube along the new axis cataxis.
YAXArrays.Cubes.readcubedata Method
readcubedata(cube)
Given any array implementing the YAXArray interface, it returns an in-memory YAXArray from it.
YAXArrays.Cubes.setchunks Method
setchunks(c::YAXArray,chunks)
Resets the chunks of a YAXArray and returns a new YAXArray. Note that this will not change the chunking of the underlying data itself, it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savecube on the resulting array. The chunks argument can take one of the following forms:
a DiskArrays.GridChunks object
a tuple specifying the chunk size along each dimension
an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes
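As a rough illustration of what a tuple of chunk sizes describes, the following plain-Julia sketch splits each axis into consecutive index ranges. It does not call YAXArrays or DiskArrays; the helper chunk_ranges and the array and chunk sizes are made up for this example.

```julia
# Hypothetical helper, not a YAXArrays or DiskArrays function:
# split one axis of length `n` into consecutive ranges of at most `c` elements.
chunk_ranges(n::Integer, c::Integer) = [i:min(i + c - 1, n) for i in 1:c:n]

# A tuple of chunk sizes gives one chunk size per dimension of the array.
array_size = (10, 5, 4)
chunk_size = (4, 5, 2)
grid = map(chunk_ranges, array_size, chunk_size)

# Print the chunk ranges along each dimension.
foreach(println, grid)
```

The last chunk along an axis is shorter when the chunk size does not divide the axis length evenly.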
YAXArrays.DAT.InDims Type
InDims(axisdesc...;...)
Creates a description of an Input Data Cube for cube operations. Takes a single or multiple axis descriptions as first arguments. Alternatively a MovingWindow struct can be passed to include neighbour slices of one or more axes in the computation. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.
Keyword arguments
artype: how the array shall be represented in the inner function. Defaults to Array, alternatives are DataFrame or AsAxisArray
filter: defines a filter to skip the computation, e.g. when all values are missing. Defaults to AllMissing(), possible values are AnyMissing(), AnyOcean(), StdZero(), NValid(n) (for at least n non-missing elements). It is also possible to provide a custom one-argument function that takes the array and returns true if the computation shall be skipped and false otherwise.
window_oob_value: if one of the input dimensions is a MovingWindow, this value will be used to fill out-of-bounds areas
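A custom filter is just a one-argument predicate on the input array. The sketch below shows one possible predicate in plain Julia; the name skip_if_mostly_missing and the "more than half missing" rule are hypothetical choices for illustration, not something YAXArrays provides.

```julia
# Hypothetical custom filter: returns true (skip the computation) when more
# than half of the values in the input slice are missing.
skip_if_mostly_missing(x) = count(ismissing, x) > length(x) / 2

println(skip_if_mostly_missing([missing, missing, 1.0]))
println(skip_if_mostly_missing([1.0, 2.0, missing]))

# Following the keyword description above, such a function would be passed as
# the filter keyword, e.g. InDims("time"; filter = skip_if_mostly_missing).
# That call is not executed here because it needs YAXArrays loaded.
```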
YAXArrays.DAT.MovingWindow Type
MovingWindow(desc, pre, after)
Constructs a MovingWindow object to be passed to an InDims constructor to define that the axis in desc shall participate in the inner function (i.e. shall be looped over), but inside the inner function pre values before and after values after the center value will be passed as well.
For example, passing MovingWindow("Time", 2, 0) will loop over the time axis and always pass the current time step plus the 2 previous steps. So in the inner function the array will have an additional dimension of size 3.
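To make the window semantics concrete, here is a plain-Julia sketch of what a window with pre=2 and after=0 selects along a single axis. It does not use YAXArrays; the helper window_slices, its oob keyword (standing in for window_oob_value), and the oldest-first ordering inside each slice are assumptions made for this illustration.

```julia
# Hypothetical helper, not part of YAXArrays: for every position i, collect the
# values from i - pre to i + after and fill out-of-bounds slots with `oob`.
function window_slices(x::AbstractVector, pre::Integer, after::Integer; oob = missing)
    map(eachindex(x)) do i
        [checkbounds(Bool, x, j) ? x[j] : oob for j in (i - pre):(i + after)]
    end
end

series = [10, 20, 30, 40]
slices = window_slices(series, 2, 0)

# Every slice has pre + after + 1 = 3 entries, which corresponds to the
# additional dimension of size 3 mentioned above.
foreach(println, slices)
```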
YAXArrays.DAT.OutDims Method
OutDims(axisdesc;...)
Creates a description of an Output Data Cube for cube operations. Takes a single or a Vector/Tuple of axes as first argument. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.
axisdesc: List of input axis names
backend: specifies the dataset backend to write data to, must be either :auto or a key in YAXArrayBase.backendlist
update: specifies whether the function operates in place or if an output is returned
artype: specifies the Array type inside the inner function that is mapped over
chunksize: A Dict specifying the chunk sizes for the output dimensions of the cube, or :input to copy chunk sizes from input cube axes, or :max to not chunk the inner dimensions
outtype: force the output type to a specific type, defaults to Any, which means that the element type of the first input cube is used
YAXArrays.DAT.CubeTable Method
CubeTable()
Function to turn a DataCube object into an iterable table. Takes a list of cubes as arguments, each specified as a name=cube expression. For example, CubeTable(data=cube1,country=cube2) would generate a Table with the entries data and country, where data contains the values of cube1 and country the values of cube2. The cubes are matched and broadcasted along their axes like in mapCube.
YAXArrays.DAT.cubefittable Method
cubefittable(tab,o,fitsym;post=getpostfunction(o),kwargs...)
Executes fittable on the CubeTable tab with the (Weighted-)OnlineStat o, looping through the values specified by fitsym. Finally, writes the results from the TableAggregator to an output data cube.
YAXArrays.DAT.fittable Method
fittable(tab,o,fitsym;by=(),weight=nothing)
Loops through an iterable table tab, fitting an OnlineStat o with the values specified through fitsym. Optionally one can specify a field (or tuple) to group by. Any groupby specifier can either be a symbol denoting the entry to group by or an anonymous function calculating the group from a table row.
For example, the following would calculate a weighted mean over a cube, weighted by grid cell area and grouped by country and month:
fittable(iter,WeightedMean,:tair,weight=(i->abs(cosd(i.lat))),by=(i->month(i.time),:country))
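The grouping and weighting logic behind such a call can be sketched in plain Julia without OnlineStats or a CubeTable. The rows, their field names, and the helper grouped_weighted_mean below are invented for illustration and are not how fittable is implemented.

```julia
# Hypothetical rows standing in for the rows of an iterable table.
rows = [
    (tair = 10.0, lat = 0.0,  country = :A),
    (tair = 20.0, lat = 60.0, country = :A),
    (tair = 30.0, lat = 0.0,  country = :B),
]

# Accumulate the weighted sum and the sum of weights per group,
# then divide to obtain the weighted mean of each group.
function grouped_weighted_mean(rows, fitsym::Symbol; weight, by)
    sums = Dict{Any,Tuple{Float64,Float64}}()
    for r in rows
        key = by(r)
        w = weight(r)
        s, wsum = get(sums, key, (0.0, 0.0))
        sums[key] = (s + w * getproperty(r, fitsym), wsum + w)
    end
    return Dict(k => s / wsum for (k, (s, wsum)) in sums)
end

result = grouped_weighted_mean(rows, :tair;
    weight = r -> abs(cosd(r.lat)),   # area weight, as in the call above
    by = r -> r.country)              # group key computed from each row
println(result)
```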
YAXArrays.DAT.mapCube Method
mapCube(fun, cube, addargs...;kwargs...)
Map a given function fun over slices of all cubes of the dataset ds. Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.
For Datasets, only one output cube can be specified. In contrast to the mapCube function for cubes, additional arguments for the inner function should be set as keyword arguments.
For the specific keyword arguments see the docstring of the mapCube function for cubes.
YAXArrays.DAT.mapCube Method
mapCube(fun, cube, addargs...;kwargs...)
Map a given function fun over slices of the data cube cube. The additional arguments addargs will be forwarded to the inner function fun. Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.
Keyword arguments
max_cache=YAXDefaults.max_cache: Float64 maximum size of blocks that are read into memory in bits, e.g. max_cache=5.0e8. Or String, e.g. max_cache="10MB" or max_cache="1GB"; defaults to approx 10Mb.
indims::InDims: List of input cube descriptors of type InDims for each input data cube.
outdims::OutDims: List of output cube descriptors of type OutDims for each output cube.
inplace: does the function write to an output array in place or return a single value; defaults to true
ispar: boolean to determine if parallelisation should be applied, defaults to true if workers are available.
showprog: boolean indicating if a ProgressMeter shall be shown
include_loopvars: boolean to indicate if the variables looped over should be added as function arguments
nthreads: number of threads for the computation, defaults to Threads.nthreads for every worker.
loopchunksize: determines the chunk sizes of variables which are looped over, a dict
kwargs: additional keyword arguments are passed to the inner function
The first argument is always the function to be applied, the second is the input cube or a tuple of input cubes if needed.
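A minimal sketch of such a call, assuming YAXArrays and DimensionalData are both installed, could look like the following. The axis names, the random data, and the inner function name timemean! are made up for this example. With the default inplace=true, the inner function receives the output array first and the input slice second.

```julia
using YAXArrays
using DimensionalData: Dim
using Statistics: mean

# Small in-memory cube; axis names and sizes are arbitrary example values.
axlist = (Dim{:time}(1:10), Dim{:lon}(1:5), Dim{:lat}(1:4))
data = rand(10, 5, 4)
cube = YAXArray(axlist, data)

# Inner function: writes the mean of one time series into the output slot.
function timemean!(xout, xin)
    xout[1] = mean(xin)
end

# Loop over lon and lat, passing one full "time" slice per call.
# OutDims() without axes means each call produces a single value.
result = mapCube(timemean!, cube; indims = InDims("time"), outdims = OutDims())
println(size(result))
```

The result keeps the lon and lat axes that were looped over, while the time axis consumed by the inner function is dropped.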
YAXArrays.Datasets.Dataset Type
Dataset object which stores an OrderedDict of YAXArrays with Symbol keys, a dictionary of CubeAxes, and a dictionary of general properties. A Dataset can hold cubes with differing axes, but it will share the common axes between the subcubes.
YAXArrays.Datasets.Dataset Method
Dataset(; properties = Dict{String,Any}, cubes...)
Construct a YAXArray Dataset with global attributes properties and a list of named YAXArrays cubes...
YAXArrays.Datasets.Cube Method
Cube(ds::Dataset; joinname="Variables")
Construct a single YAXArray from the dataset ds by concatenating the cubes in the dataset on the joinname dimension.
YAXArrays.Datasets.open_dataset Method
open_dataset(g; skip_keys=(), driver=:all)
Open the dataset at g with the given driver. The default driver will search for available drivers and try to detect the usable driver from the filename extension.
Keyword arguments
skip_keys: are passed as symbols, i.e., skip_keys = (:a, :b)
driver=:all: common options are :netcdf or :zarr.
Example:
ds = open_dataset(f, driver=:zarr, skip_keys = (:c,))
YAXArrays.Datasets.open_mfdataset Method
open_mfdataset(files::DD.DimVector{<:AbstractString}; kwargs...)
Opens and concatenates a list of dataset paths along the dimension specified in files. This method can be used when the generic glob-based version of open_mfdataset fails or is too slow. For example, to concatenate a list of annual NetCDF files along the time dimension, one can use:
files = ["1990.nc","1991.nc","1992.nc"]
open_mfdataset(DD.DimArray(files, YAX.time()))
Alternatively, if the dimension to concatenate along does not exist yet, the dimension provided in the input argument is used:
files = ["a.nc", "b.nc", "c.nc"]
open_mfdataset(DD.DimArray(files, DD.Dim{:NewDim}(["a","b","c"])))
YAXArrays.Datasets.savecube Method
savecube(cube,name::String)
Save a YAXArray to the path.
Extended Help
The keyword arguments are:
name:
datasetaxis="Variables": special treatment of a categorical axis that gets written into separate zarr arrays
max_cache: The number of bits that are used as cache for the data handling.
backend: The backend that is used to save the data. Falls back to searching the backend according to the extension of the path.
driver: The same setting as backend.
overwrite::Bool=false: overwrite cube if it already exists
YAXArrays.Datasets.savedataset Method
savedataset(ds::Dataset; path= "", persist=nothing, overwrite=false, append=false, skeleton=false, backend=:all, driver=backend, max_cache=5e8, writefac=4.0)
Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.
Warning
overwrite=true deletes ALL your data and creates a new file.
YAXArrays.Datasets.to_dataset Method
to_dataset(c;datasetaxis = "Variables", layername = "layer")
Convert a Data Cube into a Dataset. It is possible to treat one of the Cube's axes as a datasetaxis, i.e. the cube will be split into different parts that become variables in the Dataset. If no such axis is specified or found, there will only be a single variable in the dataset with the name layername.
Internal API
YAXArrays.YAXDefaults Constant
Default configuration for YAXArrays, has the following fields:
workdir[]::String = "./": The default location for temporary cubes.
recal[]::Bool = false: set to true if you want @loadOrGenerate to always recalculate the results.
chunksize[]::Any = :input: Set the default output chunksize.
max_cache[]::Float64 = 1e8: The maximum cache used by mapCube.
cubedir[]::"": the default location for Cube() without an argument.
subsetextensions::Array{Any} = []: List of registered functions that convert subsetting input into dimension boundaries.
YAXArrays.findAxis Method
findAxis(desc, c)
Internal function
Extended Help
Given an Axis description and a cube, return the index of the Axis.
The Axis description can be:
the name as a string or symbol.
an Axis object
YAXArrays.get_descriptor Method
get_descriptor(a)
Get the descriptor of an Axis. This is used to dispatch on the descriptor.
YAXArrays.match_axis Method
match_axis
Internal function
Extended Help
Match the Axis based on the AxisDescriptor.
This is used to find different axes and to treat certain axis descriptions as the same,
for example to disregard differences in capitalisation.
YAXArrays.Cubes.CleanMe Type
mutable struct CleanMe
Struct which describes data paths and their persistency. Non-persistent paths/files are removed at the finalize step.
YAXArrays.Cubes.clean Method
clean(c::CleanMe)
Finalizer function for the CleanMe struct. The main process removes all directories/files which are not persistent.
YAXArrays.Cubes.copydata Method
copydata(outar, inar, copybuf)
Internal function which copies the data from the input inar into the output outar at the copybuf positions.
YAXArrays.Cubes.optifunc Method
optifunc(s, maxbuf, incs, outcs, insize, outsize, writefac)
Internal
This function is going to be minimized to detect the best possible chunk setting for the rechunking of the data.
YAXArrays.DAT.DATConfig Type
Configuration object of a DAT process. This holds all necessary information to perform the calculations. It contains the following fields:
incubes::NTuple{NIN, YAXArrays.DAT.InputCube} where NIN: The input data cubes
outcubes::NTuple{NOUT, YAXArrays.DAT.OutputCube} where NOUT: The output data cubes
allInAxes::Vector: List of all axes of the input cubes
LoopAxes::Vector: List of axes that are looped through
ispar::Bool: Flag whether the computation is parallelized
loopcachesize::Vector{Int64}:
allow_irregular_chunks::Bool:
max_cache::Any: Maximal size of the in-memory cache
fu::Any: Inner function which is computed
inplace::Bool: Flag whether the computation happens in place
include_loopvars::Bool:
ntr::Any:
do_gc::Bool: Flag if GC should be called explicitly. Probably necessary for many runs in Julia 1.9
addargs::Any: Additional arguments for the inner function
kwargs::Any: Additional keyword arguments for the inner function
YAXArrays.DAT.InputCube Type
Internal representation of an input cube for DAT operations
cube: The input data
desc: The input description given by the user/registration
axesSmall: List of axes that were actually selected through the description
icolon
colonperm
loopinds: Indices of loop axes that this cube does not contain, i.e. broadcasts
cachesize: Number of elements to keep in cache along each axis
window
iwindow
windowloopinds
iall
YAXArrays.DAT.OutputCube Type
Internal representation of an output cube for DAT operations
Fields
cube: The actual output cube, once it is generated
cube_unpermuted: The unpermuted output cube
desc: The description of the output axes as given by users or registration
axesSmall: The list of output axes determined through the description
allAxes: List of all the axes of the cube
loopinds: Index of the loop axes that are broadcasted for this output cube
innerchunks
outtype: Element type of the output cube
YAXArrays.DAT.YAXColumn Type
YAXColumn
A struct representing a single column of a YAXArray partitioned Table
Fields
inarBC
inds
YAXArrays.DAT.cmpcachmisses Method
Function that compares two cache miss specifiers by their importance
YAXArrays.DAT.getFrontPerm Method
Calculate an axis permutation that brings the wanted dimensions to the front
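The idea of such a permutation can be shown with Base Julia alone. The helper front_perm below is a hypothetical stand-in written for this illustration, not the YAXArrays implementation, and the array shape is arbitrary.

```julia
# Hypothetical helper: put the wanted dimension indices first and keep the
# remaining dimensions in their original order.
front_perm(n::Integer, wanted) = (wanted..., setdiff(1:n, wanted)...)

A = reshape(1:24, 2, 3, 4)
perm = front_perm(ndims(A), (3,))   # bring dimension 3 to the front
B = permutedims(A, perm)

println(perm)
println(size(B))
```

After the permutation, B[k, i, j] holds the same value as A[i, j, k].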
YAXArrays.DAT.getOuttype Method
getOuttype(outtype, cdata)
Internal function
Get the element type for the output cube
YAXArrays.DAT.getloopchunks Method
getloopchunks(dc::DATConfig)
Internal function
Returns the chunks that can be looped over together for all dimensions.
The computation of the chunk sizes is handled by DiskArrays.approx_chunksize.
YAXArrays.DAT.permuteloopaxes Method
permuteloopaxes(dc)
Internal function
Permute the dimensions of the cube, so that the axes that are looped through are in the first positions. This is necessary for a faster looping through the data.
YAXArrays.Cubes.setchunks Method
setchunks(c::Dataset,chunks)
Resets the chunks of all or a subset of the YAXArrays in the dataset and returns a new Dataset. Note that this will not change the chunking of the underlying data itself, it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savedataset on the resulting dataset. The chunks argument can take one of the following forms:
a NamedTuple or AbstractDict mapping from variable name to a description of the desired variable chunks
a NamedTuple or AbstractDict mapping from dimension name to a description of the desired variable chunks
a description of the desired variable chunks applied to all members of the Dataset
where a description of the desired variable chunks can take one of the following forms:
a DiskArrays.GridChunks object
a tuple specifying the chunk size along each dimension
an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes
YAXArrays.Datasets.collectfromhandle Method
Extracts a YAXArray from a dataset handle that was just created from an arrayinfo
YAXArrays.Datasets.createdataset Method
createdataset(DS::Type,axlist; kwargs...)
Creates a new dataset with axes specified in axlist. Each axis must be a subtype of CubeAxis. A new empty Zarr array will be created and can serve as a sink for mapCube operations.
Keyword arguments
path="": location where the new cube is stored
T=Union{Float32,Missing}: data type of the target cube
chunksize = ntuple(i->length(axlist[i]),length(axlist)): chunk sizes of the array
chunkoffset = ntuple(i->0,length(axlist)): offsets of the chunks
persist::Bool=true: shall the disk data be garbage-collected when the cube goes out of scope?
overwrite::Bool=false: overwrite cube if it already exists
properties=Dict{String,Any}(): additional cube properties
globalproperties=Dict{String,Any}: global attributes to be added to the dataset
fillvalue= T>:Missing ? defaultfillval(Base.nonmissingtype(T)) : nothing: fill value
datasetaxis="Variables": special treatment of a categorical axis that gets written into separate zarr arrays
layername="layer": Fallback name of the variable stored in the dataset if no datasetaxis is found
YAXArrays.Datasets.getarrayinfo Method
Extract necessary information to create a YAXArrayBase dataset from a name and YAXArray pair