Skip to content

Types

This section describes the data structures used to work with n-dimensional arrays in YAXArrays.

YAXArray

An Array stores a sequence of ordered elements of the same type usually across multiple dimensions or axes. For example, one can measure temperature across all time points of the time dimension or brightness values of a picture across X and Y dimensions. A one dimensional array is called Vector and a two dimensional array is called a Matrix. In many Machine Learning libraries, arrays are also called tensors. Arrays are designed to store dense spatial-temporal data stored in a grid, whereas a collection of sparse points is usually stored in data frames or relational databases.

A DimArray as defined by DimensionalData.jl adds names to the dimensions and their axes ticks for a given Array. These names can be used to access the data, e.g., by date instead of just by integer position.

A YAXArray is a subtype of a AbstractDimArray and adds functions to load and process the named arrays. For example, it can also handle very large arrays stored on disk that are too big to fit in memory. In addition, it provides functions for parallel computation.

Dataset

A Dataset is an ordered dictionary of YAXArrays that usually share dimensions. For example, it can bundle arrays storing temperature and precipitation that are measured at the same time points and the same locations. One also can store a picture in a Dataset with three arrays containing brightness values for red green and blue, respectively. Internally, those arrays are still separated allowing to chose different element types for each array. Analog to the (NetCDF Data Model)[https://docs.unidata.ucar.edu/netcdf-c/current/netcdf_data_model.html], a Dataset usually represents variables belonging to the same group.

(Data) Cube

A (Data) Cube is just a YAXArray in which arrays from a dataset are combined together by introducing a new dimension containing labels of which array the corresponding element came from. Unlike a Dataset, all arrays must have the same element type to be converted into a cube. This data structure is useful when we want to use all variables at once. For example, the arrays temperature and precipitation which are measured at the same locations and dates can be combined into a single cube. A more formal definition of Data Cubes are given in Mahecha et al. 2020

Dimensions

A Dimension or axis as defined by DimensionalData.jl adds tick labels, e.g., to each row or column of an array. It's name is used to access particular subsets of that array.

Lon, Lat, time

For convenience, several Dimensions have been defined in YAXArrays.jl, but only a few have been exported. The remaining dimensions can be used by calling them explicitly. See the next table for an overview

Dimensionexportedusage: using YAXArrays: YAXArrays as YAX
lonlon or YAX.lon
LonLon or YAX.Lon
longitudelongitude or YAX.longitude
LongitudeLongitude or YAX.Longitude
latlat or YAX.lat
LatLat or YAX.Lat
latitudelatitude or YAX.latitude
LatitudeLatitude or YAX.Latitude
timeYAX.time
TimeYAX.Time
rlatYAX.rlat
rlonYAX.rlon
lat_cYAX.lat_c
lon_cYAX.lon_c
heightYAX.height
depthYAX.depth
VariablesVariables or YAX.Variables

INFO

If the dimension you are looking for is not in that table, you can define your own by doing

julia
using DimensionalData: @dim, XDim # If you want it to be a subtype of XDim
@dim newDim XDim "Your newDim label"

Sometimes, when you want to operate on a specific dimension in your dataset (for example, a dimension named date), then doing

julia
groupby(ds, Dim{:date} => seasons())

should do the job.