Skip to content

Compute YAXArrays

This section describes how to create new YAXArrays by performing operations on them.

  • Use arithmetics to add or multiply numbers to each element of an array

  • Use map to apply a more complex functions to every element of an array

  • Use mapslices to reduce a dimension, e.g. to get the mean over all time steps

  • Use mapCube to apply complex functions on an array that may change any dimensions

Let's start by creating an example dataset:

julia
using YAXArrays
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Modify elements of a YAXArray

julia
a[1,2,3]
0.6720458572735374
julia
a[1,2,3] = 42
42
julia
a[1,2,3]
42.0

WARNING

Some arrays, e.g. those saved in a cloud object storage are immutable making any modification of the data impossible.

Arithmetics

Add a value to all elements of an array and save it as a new array:

julia
a2 = a .+ 5
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
julia
a2[1,2,3] == a[1,2,3] + 5
true

map

Apply a function on every element of an array individually:

julia
offset = 5
map(a) do x
    (x + offset) / 2 * 3
end
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

This keeps all dimensions unchanged. Note, that here we can not access neighboring elements. In this case, we can use mapslices or mapCube instead. Each element of the array is processed individually.

The code runs very fast, because map applies the function lazily. Actual computation will be performed only on demand, e.g. when elements were explicitly requested or further computations were performed.

mapslices

Reduce the time dimension by calculating the average value of all points in time:

julia
import Statistics: mean
mapslices(mean, a, dims="Time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘

There is no time dimension left, because there is only one value left after averaging all time steps. We can also calculate spatial means resulting in one value per time step:

julia
mapslices(mean, a, dims=("lat", "lon"))
╭────────────────────────────────────────────────╮
│ 30-element YAXArray{Union{Missing, Float64},1} │
├────────────────────────────────────────────────┴─────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

mapCube

mapCube is the most flexible way to apply a function over subsets of an array. Dimensions may be added or removed.

Operations over several YAXArrays

Here, we will define a simple function, that will take as input several YAXArrays. But first, let's load the necessary packages.

julia
using YAXArrays, Zarr
using Dates

Define function in space and time

julia
f(lo, la, t) = (lo + la + Dates.dayofyear(t))
f (generic function with 1 method)

now, mapCube requires this function to be wrapped as follows

julia
function g(xout, lo, la, t)
    xout .= f.(lo, la, t)
end
g (generic function with 1 method)

INFO

Note the . after f, this is because we will slice across time, namely, the function is broadcasted along this dimension.

Here, we do create YAXArrays only with the desired dimensions as

julia
julia> lon = YAXArray(Dim{:lon}(range(1, 15)))
╭──────────────────────────────╮
15-element YAXArray{Int64,1}
├──────────────────────────────┴───────────────────────────────────────── dims ┐
lon Sampled{Int64} 1:15 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 120.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
julia
julia> lat = YAXArray(Dim{:lat}(range(1, 10)))
╭──────────────────────────────╮
10-element YAXArray{Int64,1}
├──────────────────────────────┴───────────────────────────────────────── dims ┐
lat Sampled{Int64} 1:10 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 80.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

And a time Cube's Axis

julia
tspan = Date("2022-01-01"):Day(1):Date("2022-01-30")
time = YAXArray(Dim{:time}(tspan))
╭─────────────────────────────╮
│ 30-element YAXArray{Date,1} │
├─────────────────────────────┴────────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

note that the following can be extended to arbitrary YAXArrays with additional data and dimensions.

Let's generate a new cube using mapCube and saving the output directly into disk.

julia
julia> gen_cube = mapCube(g, (lon, lat, time);
           indims = (InDims(), InDims(), InDims("time")),
           outdims = OutDims("time", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
           outtype = Float32)
           # max_cache=1e9
       )
╭──────────────────────────────────────────────╮
30×15×10 YAXArray{Union{Missing, Float32},3}
├──────────────────────────────────────────────┴───────────────────────── dims ┐
time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
lon  Sampled{Int64} 1:15 ForwardOrdered Regular Points,
lat  Sampled{Int64} 1:10 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "missing_value" => 1.0f32
├─────────────────────────────────────────────────────────────── loaded lazily ┤
  data size: 17.58 KB
└──────────────────────────────────────────────────────────────────────────────┘

"time axis goes first"

Note that currently the time axis in the output cube goes first.

Check that it is working

julia
julia> gen_cube.data[1, :, :]
15×10 Matrix{Union{Missing, Float32}}:
  3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0
  4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0
  5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
  6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0
  7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
  8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0
  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0
 10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0
 12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0
 13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0
 14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
 15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0
 16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0
 17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0

but, we can generate a another cube with a different output order as follows

julia
julia> gen_cube = mapCube(g, (lon, lat, time);
           indims = (InDims("lon"), InDims(), InDims()),
           outdims = OutDims("lon", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
           outtype = Float32)
           # max_cache=1e9
       )
╭──────────────────────────────────────────────╮
15×10×30 YAXArray{Union{Missing, Float32},3}
├──────────────────────────────────────────────┴───────────────────────── dims ┐
lon  Sampled{Int64} 1:15 ForwardOrdered Regular Points,
lat  Sampled{Int64} 1:10 ForwardOrdered Regular Points,
time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "missing_value" => 1.0f32
├─────────────────────────────────────────────────────────────── loaded lazily ┤
  data size: 17.58 KB
└──────────────────────────────────────────────────────────────────────────────┘

INFO

Note that now the broadcasted dimension is lon.

we can see this by slicing on the last dimension now

julia
gen_cube.data[:, :, 1]
15×10 Matrix{Union{Missing, Float32}}:
  3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0
  4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0
  5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
  6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0
  7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
  8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0
  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0
 10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0
 12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0
 13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0
 14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
 15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0
 16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0
 17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0

which outputs the same as the gen_cube.data[1, :, :] called above.

OutDims and YAXArray Properties

Here, we will consider different scenarios, namely how we deal with different input cubes and how to specify the output ones. We will illustrate this with the following test example and the subsequent function definitions.

julia
using YAXArrays, Dates
using Zarr
using Random

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-05")),
    Dim{:lon}(range(1, 4, length=4)),
    Dim{:lat}(range(1, 3, length=3)),
    Dim{:variables}(["a", "b"])
)

Random.seed!(123)
data = rand(1:5, 5, 4, 3, 2)

properties = Dict("description" => "multi dimensional test cube")
yax_test = YAXArray(axlist, data, properties)
╭───────────────────────────╮
│ 5×4×3×2 YAXArray{Int64,4} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
  ↓ time      Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-05") ForwardOrdered Regular Points,
  → lon       Sampled{Float64} 1.0:1.0:4.0 ForwardOrdered Regular Points,
  ↗ lat       Sampled{Float64} 1.0:1.0:3.0 ForwardOrdered Regular Points,
  ⬔ variables Categorical{String} ["a", "b"] ForwardOrdered
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, String} with 1 entry:
  "description" => "multi dimensional test cube"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

One InDims to many OutDims

In the following function, note how the outputs are defined first and the inputs later.

julia
function one_to_many(xout_one, xout_two, xout_flat, xin_one)
    xout_one .= f1.(xin_one)
    xout_two .= f2.(xin_one)
    xout_flat .= sum(xin_one)
    return nothing
end

f1(xin) = xin + 1
f2(xin) = xin + 2
f2 (generic function with 1 method)

now, we define InDims and OutDims:

julia
indims_one   = InDims("Time")
# outputs dimension
properties_one = Dict{String, Any}("name" => "plus_one")
properties_two = Dict{String, Any}("name" => "plus_two")

outdims_one = OutDims("Time"; properties=properties_one)
outdims_two = OutDims("Time"; properties=properties_two)
outdims_flat = OutDims(;) # it will get the default `layer` name if open as dataset
OutDims((), :auto, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}(), false, Array, :input, 1)
julia
ds = mapCube(one_to_many, yax_test,
    indims = indims_one,
    outdims = (outdims_one, outdims_two, outdims_flat));

let's see the second output

julia
ds[2]
╭───────────────────────────────────────────╮
│ 5×4×3×2 YAXArray{Union{Missing, Int64},4} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ time      Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-05") ForwardOrdered Regular Points,
  → lon       Sampled{Float64} 1.0:1.0:4.0 ForwardOrdered Regular Points,
  ↗ lat       Sampled{Float64} 1.0:1.0:3.0 ForwardOrdered Regular Points,
  ⬔ variables Categorical{String} ["a", "b"] ForwardOrdered
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "name" => "plus_two"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

Many InDims to many OutDims

Let's consider a second test set

julia
properties_2d = Dict("description" => "2d dimensional test cube")
yax_2d = YAXArray(axlist[2:end], rand(-1:1, 4, 3, 2), properties_2d)
╭─────────────────────────╮
│ 4×3×2 YAXArray{Int64,3} │
├─────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ lon       Sampled{Float64} 1.0:1.0:4.0 ForwardOrdered Regular Points,
  → lat       Sampled{Float64} 1.0:1.0:3.0 ForwardOrdered Regular Points,
  ↗ variables Categorical{String} ["a", "b"] ForwardOrdered
├─────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, String} with 1 entry:
  "description" => "2d dimensional test cube"
├─────────────────────────────────────────────────────── loaded in memory ┤
  data size: 192.0 bytes
└─────────────────────────────────────────────────────────────────────────┘

The function definitions operating in this case are as follows

julia
function many_to_many(xout_one, xout_two, xout_flat, xin_one, xin_two, xin_drei)
    xout_one .= f1.(xin_one)
    xout_two .= f2mix.(xin_one, xin_two)
    xout_flat .= sum(xin_drei) # this will reduce the time dimension if we set outdims = OutDims()
    return nothing
end
f2mix(xin_xyt, xin_xy) = xin_xyt - xin_xy
f2mix (generic function with 1 method)

Specify path in OutDims

julia
indims_one   = InDims("Time")
indims_2d   = InDims() # ? it matches only to the other 2 dimensions and uses the same values for each time step
properties = Dict{String, Any}("name"=> "many_to_many_two")
outdims_one = OutDims("Time")
outdims_two = OutDims("Time"; path = "test_mm.zarr", properties)
outdims_flat = OutDims()
OutDims((), :auto, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}(), false, Array, :input, 1)
julia
ds = mapCube(many_to_many, (yax_test, yax_2d, yax_test),
    indims = (indims_one, indims_2d, indims_one),
    outdims = (outdims_one, outdims_two, outdims_flat));

And we can open the one that was saved directly to disk.

julia
ds_mm = open_dataset("test_mm.zarr")
YAXArray Dataset
Shared Axes: 
  (↓ Ti  Sampled{DateTime} [2022-01-01T00:00:00, …, 2022-01-05T00:00:00] ForwardOrdered Irregular Points,
  → lon Sampled{Float64} 1.0:1.0:4.0 ForwardOrdered Regular Points,
  ↗ lat Sampled{Float64} 1.0:1.0:3.0 ForwardOrdered Regular Points)

Variables: 
a, b

Different InDims names

Here, the goal is to operate at the pixel level (longitude, latitude), and then apply the corresponding function to the extracted values. Consider the following toy cubes:

julia
Random.seed!(123)
data = rand(3.0:5.0, 5, 4, 3)

axlist = (Dim{:lon}(1:4), Dim{:lat}(1:3), Dim{:depth}(1:7),)
yax_2d = YAXArray(axlist, rand(-3.0:0.0, 4, 3, 7))
╭───────────────────────────╮
│ 4×3×7 YAXArray{Float64,3} │
├───────────────────────────┴───────────────────────── dims ┐
  ↓ lon   Sampled{Int64} 1:4 ForwardOrdered Regular Points,
  → lat   Sampled{Int64} 1:3 ForwardOrdered Regular Points,
  ↗ depth Sampled{Int64} 1:7 ForwardOrdered Regular Points
├───────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├───────────────────────────────────────── loaded in memory ┤
  data size: 672.0 bytes
└───────────────────────────────────────────────────────────┘

and

julia
Random.seed!(123)
data = rand(3.0:5.0, 5, 4, 3)

axlist = (Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-05")),
    Dim{:lon}(1:4), Dim{:lat}(1:3),)

properties = Dict("description" => "multi dimensional test cube")
yax_test = YAXArray(axlist, data, properties)
╭───────────────────────────╮
│ 5×4×3 YAXArray{Float64,3} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-05") ForwardOrdered Regular Points,
  → lon  Sampled{Int64} 1:4 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Int64} 1:3 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, String} with 1 entry:
  "description" => "multi dimensional test cube"
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 480.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

and the corresponding functions

julia
function mix_time_depth(xin_xyt, xin_xyz)
    s = sum(abs.(xin_xyz))
    return xin_xyt.^2 .+ s
end

function time_depth(xout, xin_one, xin_two)
    xout .= mix_time_depth(xin_one, xin_two)
    # Note also that there is no dot anymore in the function application!
    return nothing
end
time_depth (generic function with 1 method)

with the final mapCube operation as follows

julia
ds = mapCube(time_depth, (yax_test, yax_2d),
    indims = (InDims("Time"), InDims("depth")), # ? anchor dimensions and then map over the others.
    outdims = OutDims("Time"))
╭───────────────────────────────────────────╮
│ 5×4×3 YAXArray{Union{Missing, Float64},3} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-05") ForwardOrdered Regular Points,
  → lon  Sampled{Int64} 1:4 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Int64} 1:3 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 480.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
  • TODO:
    • Example passing additional arguments to function.

    • MovingWindow

    • Multiple variables outputs, OutDims, in the same YAXArray

Creating a vector array

Here we transform a raster array with spatial dimension lat and lon into a vector array having just one spatial dimension i.e. region. First, create the raster array:

julia
using YAXArrays
using DimensionalData
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
raster_arr = YAXArray(axlist, data)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Then, create a Matrix with the same spatial dimensions indicating to which region each point belongs to:

julia
regions_mat = map(Iterators.product(raster_arr.lon, raster_arr.lat)) do (lon, lat)
    1 <= lon < 10 && 1 <= lat < 5 && return "A"
    1 <= lon < 10 && 5 <= lat < 10 && return "B"
    10 <= lon < 15 && 1 <= lat < 5 && return "C"
    return "D"
end
regions_mat = DimArray(regions_mat, (raster_arr.lon, raster_arr.lat))
╭──────────────────────────╮
│ 10×15 DimArray{String,2} │
├──────────────────────────┴───────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0   1.28571  1.57143  1.85714  …  4.14286  4.42857  4.71429  5.0
  1.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  2.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  3.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  4.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  5.0   "A"   "A"      "A"      "A"     …   "A"      "A"      "A"      "B"
  6.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  7.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  8.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  9.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
 10.0   "C"   "C"      "C"      "C"     …   "C"      "C"      "C"      "D"

which has the same spatial dimensions as the raster array at any given point in time:

julia
DimArray(raster_arr[time = 1])
╭───────────────────────────╮
│ 10×15 DimArray{Float64,2} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0       1.28571   1.57143    …  4.42857   4.71429   5.0
  1.0  0.17593   0.417937  0.0723492     0.178603  0.781773  0.875658
  2.0  0.701332  0.15394   0.685454      0.372761  0.984803  0.472308
  3.0  0.120997  0.829062  0.684389      0.463503  0.840389  0.536399
  ⋮                                   ⋱                      ⋮
  8.0  0.145747  0.432286  0.465103      0.889583  0.514979  0.671662
  9.0  0.538981  0.497189  0.167676      0.595405  0.752417  0.93986
 10.0  0.824354  0.376135  0.551732   …  0.101524  0.121947  0.508557

Now we calculate the list of corresponding points for each region. This will be re-used for each point in time during the final mapCube. In addition, this avoids the allocation of unnecessary memory.

julia
regions = ["A", "B", "C", "D"]
points_of_regions = map(enumerate(regions)) do (i,region)
    region => findall(isequal(region), regions_mat)
end |> Dict |> sort
OrderedCollections.OrderedDict{String, Vector{CartesianIndex{2}}} with 4 entries:
  "A" => [CartesianIndex(1, 1), CartesianIndex(2, 1), CartesianIndex(3, 1), Car…
  "B" => [CartesianIndex(1, 15), CartesianIndex(2, 15), CartesianIndex(3, 15), …
  "C" => [CartesianIndex(10, 1), CartesianIndex(10, 2), CartesianIndex(10, 3), …
  "D" => [CartesianIndex(10, 15)]

Finally, we can transform the entire raster array:

julia
vector_array = mapCube(
    raster_arr,
    indims=InDims("lon", "lat"),
    outdims=OutDims(Dim{:region}(regions))
) do xout, xin
    for (region_pos, points) in enumerate(points_of_regions.vals)
        # aggregate values of points in the current region at the current date
        xout[region_pos] = sum(view(xin, points))
    end
end
╭──────────────────────────────────────────╮
│ 4×30 YAXArray{Union{Missing, Float64},2} │
├──────────────────────────────────────────┴───────────────────────────── dims ┐
  ↓ region Categorical{String} ["A", "B", "C", "D"] ForwardOrdered,
  → time   Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├──────────────────────────────────────────────────────────── loaded in memory ┤
  data size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

This gives us a vector array with only one spatial dimension, i.e. the region. Note that we still have 30 points in time. The transformation was applied for each date separately.

Hereby, xin is a 10x15 array representing a map at a given time and xout is a 4 element vector of missing values initially representing the 4 regions at that date. Then, we set each output element by the sum of all corresponding points

Distributed Computation

All map methods apply a function on all elements of all non-input dimensions separately. This allows to run each map function call in parallel. For example, we can execute each date of a time series in a different CPU thread during spatial aggregation.

The following code does a time mean over all grid points using multiple CPUs of a local machine:

julia
using YAXArrays
using Dates
using Distributed

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)

addprocs(2)

@everywhere begin
  using YAXArrays
  using Zarr
  using Statistics
end

@everywhere function mymean(output, pixel)
  @show "doing a mean"
     output[:] .= mean(pixel)
end

mapCube(mymean, a, indims=InDims("time"), outdims=OutDims())

In the last example, mapCube was used to map the mymean function. mapslices is a convenient function that can replace mapCube, where you can omit defining an extra function with the output argument as an input (e.g. mymean). It is possible to simply use mapslice

julia
mapslices(mean  skipmissing, a, dims="time")

It is also possible to distribute easily the workload on a cluster, with little modification to the code. To do so, we use the ClusterManagers package.

julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))