Skip to content

Compute YAXArrays

This section describes how to create new YAXArrays by performing operations on them.

  • Use arithmetics to add or multiply numbers to each element of an array

  • Use map to apply a more complex functions to every element of an array

  • Use mapslices to reduce a dimension, e.g. to get the mean over all time steps

  • Use mapCube to apply complex functions on an array that may change any dimensions

Let's start by creating an example dataset:

julia
using YAXArrays
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Modify elements of a YAXArray

julia
a[1,2,3]
0.34093180898718256
julia
a[1,2,3] = 42
42
julia
a[1,2,3]
42.0

WARNING

Some arrays, e.g. those saved in a cloud object storage are immutable making any modification of the data impossible.

Arithmetics

Add a value to all elements of an array and save it as a new array:

julia
a2 = a .+ 5
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
julia
a2[1,2,3] == a[1,2,3] + 5
true

map

Apply a function on every element of an array individually:

julia
offset = 5
map(a) do x
    (x + offset) / 2 * 3
end
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

This keeps all dimensions unchanged. Note, that here we can not access neighboring elements. In this case, we can use mapslices or mapCube instead. Each element of the array is processed individually.

The code runs very fast, because map applies the function lazily. Actual computation will be performed only on demand, e.g. when elements were explicitly requested or further computations were performed.

mapslices

Reduce the time dimension by calculating the average value of all points in time:

julia
import Statistics: mean
mapslices(mean, a, dims="Time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘

There is no time dimension left, because there is only one value left after averaging all time steps. We can also calculate spatial means resulting in one value per time step:

julia
mapslices(mean, a, dims=("lat", "lon"))
╭────────────────────────────────────────────────╮
│ 30-element YAXArray{Union{Missing, Float64},1} │
├────────────────────────────────────────────────┴─────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

mapCube

mapCube is the most flexible way to apply a function over subsets of an array. Dimensions may be added or removed.

Operations over several YAXArrays

Here, we will define a simple function, that will take as input several YAXArrays. But first, let's load the necessary packages.

julia
using YAXArrays, Zarr
using Dates

Define function in space and time

julia
f(lo, la, t) = (lo + la + Dates.dayofyear(t))
f (generic function with 1 method)

now, mapCube requires this function to be wrapped as follows

julia
function g(xout, lo, la, t)
    xout .= f.(lo, la, t)
end
g (generic function with 1 method)

INFO

Note the . after f, this is because we will slice across time, namely, the function is broadcasted along this dimension.

Here, we do create YAXArrays only with the desired dimensions as

julia
julia> lon = YAXArray(Dim{:lon}(range(1, 15)))
╭──────────────────────────────╮
15-element YAXArray{Int64,1}
├──────────────────────────────┴───────────────────────────────────────── dims ┐
lon Sampled{Int64} 1:15 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 120.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
julia
julia> lat = YAXArray(Dim{:lat}(range(1, 10)))
╭──────────────────────────────╮
10-element YAXArray{Int64,1}
├──────────────────────────────┴───────────────────────────────────────── dims ┐
lat Sampled{Int64} 1:10 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 80.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

And a time Cube's Axis

julia
tspan = Date("2022-01-01"):Day(1):Date("2022-01-30")
time = YAXArray(Dim{:time}(tspan))
╭─────────────────────────────╮
│ 30-element YAXArray{Date,1} │
├─────────────────────────────┴────────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

note that the following can be extended to arbitrary YAXArrays with additional data and dimensions.

Let's generate a new cube using mapCube and saving the output directly into disk.

julia
julia> gen_cube = mapCube(g, (lon, lat, time);
           indims = (InDims(), InDims(), InDims("time")),
           outdims = OutDims("time", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
           outtype = Float32)
           # max_cache=1e9
       )
╭──────────────────────────────────────────────╮
30×15×10 YAXArray{Union{Missing, Float32},3}
├──────────────────────────────────────────────┴───────────────────────── dims ┐
time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
lon  Sampled{Int64} 1:15 ForwardOrdered Regular Points,
lat  Sampled{Int64} 1:10 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "missing_value" => 1.0f32
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 17.58 KB
└──────────────────────────────────────────────────────────────────────────────┘

"time axis goes first"

Note that currently the time axis in the output cube goes first.

Check that it is working

julia
julia> gen_cube.data[1, :, :]
15×10 Matrix{Union{Missing, Float32}}:
  3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0
  4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0
  5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
  6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0
  7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
  8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0
  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0
 10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0
 12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0
 13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0
 14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
 15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0
 16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0
 17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0

but, we can generate a another cube with a different output order as follows

julia
julia> gen_cube = mapCube(g, (lon, lat, time);
           indims = (InDims("lon"), InDims(), InDims()),
           outdims = OutDims("lon", overwrite=true, path="my_gen_cube.zarr", backend=:zarr,
           outtype = Float32)
           # max_cache=1e9
       )
╭──────────────────────────────────────────────╮
15×10×30 YAXArray{Union{Missing, Float32},3}
├──────────────────────────────────────────────┴───────────────────────── dims ┐
lon  Sampled{Int64} 1:15 ForwardOrdered Regular Points,
lat  Sampled{Int64} 1:10 ForwardOrdered Regular Points,
time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any} with 1 entry:
  "missing_value" => 1.0f32
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 17.58 KB
└──────────────────────────────────────────────────────────────────────────────┘

INFO

Note that now the broadcasted dimension is lon.

we can see this by slicing on the last dimension now

julia
gen_cube.data[:, :, 1]
15×10 Matrix{Union{Missing, Float32}}:
  3.0   4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0
  4.0   5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0
  5.0   6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0
  6.0   7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0
  7.0   8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0
  8.0   9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0
  9.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0
 10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
 11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0
 12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0
 13.0  14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0
 14.0  15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0
 15.0  16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0
 16.0  17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0
 17.0  18.0  19.0  20.0  21.0  22.0  23.0  24.0  25.0  26.0

which outputs the same as the gen_cube.data[1, :, :] called above.

Creating a vector array

Here we transform a raster array with spatial dimension lat and lon into a vector array having just one spatial dimension i.e. region. First, create the raster array:

julia
using YAXArrays
using DimensionalData
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
raster_arr = YAXArray(axlist, data)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Then, create a Matrix with the same spatial dimensions indicating to which region each point belongs to:

julia
regions_mat = map(Iterators.product(raster_arr.lon, raster_arr.lat)) do (lon, lat)
    1 <= lon < 10 && 1 <= lat < 5 && return "A"
    1 <= lon < 10 && 5 <= lat < 10 && return "B"
    10 <= lon < 15 && 1 <= lat < 5 && return "C"
    return "D"
end
regions_mat = DimArray(regions_mat, (raster_arr.lon, raster_arr.lat))
╭──────────────────────────╮
│ 10×15 DimArray{String,2} │
├──────────────────────────┴───────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0   1.28571  1.57143  1.85714  …  4.14286  4.42857  4.71429  5.0
  1.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  2.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  3.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  4.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  5.0   "A"   "A"      "A"      "A"     …   "A"      "A"      "A"      "B"
  6.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  7.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  8.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  9.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
 10.0   "C"   "C"      "C"      "C"     …   "C"      "C"      "C"      "D"

which has the same spatial dimensions as the raster array at any given point in time:

julia
DimArray(raster_arr[time = 1])
╭───────────────────────────╮
│ 10×15 DimArray{Float64,2} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0       1.28571    1.57143   …  4.42857   4.71429   5.0
  1.0  0.423959  0.495718   0.126617     0.16858   0.454909  0.338821
  2.0  0.609098  0.140488   0.729186     0.984254  0.924728  0.837976
  3.0  0.710902  0.214893   0.702405     0.953086  0.35839   0.796689
  ⋮                                   ⋱                      ⋮
  8.0  0.275995  0.0626282  0.180195     0.175184  0.449547  0.703896
  9.0  0.902076  0.655641   0.964787     0.025293  0.104898  0.579358
 10.0  0.285525  0.813113   0.104357  …  0.054352  0.222979  0.944919

Now we calculate the list of corresponding points for each region. This will be re-used for each point in time during the final mapCube. In addition, this avoids the allocation of unnecessary memory.

julia
regions = ["A", "B", "C", "D"]
points_of_regions = map(enumerate(regions)) do (i,region)
    region => findall(isequal(region), regions_mat)
end |> Dict |> sort
OrderedCollections.OrderedDict{String, Vector{CartesianIndex{2}}} with 4 entries:
  "A" => [CartesianIndex(1, 1), CartesianIndex(2, 1), CartesianIndex(3, 1), Car…
  "B" => [CartesianIndex(1, 15), CartesianIndex(2, 15), CartesianIndex(3, 15), …
  "C" => [CartesianIndex(10, 1), CartesianIndex(10, 2), CartesianIndex(10, 3), …
  "D" => [CartesianIndex(10, 15)]

Finally, we can transform the entire raster array:

julia
vector_array = mapCube(
    raster_arr,
    indims=InDims("lon", "lat"),
    outdims=OutDims(Dim{:region}(regions))
) do xout, xin
    for (region_pos, points) in enumerate(points_of_regions.vals)
        # aggregate values of points in the current region at the current date
        xout[region_pos] = sum(view(xin, points))
    end
end
╭──────────────────────────────────────────╮
│ 4×30 YAXArray{Union{Missing, Float64},2} │
├──────────────────────────────────────────┴───────────────────────────── dims ┐
  ↓ region Categorical{String} ["A", "B", "C", "D"] ForwardOrdered,
  → time   Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

This gives us a vector array with only one spatial dimension, i.e. the region. Note that we still have 30 points in time. The transformation was applied for each date separately.

Hereby, xin is a 10x15 array representing a map at a given time and xout is a 4 element vector of missing values initially representing the 4 regions at that date. Then, we set each output element by the sum of all corresponding points

Distributed Computation

All map methods apply a function on all elements of all non-input dimensions separately. This allows to run each map function call in parallel. For example, we can execute each date of a time series in a different CPU thread during spatial aggregation.

The following code does a time mean over all grid points using multiple CPUs of a local machine:

julia
using YAXArrays
using Dates
using Distributed

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)

addprocs(2)

@everywhere begin
  using YAXArrays
  using Zarr
  using Statistics
end

@everywhere function mymean(output, pixel)
  @show "doing a mean"
     output[:] .= mean(pixel)
end

mapCube(mymean, a, indims=InDims("time"), outdims=OutDims())

In the last example, mapCube was used to map the mymean function. mapslices is a convenient function that can replace mapCube, where you can omit defining an extra function with the output argument as an input (e.g. mymean). It is possible to simply use mapslice

julia
mapslices(mean  skipmissing, a, dims="time")

It is also possible to distribute easily the workload on a cluster, with little modification to the code. To do so, we use the ClusterManagers package.

julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))