Skip to content

Compute YAXArrays

This section describes how to create new YAXArrays by performing operations on them.

  • Use arithmetics to add or multiply numbers to each element of an array

  • Use map to apply a more complex functions to every element of an array

  • Use mapslices to reduce a dimension, e.g. to get the mean over all time steps

  • Use mapCube to apply complex functions on an array that may change any dimensions

Let's start by creating an example dataset:

julia
using YAXArrays
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Modify elements of a YAXArray

julia
a[1,2,3]
0.3263929594489151
julia
a[1,2,3] = 42
42
julia
a[1,2,3]
42.0

WARNING

Some arrays, e.g. those saved in a cloud object storage are immutable making any modification of the data impossible.

Arithmetics

Add a value to all elements of an array and save it as a new array:

julia
a2 = a .+ 5
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
julia
a2[1,2,3] == a[1,2,3] + 5
true

map

Apply a function on every element of an array individually:

julia
offset = 5
map(a) do x
    (x + offset) / 2 * 3
end
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{Symbol, String} with 1 entry:
  :origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

This keeps all dimensions unchanged. Note, that here we can not access neighboring elements. In this case, we can use mapslices or mapCube instead. Each element of the array is processed individually.

The code runs very fast, because map applies the function lazily. Actual computation will be performed only on demand, e.g. when elements were explicitly requested or further computations were performed.

mapslices

Reduce the time dimension by calculating the average value of all points in time:

julia
import Statistics: mean
mapslices(mean, a, dims="Time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘

There is no time dimension left, because there is only one value left after averaging all time steps. We can also calculate spatial means resulting in one value per time step:

julia
mapslices(mean, a, dims=("lat", "lon"))
╭────────────────────────────────────────────────╮
│ 30-element YAXArray{Union{Missing, Float64},1} │
├────────────────────────────────────────────────┴─────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

mapCube

mapCube is the most flexible way to apply a function over subsets of an array. Dimensions may be added or removed.

Here we transform a raster array with spatial dimension lat and lon into a vector array having just one spatial dimension i.e. region. First, create the raster array:

julia
using YAXArrays
using DimensionalData
using Dates

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
raster_arr = YAXArray(axlist, data)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
  → lon  Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  ↗ lat  Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘

Then, create a Matrix with the same spatial dimensions indicating to which region each point belongs to:

julia
regions_mat = map(Iterators.product(raster_arr.lon, raster_arr.lat)) do (lon, lat)
    1 <= lon < 10 && 1 <= lat < 5 && return "A"
    1 <= lon < 10 && 5 <= lat < 10 && return "B"
    10 <= lon < 15 && 1 <= lat < 5 && return "C"
    return "D"
end
regions_mat = DimArray(regions_mat, (raster_arr.lon, raster_arr.lat))
╭──────────────────────────╮
│ 10×15 DimArray{String,2} │
├──────────────────────────┴───────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0   1.28571  1.57143  1.85714  …  4.14286  4.42857  4.71429  5.0
  1.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  2.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  3.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  4.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  5.0   "A"   "A"      "A"      "A"     …   "A"      "A"      "A"      "B"
  6.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  7.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  8.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
  9.0   "A"   "A"      "A"      "A"         "A"      "A"      "A"      "B"
 10.0   "C"   "C"      "C"      "C"     …   "C"      "C"      "C"      "D"

which has the same spatial dimensions as the raster array at any given point in time:

julia
DimArray(raster_arr[time = 1])
╭───────────────────────────╮
│ 10×15 DimArray{Float64,2} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
└──────────────────────────────────────────────────────────────────────────────┘
  ↓ →  1.0        1.28571    1.57143   …  4.42857   4.71429    5.0
  1.0  0.966049   0.0270193  0.562734     0.640843  0.268128   0.710486
  2.0  0.0611672  0.240099   0.270024     0.871993  0.794563   0.491742
  3.0  0.662047   0.623191   0.11345      0.161373  0.762443   0.671519
  ⋮                                    ⋱                       ⋮
  8.0  0.908585   0.500348   0.252203     0.656088  0.0717784  0.929396
  9.0  0.257888   0.773843   0.860492     0.703579  0.734724   0.199977
 10.0  0.659735   0.370907   0.20499   …  0.813534  0.53781    0.792245

Now we calculate the list of corresponding points for each region. This will be re-used for each point in time during the final mapCube. In addition, this avoids the allocation of unnecessary memory.

julia
regions = ["A", "B", "C", "D"]
points_of_regions = map(enumerate(regions)) do (i,region)
    region => findall(isequal(region), regions_mat)
end |> Dict |> sort
OrderedCollections.OrderedDict{String, Vector{CartesianIndex{2}}} with 4 entries:
  "A" => [CartesianIndex(1, 1), CartesianIndex(2, 1), CartesianIndex(3, 1), Car…
  "B" => [CartesianIndex(1, 15), CartesianIndex(2, 15), CartesianIndex(3, 15), …
  "C" => [CartesianIndex(10, 1), CartesianIndex(10, 2), CartesianIndex(10, 3), …
  "D" => [CartesianIndex(10, 15)]

Finally, we can transform the entire raster array:

julia
vector_array = mapCube(
    raster_arr,
    indims=InDims("lon", "lat"),
    outdims=OutDims(Dim{:region}(regions))
) do xout, xin
    for (region_pos, points) in enumerate(points_of_regions.vals)
        # aggregate values of points in the current region at the current date
        xout[region_pos] = sum(view(xin, points))
    end
end
╭──────────────────────────────────────────╮
│ 4×30 YAXArray{Union{Missing, Float64},2} │
├──────────────────────────────────────────┴───────────────────────────── dims ┐
  ↓ region Categorical{String} ["A", "B", "C", "D"] ForwardOrdered,
  → time   Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘

This gives us a vector array with only one spatial dimension, i.e. the region. Note that we still have 30 points in time. The transformation was applied for each date separately.

Hereby, xin is a 10x15 array representing a map at a given time and xout is a 4 element vector of missing values initially representing the 4 regions at that date. Then, we set each output element by the sum of all corresponding points

Distributed Computation

All map methods apply a function on all elements of all non-input dimensions separately. This allows to run each map function call in parallel. For example, we can execute each date of a time series in a different CPU thread during spatial aggregation.

The following code does a time mean over all grid points using multiple CPUs of a local machine:

julia
using YAXArrays
using Dates
using Distributed

axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)

addprocs(2)

@everywhere begin
  using YAXArrays
  using Zarr
  using Statistics
end

@everywhere function mymean(output, pixel)
  @show "doing a mean"
     output[:] .= mean(pixel)
end

mapCube(mymean, a, indims=InDims("time"), outdims=OutDims())

In the last example, mapCube was used to map the mymean function. mapslices is a convenient function that can replace mapCube, where you can omit defining an extra function with the output argument as an input (e.g. mymean). It is possible to simply use mapslice

julia
mapslices(mean  skipmissing, a, dims="time")

It is also possible to distribute easily the workload on a cluster, with little modification to the code. To do so, we use the ClusterManagers package.

julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))