Distributed Computing

How to calculate a time mean

julia
using YAXArrays, Statistics, Zarr
using DimensionalData
using Dates
axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-10")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
    Dim{:Variable}(["var1", "var2"])
)
# And the corresponding data
data = rand(10, 10, 15, 2)
julia
julia> ds = YAXArray(axlist, data)
╭────────────────────────────────╮
│ 10×10×15×2 YAXArray{Float64,4} │
├────────────────────────────────┴─────────────────────────────────────── dims ┐
time     Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-10") ForwardOrdered Regular Points,
lon      Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
lat      Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points,
Variable Categorical{String} ["var1", "var2"] ForwardOrdered
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 23.44 KB
└──────────────────────────────────────────────────────────────────────────────┘
julia
c = ds[Variable = At("var1")]
mapslices(mean ∘ skipmissing, c, dims="time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
  ↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
  → lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤ 
  file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘
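
As a sanity check on what the time mean computes, here is a minimal sketch on a plain array, using only Base and the Statistics standard library. It assumes the same (time, lon, lat, Variable) layout as the cube above; the names var1, time_mean and via_slices are illustrative and not part of YAXArrays. Note that mapslices here is Base's array version, not the YAXArrays method.

```julia
using Statistics

# Same layout as the cube above: (time, lon, lat, Variable)
data = rand(10, 10, 15, 2)

# Select the first variable, mirroring ds[Variable = At("var1")]
var1 = data[:, :, :, 1]

# Mean over the first (time) dimension, then drop the singleton axis
time_mean = dropdims(mean(var1, dims=1), dims=1)

# The same reduction with Base's mapslices and the composed function
via_slices = dropdims(mapslices(mean ∘ skipmissing, var1, dims=1), dims=1)

size(time_mean)  # (10, 15)
```

Both results have one value per (lon, lat) grid point, which matches the 10×15 shape of the YAXArray output above.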

Distributed calculations

It is possible to distribute the calculations over multiple processes. The following code computes a time mean over all grid points using multiple CPU cores on a local machine.

julia
using Distributed
addprocs(2)
@everywhere begin
  using NetCDF
  using YAXArrays
  using Statistics
  using Zarr
end
@everywhere function mymean(output, pixel)
  @show "doing a mean"
  output[:] .= mean(pixel)
end
indims = InDims("time")
outdims = OutDims()
resultcube = mapCube(mymean, c, indims=indims, outdims=outdims)

In the last example, mapCube was used to map the mymean function over the cube. mapslices is a convenience function that can replace mapCube here: it removes the need to define an extra function that takes the output array as an argument (e.g. mymean). The same time mean can be written directly with mapslices:

julia
julia> resultcube = mapslices(mean ∘ skipmissing, c, dims="time")
"Running nonthreaded" = "Running nonthreaded"
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘
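
To see the per-pixel pattern in isolation, the following sketch uses only the Distributed and Statistics standard libraries. It is not how mapCube dispatches work internally; it only illustrates sending one time series per task to local worker processes. The pixels vector is hypothetical stand-in data, and the names means and ids are illustrative.

```julia
using Distributed

# Start two local workers only if none have been added yet
if nprocs() == 1
  addprocs(2)
end

# Load Statistics on the main process and on every worker
@everywhere using Statistics

# Hypothetical stand-in data: 10 time steps for each of 6 pixels
pixels = [rand(10) for _ in 1:6]

# pmap hands each pixel's time series to an available worker
means = pmap(mean, pixels)

# Record which process handled each task
ids = pmap(_ -> myid(), 1:6)

# Release the workers when done
rmprocs(workers())
```

Inspecting ids is a quick way to confirm that the work actually ran on the worker processes rather than on the main process.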

It is also possible to distribute the workload on a cluster with little modification to the code. To do so, we use the ClusterManagers package.

julia
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))