Distributed Computing
How to calculate a time mean
using YAXArrays, Statistics, Zarr
using DimensionalData
using Dates
axlist = (
    Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-10")),
    Dim{:lon}(range(1, 10, length=10)),
    Dim{:lat}(range(1, 5, length=15)),
    Dim{:Variable}(["var1", "var2"])
)
# And the corresponding data
data = rand(10, 10, 15, 2)
julia> ds = YAXArray(axlist, data)
╭────────────────────────────────╮
│ 10×10×15×2 YAXArray{Float64,4} │
├────────────────────────────────┴─────────────────────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-10") ForwardOrdered Regular Points,
→ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
↗ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points,
⬔ Variable Categorical{String} ["var1", "var2"] ForwardOrdered
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 23.44 KB
└──────────────────────────────────────────────────────────────────────────────┘
c = ds[Variable = At("var1")]
mapslices(mean ∘ skipmissing, c, dims="time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘
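As a rough cross-check of what a time mean computes, the same reduction can be sketched on a plain Julia array using only the Statistics standard library. This is an illustration under the assumption that time is the first array dimension, as in the axis list above; it generates its own random array and does not go through YAXArrays.

```julia
using Statistics

# Same shape as in the example above: (time, lon, lat, Variable).
data = rand(10, 10, 15, 2)

# Select the first variable ("var1"), leaving a time × lon × lat array.
var1 = data[:, :, :, 1]

# Average along dimension 1 (time) and drop the resulting singleton dimension.
time_mean = dropdims(mean(var1; dims=1); dims=1)

size(time_mean)  # (10, 15)
```

The result has one value per (lon, lat) grid point, which matches the 10×15 shape reported for the mapslices result.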
Distributed calculations
The calculations can be distributed over multiple processes. The following code computes a time mean over all grid points using multiple CPU cores on a local machine.
using Distributed
addprocs(2)
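# Load the required packages on the main process and on every worker.
@everywhere begin
    using NetCDF
    using YAXArrays
    using Statistics
    using Zarr
end
# Define the function on all processes: it writes the mean of one
# time series (pixel) into the preallocated output.
@everywhere function mymean(output, pixel)
    @show "doing a mean"
    output[:] .= mean(pixel)
end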
indims = InDims("time")
outdims = OutDims()
resultcube = mapCube(mymean, c, indims=indims, outdims=outdims)
In the last example, mapCube was used to map the mymean function. mapslices is a convenience function that can replace mapCube and spares you from defining an extra function that takes the output array as an argument (e.g. mymean). The same time mean can be computed with mapslices alone:
julia> resultcube = mapslices(mean ∘ skipmissing, c, dims="time")
"Running nonthreaded" = "Running nonthreaded"
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘
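To make the idea of distributing a per-pixel time mean more concrete, here is a minimal sketch that uses only the Distributed and Statistics standard libraries. It is not how mapCube is implemented; it assumes a plain array (named pixels here purely for illustration) in place of the cube c, and uses pmap to hand one time series at a time to the available workers.

```julia
using Distributed

# Start two local workers only if none have been added yet.
nprocs() == 1 && addprocs(2)

# Make `mean` available on every process.
@everywhere using Statistics

# Plain array standing in for the cube `c`: (time, lon, lat).
pixels = rand(10, 10, 15)

# One time series per (lon, lat) grid point.
series = [pixels[:, i, j] for i in axes(pixels, 2), j in axes(pixels, 3)]

# pmap sends each time series to an available worker and collects the means;
# the flat result is reshaped back to the lon × lat grid.
means = reshape(pmap(mean, vec(series)), size(series))

size(means)  # (10, 15)
```

For a problem this small the communication overhead will likely outweigh any gain; the sketch only shows the pattern of splitting the work by grid point.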
The workload can also be distributed across a cluster with little modification to the code. To do so, we use the ClusterManagers package.
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))
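After adding workers, it can be useful to confirm where they are running and to remove them when the computation is finished. The following sketch uses only functions from the Distributed standard library and Base; the fallback to two local workers is an assumption added here so that the snippet can also be tried without a Slurm allocation.

```julia
using Distributed

# Assumption: workers were already added, e.g. with addprocs(SlurmManager(10)).
# For a local dry run, fall back to two local workers.
nprocs() == 1 && addprocs(2)

# Ask each worker for its host name to confirm where it is running.
hosts = [remotecall_fetch(gethostname, w) for w in workers()]
println("workers: ", nworkers(), " on hosts: ", unique(hosts))

# Remove the workers once the computation is done.
rmprocs(workers())
```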