Distributed calculations
Local machine
It is possible to distribute the calculations over multiple processes. The following code computes a time mean over all grid points using multiple CPUs on a local machine.
using Distributed
addprocs(2)
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using EarthDataLab
@everywhere using Statistics
@everywhere function mymean(output, pixel)
    # Write into the output buffer in place; `output = ...` would only rebind the local name
    output[:] .= mean(pixel)
end
c = Cube()
tair = subsetcube(c,variable="air_temperature_2m", time=2001:2016)
tair_c = map(t->t-273.15, tair)
indims = InDims(TimeAxis)
outdims = OutDims()
resultcube = mapCube(mymean, tair_c, indims=indims, outdims=outdims)
In the example above, mapCube was used to apply the mymean function. mapslices is a convenient alternative to mapCube that avoids defining an extra function which takes the output as an argument (such as mymean). The time mean can be written directly with mapslices:
resultcube = mapslices(mean ∘ skipmissing, c, dims="time")
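The difference between the two calling conventions can be illustrated on a plain array, without EarthDataLab. This is only a sketch. It assumes that mapCube hands the user function a preallocated output buffer, and it uses Base's mapslices on a Matrix as a stand-in for the cube version. The names data, mymean!, mymean_rebind and buf are hypothetical and exist only for this illustration.

```julia
using Statistics

# Plain-array stand-in for a cube: rows are grid points, columns are time steps.
data = [1.0 2.0     missing;
        4.0 missing 6.0]

# mapCube-style function (hypothetical name): the result is written INTO the buffer.
function mymean!(output, pixel)
    output[:] .= mean(skipmissing(pixel))
end

# Rebinding the argument only changes the local name; the caller's buffer is untouched.
function mymean_rebind(output, pixel)
    output = mean(skipmissing(pixel))
end

buf = zeros(1)
mymean_rebind(buf, data[1, :])
println(buf)   # [0.0], nothing was written
mymean!(buf, data[1, :])
println(buf)   # [1.5]

# mapslices-style: the function simply returns a value and is applied along one dimension.
# `mean ∘ skipmissing` first drops the missing entries, then averages the rest.
result = mapslices(mean ∘ skipmissing, data; dims=2)
println(vec(result))   # [1.5, 5.0]
```

The composed function mean ∘ skipmissing needs no output argument, which is why the mapslices form is shorter when the operation reduces each slice to a value.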
SLURM cluster
It is also possible to easily distribute the workload on a cluster with little modification to the code. The following code computes a time mean over all grid points using multiple CPUs on a SLURM cluster. To do so, we use the ClusterManagers package.
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using EarthDataLab
@everywhere using Statistics
inpath="zg1000_AERday_CanESM5_esm-hist_r6i1p1f1_gn_18500101-20141231.nc"
c = Cube(inpath, "zg1000")
resultcube = mapslices(mean ∘ skipmissing, c, dims="time")
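When moving between a local machine and a cluster, it can help to confirm that the workers actually started and to shut them down once the computation is done. The following is a minimal sketch using only the Distributed standard library. Here addprocs(2) is a local stand-in for addprocs(SlurmManager(10)), and the chunks workload is a hypothetical toy example, not part of EarthDataLab.

```julia
using Distributed
using Statistics

# Local stand-in for addprocs(SlurmManager(10)); addprocs returns the new worker ids.
pids = addprocs(2)

# Load the packages needed by the workload on every process.
@everywhere using Statistics

# Ask each worker for its own id to confirm that it is alive and reachable.
worker_ids = [remotecall_fetch(myid, p) for p in pids]

# Hypothetical toy workload: one mean per chunk, evaluated on the workers.
chunks = [collect(1.0:10.0), collect(11.0:20.0)]
chunk_means = pmap(mean, chunks)
println(chunk_means)   # [5.5, 15.5]

# Shut the workers down once the computation is finished.
rmprocs(pids)
```

Only the addprocs call should need to change between the local and the cluster setup; the checks and the cleanup stay the same.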