Distributed calculations
Local machine
It is possible to distribute the calculations over multiple processes. The following code computes a time mean at every grid point, using multiple CPUs on a local machine.
using Distributed
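addprocs(2)                    # start two local worker processes
@everywhere using Pkg
@everywhere Pkg.activate(".")  # activate the current project on the master and on every worker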
@everywhere using EarthDataLab
@everywhere using Statistics
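# mapCube passes a preallocated output array to mymean. Write into it with output[];
# a plain `output = ...` only rebinds the local name and the result is lost.
@everywhere function mymean(output, pixel)
    output[] = mean(skipmissing(pixel))  # skip missing values, as the mapslices version does
end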
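c = Cube()                     # open the default Earth System Data Cube
tair = subsetcube(c, variable="air_temperature_2m", time=2001:2016)
tair_c = map(t -> t - 273.15, tair)  # convert from kelvin to degrees Celsius
indims = InDims(TimeAxis)      # each call of mymean receives one full time series
outdims = OutDims()            # and produces a single value (no output axes)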
resultcube = mapCube(mymean, tair_c, indims=indims, outdims=outdims)
In the last example, mapCube was used to map the mymean function over every time series. mapslices is a convenient alternative to mapCube: it takes an ordinary function, so there is no need to define an extra function such as mymean that receives the output array as its first argument. The same time mean can be written with mapslices:
resultcube = mapslices(mean ∘ skipmissing, tair_c, dims="time")  # drop missing values, then average along time
SLURM cluster
The workload can also be distributed easily on a cluster, with little modification to the code: the main change is how the worker processes are started. The following code computes a time mean at every grid point using multiple CPUs on a SLURM cluster. To start the workers through SLURM, we use the ClusterManagers package.
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))     # ask SLURM for 10 worker processes
@everywhere using Pkg
@everywhere Pkg.activate(".")
@everywhere using EarthDataLab
@everywhere using Statistics
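# daily zg1000 field from the CMIP6 model CanESM5 (esm-hist experiment)
inpath = "zg1000_AERday_CanESM5_esm-hist_r6i1p1f1_gn_18500101-20141231.nc"
c = Cube(inpath, "zg1000")
resultcube = mapslices(mean ∘ skipmissing, c, dims="time")  # time mean at every grid point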
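Both examples rely on the same mechanism from Julia's Distributed standard library: addprocs starts worker processes (locally, or through SlurmManager on a cluster), @everywhere loads the needed packages on every one of them, and the work is then spread over the workers. The sketch below reproduces that pattern without EarthDataLab, so it runs on any machine. The array `data` and its sizes are hypothetical stand-ins for a lon × lat × time cube, and sending one time series per call with `pmap` is only an illustration of the Distributed primitives; it is not meant to describe how mapCube or mapslices schedule their work internally.

```julia
using Distributed
addprocs(2)   # local workers; on a SLURM cluster this would be addprocs(SlurmManager(10))

@everywhere using Statistics   # load Statistics on the master and on every worker

# Report where each worker runs (all on this machine here, on compute nodes under SLURM)
for w in workers()
    println("worker ", w, " on ", remotecall_fetch(gethostname, w))
end

# Hypothetical stand-in for a (lon, lat, time) cube with a few missing values
nlon, nlat, ntime = 4, 3, 10
data = Array{Union{Missing,Float64}}(rand(nlon, nlat, ntime))
data[1, 1, 2] = missing
data[3, 2, 5] = missing

# One time series per grid point, flattened in column-major order
series = vec([data[i, j, :] for i in 1:nlon, j in 1:nlat])

# pmap sends each time series to a free worker, which returns its mean
means = reshape(pmap(mean ∘ skipmissing, series), nlon, nlat)

# The same reduction on the master process alone, for comparison
serial = [mean(skipmissing(data[i, j, :])) for i in 1:nlon, j in 1:nlat]
println(all(means .≈ serial))   # prints true

rmprocs(workers())   # shut the workers down when done
```

On a cluster, replacing addprocs(2) with the addprocs(SlurmManager(10)) call (after using ClusterManagers) should be the only change to this sketch, and the printed host names then show which compute nodes the workers landed on.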