Write YAXArrays and Datasets
Create an example Dataset:
using YAXArrays
using NetCDF
using Downloads: download
path = download("https://archive.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc", "example.nc")
ds = open_dataset(path)
YAXArray Dataset
Shared Axes:
(↓ lon Sampled{Float64} 1.0:2.0:359.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} -79.5:1.0:89.5 ForwardOrdered Regular Points,
↗ time Sampled{CFTime.DateTime360Day{CFTime.Period{Float64, Val{86400}(), Val{0}()}, Val{(2001, 1, 1)}()}} [DateTime360Day(2001-01-16T00:00:00), …, DateTime360Day(2002-12-16T00:00:00)] ForwardOrdered Irregular Points)
Variables:
tos
Properties: Dict{String, Any}("cmor_version" => 0.96f0, "references" => "Dufresne et al, Journal of Climate, 2015, vol XX, p 136", "realization" => 1, "Conventions" => "CF-1.0", "contact" => "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr", "history" => "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements", "table_id" => "Table O1 (13 November 2004)", "source" => "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v", "title" => "IPSL model output prepared for IPCC Fourth Assessment SRES A2 experiment", "experiment_id" => "SRES A2 experiment"…)

Write Zarr
Save a single YAXArray to a directory:
using Zarr
savecube(ds.tos, "tos.zarr", driver=:zarr)

Save an entire Dataset to a directory:
savedataset(ds, path="ds.zarr", driver=:zarr)

Zarr compression
Save a dataset to Zarr format with compression:
n = 9 # compression level, number between 0 (no compression) and 9 (max compression)
compression = Zarr.BloscCompressor(; clevel=n)
savedataset(ds; path="ds_c.zarr", driver=:zarr, compressor=compression)

See the Zarr.jl documentation for more on Zarr compressors. If you use this option and don't notice a significant improvement, please feel free to open an issue or start a discussion.
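Besides clevel, the Blosc compressor exposes further knobs. The following is a hedged sketch: the cname keyword selects the codec as I understand the Zarr.jl API, but verify the exact keywords against your installed Zarr.jl version.

```julia
using Zarr

# Hedged sketch: in addition to `clevel`, Zarr.BloscCompressor accepts a
# `cname` keyword choosing the codec (e.g. "lz4", the default, or "zstd").
# Check the keyword names against your Zarr.jl version.
compression = Zarr.BloscCompressor(cname="zstd", clevel=3)
```

A dataset could then be saved with this compressor exactly as above, e.g. savedataset(ds; path="ds_zstd.zarr", driver=:zarr, compressor=compression) (the output path here is hypothetical).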
Write to cloud buckets
Writing directly to S3-compatible cloud object storage is supported, provided valid credentials are given. It is highly recommended to supply the username and password via the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, respectively. One needs to create any AbstractAWSConfig and activate it with AWS.global_aws_config, e.g. using MinIO for self-hosted storage:
using AWS
using Minio
# assume env vars AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are available
minio_config = MinioConfig("https://s3.example.com:9000")
AWS.global_aws_config(minio_config)
savedataset(ds; path="s3://my_bucket/my_object", driver=:zarr)

Note that the arguments path and driver can also be used to create OutDims in mapCube, enabling the results of a computation to be written directly to cloud object storage.
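For instance, the following hedged sketch reduces over time and writes the result straight to a bucket. The reducer and the output path are hypothetical, and the exact OutDims keywords should be checked against the mapCube documentation for your YAXArrays version; an active AWS.global_aws_config with valid credentials is assumed, as above.

```julia
using YAXArrays, Zarr

# Hedged sketch: per-pixel maximum over time, written directly to object
# storage. "s3://my_bucket/result.zarr" is a placeholder path.
indims = InDims("time")
outdims = OutDims(; path="s3://my_bucket/result.zarr", backend=:zarr, overwrite=true)
r = mapCube(ds.tos; indims=indims, outdims=outdims) do xout, xin
    xout .= maximum(skipmissing(xin))
end
```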
Write NetCDF
Save a single YAXArray to a file:
using NetCDF
savecube(ds.tos, "tos.nc", driver=:netcdf)

Save an entire Dataset to a file:
savedataset(ds, path="ds.nc", driver=:netcdf)

NetCDF compression
Save a dataset to NetCDF format with compression:
n = 7 # compression level, number between 0 (no compression) and 9 (max compression)
savedataset(ds, path="ds_c.nc", driver=:netcdf, compress=n)

Compare it to the file saved with the default settings:
ds_info = stat("ds.nc")
ds_c_info = stat("ds_c.nc")
println("File size: ", "default: ", ds_info.size, " bytes", ", compress: ", ds_c_info.size, " bytes")
File size: default: 2963860 bytes, compress: 1159916 bytes

Overwrite a Dataset
If a path already exists, an error will be thrown. Set overwrite=true to delete the existing dataset:
savedataset(ds, path="ds.zarr", driver=:zarr, overwrite=true)

DANGER
Again, setting overwrite=true will delete all your previously saved data.
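If you want to be defensive, a small guard can make the pending deletion explicit before you opt in. This is a hedged sketch; the path just mirrors the example above.

```julia
# Hedged sketch: warn before an overwrite would delete existing data.
path = "ds.zarr"
if ispath(path)
    @warn "$path already exists; savedataset(...; overwrite=true) will delete it first"
end
```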
See the docstring for more information:
YAXArrays.Datasets.savedataset Function
savedataset(ds::Dataset; path="", persist=nothing, overwrite=false, append=false, skeleton=false, backend=:all, driver=backend, max_cache=5e8, writefac=4.0)

Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.
Warning
overwrite=true deletes ALL your data and creates a new file.
Append to a Dataset
New variables can be added to an existing dataset using the append=true keyword.
ds2 = Dataset(z = YAXArray(rand(10,20,5)))
savedataset(ds2, path="ds.zarr", backend=:zarr, append=true)

julia> open_dataset("ds.zarr", driver=:zarr)
YAXArray Dataset
Shared Axes:
None
Variables with additional axes:
Additional Axes:
(↓ lon Sampled{Float64} 1.0:2.0:359.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} -79.5:1.0:89.5 ForwardOrdered Regular Points,
↗ time Sampled{CFTime.DateTime360Day{CFTime.Period{Float64, Val{86400}(), Val{0}()}, Val{(1980, 1, 1)}()}} [DateTime360Day(2001-01-16T00:00:00), …, DateTime360Day(2002-12-16T00:00:00)] ForwardOrdered Irregular Points)
Variables:
tos
Additional Axes:
(↓ Dim_1 Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} 1:1:20 ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} 1:1:5 ForwardOrdered Regular Points)
Variables:
z
Properties: Dict{String, Any}("cmor_version" => 0.96, "references" => "Dufresne et al, Journal of Climate, 2015, vol XX, p 136", "realization" => 1, "contact" => "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr", "Conventions" => "CF-1.0", "history" => "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements", "table_id" => "Table O1 (13 November 2004)", "source" => "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v", "title" => "IPSL model output prepared for IPCC Fourth Assessment SRES A2 experiment", "experiment_id" => "SRES A2 experiment"…)

Save Skeleton
Sometimes one merely wants to create a datacube "Skeleton" on disk and gradually fill it with data. Here we make use of FillArrays to create a YAXArray and write only the axis data and array metadata to disk, while no actual array data is copied:
using YAXArrays, Zarr, FillArrays

Create the Zeros array:
julia> a = YAXArray(Zeros(Union{Missing, Float32}, 5, 4, 5))
┌ 5×4×5 YAXArray{Union{Missing, Float32}, 3} ┐
├────────────────────────────────────────────┴─────────────────── dims ┐
↓ Dim_1 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} Base.OneTo(4) ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points
├──────────────────────────────────────────────────── loaded in memory ┤
data size: 400.0 bytes
└──────────────────────────────────────────────────────────────────────┘

Now, save to disk with:
r = savecube(a, "skeleton.zarr", layername="skeleton", driver=:zarr, skeleton=true, overwrite=true)

WARNING
overwrite=true will delete your previous .zarr file before creating a new one.
Note also that if layername="skeleton" is not provided, the default name for the cube variable will be layer.
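To illustrate, the following hedged sketch omits layername, so the stored variable should come out under the default name layer; the file name skeleton_default.zarr is hypothetical.

```julia
using YAXArrays, Zarr, FillArrays

# Hedged sketch: save a skeleton without `layername`; the variable is then
# expected to be stored under the default name "layer".
a = YAXArray(Zeros(Union{Missing, Float32}, 5, 4, 5))
r_default = savecube(a, "skeleton_default.zarr", driver=:zarr, skeleton=true, overwrite=true)
```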
Now, we check that all the values are missing
all(ismissing, r[:,:,:])
true

If using FillArrays is not possible, the zeros function works as well, though it does allocate the full array in memory.
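As a hedged sketch of that fallback (the output file name is hypothetical), replacing Zeros with zeros yields the same skeleton workflow at the cost of one in-memory allocation:

```julia
using YAXArrays, Zarr

# Hedged sketch: same skeleton workflow without FillArrays; `zeros`
# allocates the full array in memory, but only metadata is written to disk.
a_mem = YAXArray(zeros(Union{Missing, Float32}, 5, 4, 5))
savecube(a_mem, "skeleton_mem.zarr", layername="skeleton", driver=:zarr, skeleton=true, overwrite=true)
```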
INFO
The skeleton argument is also available for savedataset.
Using the toy array defined above we can do
ds = Dataset(skeleton=a) # skeleton will be the variable name
YAXArray Dataset
Shared Axes:
(↓ Dim_1 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} Base.OneTo(4) ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points)
Variables:
skeleton

ds_s = savedataset(ds, path="skeleton.zarr", driver=:zarr, skeleton=true, overwrite=true)

Update values of dataset
Now, we show how to start updating the array values. To do so, we first need to open the dataset with write access ("w") as follows:
ds_open = zopen("skeleton.zarr", "w")
ds_array = ds_open["skeleton"]
ZArray{Float32} of size 5 x 4 x 5

Then we simply update values by indexing them where necessary:
ds_array[:,:,1] = rand(Float32, 5, 4) # this writes the values directly to disk!
5×4 Matrix{Float32}:
0.166209 0.167822 0.811004 0.739005
0.239648 0.971882 0.234606 0.456544
0.514822 0.383811 0.915756 0.156314
0.804614 0.729526 0.367598 0.47125
0.200561 0.408587 0.359569 0.117686

We can verify this worked by loading the values again directly from disk:
ds_open = open_dataset("skeleton.zarr")
ds_array = ds_open["skeleton"]
ds_array.data[:,:,1]
5×4 Matrix{Union{Missing, Float32}}:
0.166209 0.167822 0.811004 0.739005
0.239648 0.971882 0.234606 0.456544
0.514822 0.383811 0.915756 0.156314
0.804614 0.729526 0.367598 0.47125
0.200561 0.408587 0.359569 0.117686

Indeed, those entries have been updated.