Write YAXArrays and Datasets
Create an example Dataset:
using YAXArrays
using NetCDF
using Downloads: download
path = download("https://archive.unidata.ucar.edu/software/netcdf/examples/tos_O1_2001-2002.nc", "example.nc")
ds = open_dataset(path)
YAXArray Dataset
Shared Axes:
(↓ lon Sampled{Float64} 1.0:2.0:359.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} -79.5:1.0:89.5 ForwardOrdered Regular Points,
↗ time Sampled{CFTime.DateTime360Day{CFTime.Period{Float64, Val{86400}(), Val{0}()}, Val{(2001, 1, 1)}()}} [DateTime360Day(2001-01-16T00:00:00), …, DateTime360Day(2002-12-16T00:00:00)] ForwardOrdered Irregular Points)
Variables:
tos
Properties: Dict{String, Any}("cmor_version" => 0.96f0, "references" => "Dufresne et al, Journal of Climate, 2015, vol XX, p 136", "realization" => 1, "Conventions" => "CF-1.0", "contact" => "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr", "history" => "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements", "table_id" => "Table O1 (13 November 2004)", "source" => "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v", "title" => "IPSL model output prepared for IPCC Fourth Assessment SRES A2 experiment", "experiment_id" => "SRES A2 experiment"…)

Write Zarr
Save a single YAXArray to a directory:
using Zarr
savecube(ds.tos, "tos.zarr", driver=:zarr)

Save an entire Dataset to a directory:
savedataset(ds, path="ds.zarr", driver=:zarr)

Zarr compression
Save a dataset to Zarr format with compression:
n = 9 # compression level, number between 0 (no compression) and 9 (max compression)
compression = Zarr.BloscCompressor(; clevel=n)
savedataset(ds; path="ds_c.zarr", driver=:zarr, compressor=compression)

See the Zarr.jl documentation for more on Zarr compressors. If you use this option and don't notice a significant improvement, please feel free to open an issue or start a discussion.
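Besides clevel, the Blosc compressor exposes further knobs. The following is a hedged sketch: the cname keyword selects the codec as I understand the Zarr.jl API, but verify the exact keywords against your installed Zarr.jl version.

```julia
using Zarr

# Hedged sketch: in addition to `clevel`, Zarr.BloscCompressor accepts a
# `cname` keyword choosing the codec (e.g. "lz4", the default, or "zstd").
# Check the keyword names against your Zarr.jl version.
compression = Zarr.BloscCompressor(cname="zstd", clevel=3)
```

A dataset could then be saved with this compressor exactly as above, e.g. savedataset(ds; path="ds_zstd.zarr", driver=:zarr, compressor=compression) (the output path here is hypothetical).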
Write to cloud buckets
Writing directly to S3-compatible cloud object storage is supported, provided valid credentials are given. It is highly recommended to supply the username and password via the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, respectively. One needs to create any AbstractAWSConfig and activate it with AWS.global_aws_config, e.g. using MinIO for self-hosted storage:
using AWS
using Minio
# assume env vars AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are available
minio_config = MinioConfig("https://s3.example.com:9000")
AWS.global_aws_config(minio_config)
savedataset(ds; path="s3://my_bucket/my_object", driver=:zarr)

Note that the arguments path and driver can also be used to create OutDims in mapCube, enabling the results of a computation to be written directly to cloud object storage.
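For instance, the following hedged sketch reduces over time and writes the result straight to a bucket. The reducer and the output path are hypothetical, and the exact OutDims keywords should be checked against the mapCube documentation for your YAXArrays version; an active AWS.global_aws_config with valid credentials is assumed, as above.

```julia
using YAXArrays, Zarr

# Hedged sketch: per-pixel maximum over time, written directly to object
# storage. "s3://my_bucket/result.zarr" is a placeholder path.
indims = InDims("time")
outdims = OutDims(; path="s3://my_bucket/result.zarr", backend=:zarr, overwrite=true)
r = mapCube(ds.tos; indims=indims, outdims=outdims) do xout, xin
    xout .= maximum(skipmissing(xin))
end
```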
Write NetCDF
Save a single YAXArray to a file:
using NetCDF
savecube(ds.tos, "tos.nc", driver=:netcdf)

Save an entire Dataset to a file:
savedataset(ds, path="ds.nc", driver=:netcdf)

NetCDF compression
Save a dataset to NetCDF format with compression:
n = 7 # compression level, number between 0 (no compression) and 9 (max compression)
savedataset(ds, path="ds_c.nc", driver=:netcdf, compress=n)

Compare it to the file saved with the default settings:
ds_info = stat("ds.nc")
ds_c_info = stat("ds_c.nc")
println("File size: ", "default: ", ds_info.size, " bytes", ", compress: ", ds_c_info.size, " bytes")
File size: default: 2963860 bytes, compress: 1159916 bytes

Overwrite a Dataset
If a path already exists, an error will be thrown. Set overwrite=true to delete the existing dataset:
savedataset(ds, path="ds.zarr", driver=:zarr, overwrite=true)

DANGER
Again, setting overwrite=true will delete all your previously saved data.
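If you want to be defensive, a small guard can make the pending deletion explicit before you opt in. This is a hedged sketch; the path just mirrors the example above.

```julia
# Hedged sketch: warn before an overwrite would delete existing data.
path = "ds.zarr"
if ispath(path)
    @warn "$path already exists; savedataset(...; overwrite=true) will delete it first"
end
```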
See the docstring for more information:
YAXArrays.Datasets.savedataset Function
savedataset(ds::Dataset; path="", persist=nothing, overwrite=false, append=false, skeleton=false, backend=:all, driver=backend, max_cache=5e8, writefac=4.0)

Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.
Warning
overwrite=true deletes ALL your data and creates a new file.
Append to a Dataset
New variables can be added to an existing dataset using the append=true keyword.
ds2 = Dataset(z = YAXArray(rand(10,20,5)))
savedataset(ds2, path="ds.zarr", backend=:zarr, append=true)

julia> open_dataset("ds.zarr", driver=:zarr)
YAXArray Dataset
Shared Axes:
None
Variables with additional axes:
Additional Axes:
(↓ lon Sampled{Float64} 1.0:2.0:359.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} -79.5:1.0:89.5 ForwardOrdered Regular Points,
↗ time Sampled{CFTime.DateTime360Day{CFTime.Period{Float64, Val{86400}(), Val{0}()}, Val{(1980, 1, 1)}()}} [DateTime360Day(2001-01-16T00:00:00), …, DateTime360Day(2002-12-16T00:00:00)] ForwardOrdered Irregular Points)
Variables:
tos
Additional Axes:
(↓ Dim_1 Sampled{Int64} 1:1:10 ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} 1:1:20 ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} 1:1:5 ForwardOrdered Regular Points)
Variables:
z
Properties: Dict{String, Any}("cmor_version" => 0.96, "references" => "Dufresne et al, Journal of Climate, 2015, vol XX, p 136", "realization" => 1, "contact" => "Sebastien Denvil, sebastien.denvil@ipsl.jussieu.fr", "Conventions" => "CF-1.0", "history" => "YYYY/MM/JJ: data generated; YYYY/MM/JJ+1 data transformed At 16:37:23 on 01/11/2005, CMOR rewrote data to comply with CF standards and IPCC Fourth Assessment requirements", "table_id" => "Table O1 (13 November 2004)", "source" => "IPSL-CM4_v1 (2003) : atmosphere : LMDZ (IPSL-CM4_IPCC, 96x71x19) ; ocean ORCA2 (ipsl_cm4_v1_8, 2x2L31); sea ice LIM (ipsl_cm4_v", "title" => "IPSL model output prepared for IPCC Fourth Assessment SRES A2 experiment", "experiment_id" => "SRES A2 experiment"…)

Save Skeleton
Sometimes one merely wants to create a datacube "Skeleton" on disk and gradually fill it with data. Here we make use of FillArrays to create a YAXArray and write only the axis data and array metadata to disk, while no actual array data is copied:
using YAXArrays, Zarr, FillArrays

Create the Zeros array:
julia> a = YAXArray(Zeros(Union{Missing, Float32}, 5, 4, 5))
┌ 5×4×5 YAXArray{Union{Missing, Float32}, 3} ┐
├────────────────────────────────────────────┴─────────────────── dims ┐
↓ Dim_1 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} Base.OneTo(4) ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points
├──────────────────────────────────────────────────── loaded in memory ┤
data size: 400.0 bytes
└──────────────────────────────────────────────────────────────────────┘

Now, save to disk with:
r = savecube(a, "skeleton.zarr", layername="skeleton", driver=:zarr, skeleton=true, overwrite=true)

WARNING
overwrite=true will delete your previous .zarr file before creating a new one.
Note also that if layername="skeleton" is not provided, the default name for the cube variable will be layer.
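To illustrate, the following hedged sketch omits layername, so the stored variable should come out under the default name layer; the file name skeleton_default.zarr is hypothetical.

```julia
using YAXArrays, Zarr, FillArrays

# Hedged sketch: save a skeleton without `layername`; the variable is then
# expected to be stored under the default name "layer".
a = YAXArray(Zeros(Union{Missing, Float32}, 5, 4, 5))
r_default = savecube(a, "skeleton_default.zarr", driver=:zarr, skeleton=true, overwrite=true)
```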
Now, we check that all the values are missing
all(ismissing, r[:,:,:])
true

If using FillArrays is not possible, the zeros function works as well, though it does allocate the full array in memory.
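As a hedged sketch of that fallback (the output file name is hypothetical), replacing Zeros with zeros yields the same skeleton workflow at the cost of one in-memory allocation:

```julia
using YAXArrays, Zarr

# Hedged sketch: same skeleton workflow without FillArrays; `zeros`
# allocates the full array in memory, but only metadata is written to disk.
a_mem = YAXArray(zeros(Union{Missing, Float32}, 5, 4, 5))
savecube(a_mem, "skeleton_mem.zarr", layername="skeleton", driver=:zarr, skeleton=true, overwrite=true)
```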
INFO
The skeleton argument is also available for savedataset.
Using the toy array defined above we can do
ds = Dataset(skeleton=a) # skeleton will be the variable name
YAXArray Dataset
Shared Axes:
(↓ Dim_1 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points,
→ Dim_2 Sampled{Int64} Base.OneTo(4) ForwardOrdered Regular Points,
↗ Dim_3 Sampled{Int64} Base.OneTo(5) ForwardOrdered Regular Points)
Variables:
skeleton

ds_s = savedataset(ds, path="skeleton.zarr", driver=:zarr, skeleton=true, overwrite=true)

Update values of dataset
Now, we show how to start updating the array values. To do so, we first need to open the dataset with write access ("w") as follows:
ds_open = zopen("skeleton.zarr", "w")
ds_array = ds_open["skeleton"]
ZArray{Float32} of size 5 x 4 x 5

Then we simply update values by indexing them where necessary:
ds_array[:,:,1] = rand(Float32, 5, 4) # this writes the values directly to disk!
5×4 Matrix{Float32}:
0.166209 0.167822 0.811004 0.739005
0.239648 0.971882 0.234606 0.456544
0.514822 0.383811 0.915756 0.156314
0.804614 0.729526 0.367598 0.47125
0.200561 0.408587 0.359569 0.117686

We can verify this worked by loading the values again directly from disk:
ds_open = open_dataset("skeleton.zarr")
ds_array = ds_open["skeleton"]
ds_array.data[:,:,1]
5×4 Matrix{Union{Missing, Float32}}:
0.166209 0.167822 0.811004 0.739005
0.239648 0.971882 0.234606 0.456544
0.514822 0.383811 0.915756 0.156314
0.804614 0.729526 0.367598 0.47125
0.200561 0.408587 0.359569 0.117686

Indeed, those entries have been updated.