Working with arrays¶

Creating an array¶

Zarr has several functions for creating arrays. For example:

import zarr
store = zarr.storage.MemoryStore()
z = zarr.create_array(store=store, shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')
print(z)

<Array memory://128058745895936 shape=(10000, 10000) dtype=int32>

The code above creates a 2-dimensional array of 32-bit integers with 10000 rows and 10000 columns, divided into chunks where each chunk has 1000 rows and 1000 columns (and so there will be 100 chunks in total). The data is written to a zarr.storage.MemoryStore (e.g. an in-memory dict). See Persistent arrays for details on storing arrays in other stores, and see Data types for an in-depth look at the data types supported by Zarr.

See the creation API documentation for more detailed information about creating arrays.

Reading and writing data¶

Zarr arrays support a similar interface to NumPy arrays for reading and writing data. For example, the entire array can be filled with a scalar value:

z[:] = 42

Regions of the array can also be written to, e.g.:

import numpy as np

z[0, :] = np.arange(10000)
z[:, 0] = np.arange(10000)

The contents of the array can be retrieved by slicing, which will load the requested region into memory as a NumPy array, e.g.:

print(z[0, 0])

print(z[-1, -1])

print(z[0, :])

[   0    1    2 ... 9997 9998 9999]

print(z[:, 0])

[   0    1    2 ... 9997 9998 9999]

print(z[:])

[[   0    1    2 ... 9997 9998 9999]
 [   1   42   42 ...   42   42   42]
 [   2   42   42 ...   42   42   42]
 ...
 [9997   42   42 ...   42   42   42]
 [9998   42   42 ...   42   42   42]
 [9999   42   42 ...   42   42   42]]

More information about NumPy-style indexing can be found in the NumPy documentation.

Persistent arrays¶

In the examples above, compressed data for each chunk of the array was stored in main memory. Zarr arrays can also be stored on a file system, enabling persistence of data between sessions. To do this, we can change the store argument to point to a filesystem path:

z1 = zarr.create_array(store='data/example-1.zarr', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32')

The array above will store its configuration metadata and all compressed chunk data in a directory called 'data/example-1.zarr' relative to the current working directory. The zarr.create_array function provides a convenient way to create a new persistent array or continue working with an existing array. Note, there is no need to close an array: data are automatically flushed to disk, and files are automatically closed whenever an array is modified.

Persistent arrays support the same interface for reading and writing data, e.g.:

z1[:] = 42
z1[0, :] = np.arange(10000)
z1[:, 0] = np.arange(10000)

Check that the data have been written and can be read again:

z2 = zarr.open_array('data/example-1.zarr', mode='r')
print(np.all(z1[:] == z2[:]))

True

If you are just looking for a fast and convenient way to save NumPy arrays to disk then load back into memory later, the functions zarr.save and zarr.load may be useful. E.g.:

a = np.arange(10)
zarr.save('data/example-2.zarr', a)
print(zarr.load('data/example-2.zarr'))

[0 1 2 3 4 5 6 7 8 9]

Please note that there are a number of other options for persistent array storage, see the Storage Guide for more details.

Resizing and appending¶

A Zarr array can be resized, which means that any of its dimensions can be increased or decreased in length. For example:

z = zarr.create_array(store='data/example-3.zarr', shape=(10000, 10000), dtype='int32',chunks=(1000, 1000))
z[:] = 42
print(f"Original shape: {z.shape}")
z.resize((20000, 10000))
print(f"New shape: {z.shape}")

Original shape: (10000, 10000)
New shape: (20000, 10000)

Note that when an array is resized, the underlying data are not rearranged in any way. If one or more dimensions are shrunk, any chunks falling outside the new array shape will be deleted from the underlying store.

zarr.Array.append is provided as a convenience function, which can be used to append data to any axis. E.g.:

a = np.arange(10000000, dtype='int32').reshape(10000, 1000)
z = zarr.create_array(store='data/example-4.zarr', shape=a.shape, dtype=a.dtype, chunks=(1000, 100))
z[:] = a
print(f"Original shape: {z.shape}")
z.append(a)
print(f"Shape after first append: {z.shape}")
z.append(np.vstack([a, a]), axis=1)
print(f"Shape after second append: {z.shape}")

Original shape: (10000, 1000)
Shape after first append: (20000, 1000)
Shape after second append: (20000, 2000)

Runtime configuration¶

Zarr arrays are parametrized with a configuration that determines certain aspects of array behavior.

We currently support two configuration options for arrays: write_empty_chunks and order.

field	type	default	description
`write_empty_chunks`	`bool`	`False`	Controls whether empty chunks are written to storage. See Empty chunks.
`order`	`Literal["C", "F"]`	`"C"`	The memory layout of arrays returned when reading data from the store.

You can specify the configuration when you create an array with the config keyword argument. config can be passed as either a dict or an ArrayConfig object.

arr = zarr.create_array({}, shape=(10,), dtype='int8', config={"write_empty_chunks": True})
print(arr.config)

ArrayConfig(order='C', write_empty_chunks=True)

To get an array view with a different config, use the with_config method.

arr_f = arr.with_config({"order": "F"})
print(arr_f.config)

ArrayConfig(order='F', write_empty_chunks=True)

Compressors¶

A number of different compressors can be used with Zarr. Zarr includes Blosc, Zstandard and Gzip compressors. Additional compressors are available through a separate package called NumCodecs which provides various compressor libraries including LZ4, Zlib, BZ2 and LZMA. Different compressors can be provided via the compressors keyword argument accepted by all array creation functions. For example:

compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=3, shuffle=zarr.codecs.BloscShuffle.bitshuffle)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(store='data/example-5.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
z[:] = data
print(z.compressors)

(BloscCodec(_tunable_attrs=set(), typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.bitshuffle: 'bitshuffle'>, blocksize=0),)

This array above will use Blosc as the primary compressor, using the Zstandard algorithm (compression level 3) internally within Blosc, and with the bit-shuffle filter applied.

When using a compressor, it can be useful to get some diagnostics on the compression ratio. Zarr arrays provide the zarr.Array.info property which can be used to print useful diagnostics, e.g.:

print(z.info)

Type               : Array
Zarr format        : 3
Data type          : Int32(endianness='little')
Fill value         : 0
Shape              : (10000, 10000)
Chunk shape        : (1000, 1000)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (BloscCodec(_tunable_attrs=set(), typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.bitshuffle: 'bitshuffle'>, blocksize=0),)
No. bytes          : 400000000 (381.5M)

The zarr.Array.info_complete method inspects the underlying store and prints additional diagnostics, e.g.:

print(z.info_complete())

Type               : Array
Zarr format        : 3
Data type          : Int32(endianness='little')
Fill value         : 0
Shape              : (10000, 10000)
Chunk shape        : (1000, 1000)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (BloscCodec(_tunable_attrs=set(), typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=3, shuffle=<BloscShuffle.bitshuffle: 'bitshuffle'>, blocksize=0),)
No. bytes          : 400000000 (381.5M)
No. bytes stored   : 3558573 (3.4M)
Storage ratio      : 112.4
Chunks Initialized : 100

Note

zarr.Array.info_complete will inspect the underlying store and may be slow for large arrays. Use zarr.Array.info if detailed storage statistics are not needed.

If you don't specify a compressor, by default Zarr uses the Zstandard compressor.

To create an array without any compression, set compressors=None:

z_no_compress = zarr.create_array(store='data/example-uncompressed.zarr', shape=(10000, 10000), chunks=(1000, 1000), dtype='int32', compressors=None)
print(f"Compressors: {z_no_compress.compressors}")

Compressors: ()

In addition to Blosc and Zstandard, other compression libraries can also be used. For example, here is an array using Gzip compression, level 1:

data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(store='data/example-6.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=zarr.codecs.GzipCodec(level=1))
z[:] = data
print(f"Compressors: {z.compressors}")

Compressors: (GzipCodec(level=1),)

Here is an example using LZMA from NumCodecs with a custom filter pipeline including LZMA's built-in delta filter:

import lzma
from zarr.codecs.numcodecs import LZMA

lzma_filters = [dict(id=lzma.FILTER_DELTA, dist=4), dict(id=lzma.FILTER_LZMA2, preset=1)]
compressors = LZMA(filters=lzma_filters)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(store='data/example-7.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), compressors=compressors)
print(f"Compressors: {z.compressors}")

Compressors: (LZMA(codec_name='numcodecs.lzma', codec_config={'filters': [{'id': 3, 'dist': 4}, {'id': 33, 'preset': 1}]}),)

To disable compression, set compressors=None when creating an array, e.g.:

z = zarr.create_array(
    store='data/example-8.zarr',
    shape=(100000000,),
    chunks=(1000000,),
    dtype='int32',
    compressors=None
)
print(f"Compressors: {z.compressors}")

Compressors: ()

Filters¶

In some cases, compression can be improved by transforming the data in some way. For example, if nearby values tend to be correlated, then shuffling the bytes within each numerical value or storing the difference between adjacent values may increase compression ratio. Some compressors provide built-in filters that apply transformations to the data prior to compression. For example, the Blosc compressor has built-in implementations of byte- and bit-shuffle filters, and the LZMA compressor has a built-in implementation of a delta filter. However, to provide additional flexibility for implementing and using filters in combination with different compressors, Zarr also provides a mechanism for configuring filters outside of the primary compressor.

Here is an example using a delta filter with the Blosc compressor:

from zarr.codecs.numcodecs import Delta

filters = [Delta(dtype='int32')]
compressors = zarr.codecs.BloscCodec(cname='zstd', clevel=1, shuffle=zarr.codecs.BloscShuffle.shuffle)
data = np.arange(100000000, dtype='int32').reshape(10000, 10000)
z = zarr.create_array(store='data/example-9.zarr', shape=data.shape, dtype=data.dtype, chunks=(1000, 1000), filters=filters, compressors=compressors)
print(z.info_complete())

Type               : Array
Zarr format        : 3
Data type          : Int32(endianness='little')
Fill value         : 0
Shape              : (10000, 10000)
Chunk shape        : (1000, 1000)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : (Delta(codec_name='numcodecs.delta', codec_config={'dtype': 'int32'}),)
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (BloscCodec(_tunable_attrs=set(), typesize=4, cname=<BloscCname.zstd: 'zstd'>, clevel=1, shuffle=<BloscShuffle.shuffle: 'shuffle'>, blocksize=0),)
No. bytes          : 400000000 (381.5M)
No. bytes stored   : 826
Storage ratio      : 484261.5
Chunks Initialized : 0

For more information about available filter codecs, see the Numcodecs documentation.

Advanced indexing¶

Zarr arrays support several methods for advanced or "fancy" indexing, which enable a subset of data items to be extracted or updated in an array without loading the entire array into memory.

Note that although this functionality is similar to some of the advanced indexing capabilities available on NumPy arrays and on h5py datasets, the Zarr API for advanced indexing is different from both NumPy and h5py, so please read this section carefully. For a complete description of the indexing API, see the documentation for the zarr.Array class.

Indexing with coordinate arrays¶

Items from a Zarr array can be extracted by providing an integer array of coordinates. E.g.:

data = np.arange(10) ** 2
z = zarr.create_array(store='data/example-10.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z[:])
print(z.get_coordinate_selection([2, 5]))

[ 0  1  4  9 16 25 36 49 64 81]
[ 4 25]

Coordinate arrays can also be used to update data, e.g.:

z.set_coordinate_selection([2, 5], [-1, -2])
print(z[:])

[ 0  1 -1  9 16 -2 36 49 64 81]

For multidimensional arrays, coordinates must be provided for each dimension, e.g.:

data = np.arange(15).reshape(3, 5)
z = zarr.create_array(store='data/example-11.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z[:])

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

print(z.get_coordinate_selection(([0, 2], [1, 3])))

[ 1 13]

z.set_coordinate_selection(([0, 2], [1, 3]), [-1, -2])
print(z[:])

[[ 0 -1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 -2 14]]

For convenience, coordinate indexing is also available via the vindex property, as well as the square bracket operator, e.g.:

print(z.vindex[[0, 2], [1, 3]])
z.vindex[[0, 2], [1, 3]] = [-3, -4]

[-1 -2]

print(z[:])

[[ 0 -3  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 -4 14]]

print(z[[0, 2], [1, 3]])

[-3 -4]

When the indexing arrays have different shapes, they are broadcast together. That is, the following two calls are equivalent:

print(z[1, [1, 3]])
print(z[[1, 1], [1, 3]])

[6 8]
[6 8]

Indexing with a mask array¶

Items can also be extracted by providing a Boolean mask. E.g.:

data = np.arange(10) ** 2
z = zarr.create_array(store='data/example-12.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z[:])

[ 0  1  4  9 16 25 36 49 64 81]

sel = np.zeros_like(z, dtype=bool)
sel[2] = True
sel[5] = True
print(z.get_mask_selection(sel))

[ 4 25]

z.set_mask_selection(sel, [-1, -2])
print(z[:])

[ 0  1 -1  9 16 -2 36 49 64 81]

Here's a multidimensional example:

data = np.arange(15).reshape(3, 5)
z = zarr.create_array(store='data/example-13.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z[:])

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

sel = np.zeros_like(z, dtype=bool)
sel[0, 1] = True
sel[2, 3] = True
print(z.get_mask_selection(sel))

[ 1 13]

z.set_mask_selection(sel, [-1, -2])
print(z[:])

[[ 0 -1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 -2 14]]

For convenience, mask indexing is also available via the vindex property, e.g.:

print(z.vindex[sel])

[-1 -2]

z.vindex[sel] = [-3, -4]
print(z[:])

[[ 0 -3  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 -4 14]]

Mask indexing is conceptually the same as coordinate indexing, and is implemented internally via the same machinery. Both styles of indexing allow selecting arbitrary items from an array, also known as point selection.

Orthogonal indexing¶

Zarr arrays also support methods for orthogonal indexing, which allows selections to be made along each dimension of an array independently. For example, this allows selecting a subset of rows and/or columns from a 2-dimensional array. E.g.:

data = np.arange(15).reshape(3, 5)
z = zarr.create_array(store='data/example-14.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z[:])

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

print(z.get_orthogonal_selection(([0, 2], slice(None))))  # select first and third rows

[[ 0  1  2  3  4]
 [10 11 12 13 14]]

print(z.get_orthogonal_selection((slice(None), [1, 3])))  # select second and fourth columns)

[[ 1  3]
 [ 6  8]
 [11 13]]

print(z.get_orthogonal_selection(([0, 2], [1, 3])))  # select rows [0, 2] and columns [1, 4]

[[ 1  3]
 [11 13]]

Data can also be modified, e.g.:

z.set_orthogonal_selection(([0, 2], [1, 3]), [[-1, -2], [-3, -4]])

For convenience, the orthogonal indexing functionality is also available via the oindex property, e.g.:

data = np.arange(15).reshape(3, 5)
z = zarr.create_array(store='data/example-15.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(z.oindex[[0, 2], :])  # select first and third rows

[[ 0  1  2  3  4]
 [10 11 12 13 14]]

print(z.oindex[:, [1, 3]])  # select second and fourth columns

[[ 1  3]
 [ 6  8]
 [11 13]]

print(z.oindex[[0, 2], [1, 3]])  # select rows [0, 2] and columns [1, 4]

[[ 1  3]
 [11 13]]

z.oindex[[0, 2], [1, 3]] = [[-1, -2], [-3, -4]]
print(z[:])

[[ 0 -1  2 -2  4]
 [ 5  6  7  8  9]
 [10 -3 12 -4 14]]

Any combination of integer, slice, 1D integer array and/or 1D Boolean array can be used for orthogonal indexing.

If the index contains at most one iterable, and otherwise contains only slices and integers, orthogonal indexing is also available directly on the array:

data = np.arange(15).reshape(3, 5)
z = zarr.create_array(store='data/example-16.zarr', shape=data.shape, dtype=data.dtype)
z[:] = data
print(np.all(z.oindex[[0, 2], :] == z[[0, 2], :]))

True

Block Indexing¶

Zarr also support block indexing, which allows selections of whole chunks based on their logical indices along each dimension of an array. For example, this allows selecting a subset of chunk aligned rows and/or columns from a 2-dimensional array. E.g.:

data = np.arange(100).reshape(10, 10)
z = zarr.create_array(store='data/example-17.zarr', shape=data.shape, dtype=data.dtype, chunks=(3, 3))
z[:] = data

Retrieve items by specifying their block coordinates:

print(z.get_block_selection(1))

[[30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]]

Equivalent slicing:

print(z[3:6])

[[30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]]

For convenience, the block selection functionality is also available via the blocks property, e.g.:

print(z.blocks[1])

[[30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]]

Block index arrays may be multidimensional to index multidimensional arrays. For example:

print(z.blocks[0, 1:3])

[[ 3  4  5  6  7  8]
 [13 14 15 16 17 18]
 [23 24 25 26 27 28]]

Data can also be modified. Let's start by a simple 2D array:

z = zarr.create_array(store='data/example-18.zarr', shape=(6, 6), dtype=int, chunks=(2, 2))

Set data for a selection of items:

z.set_block_selection((1, 0), 1)
print(z[...])

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [1 1 0 0 0 0]
 [1 1 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]

For convenience, this functionality is also available via the blocks property. E.g.:

z.blocks[:, 2] = 7
print(z[...])

[[0 0 0 0 7 7]
 [0 0 0 0 7 7]
 [1 1 0 0 7 7]
 [1 1 0 0 7 7]
 [0 0 0 0 7 7]
 [0 0 0 0 7 7]]

Any combination of integer and slice can be used for block indexing:

print(z.blocks[2, 1:3])

[[0 0 7 7]
 [0 0 7 7]]

root = zarr.create_group('data/example-19.zarr')
foo = root.create_array(name='foo', shape=(1000, 100), chunks=(10, 10), dtype='float32')
bar = root.create_array(name='bar', shape=(100,), dtype='int32')
foo[:, :] = np.random.random((1000, 100))
bar[:] = np.arange(100)
print(root.tree())

/
├── bar (100,) int32
└── foo (1000, 100) float32

Sharding¶

Using small chunk shapes in very large arrays can lead to a very large number of chunks. This can become a performance issue for file systems and object storage. With Zarr format 3, a new sharding feature has been added to address this issue.

With sharding, multiple chunks can be stored in a single storage object (e.g. a file). Within a shard, chunks are compressed and serialized separately. This allows individual chunks to be read independently. However, when writing data, a full shard must be written in one go for optimal performance and to avoid concurrency issues. That means that shards are the units of writing and chunks are the units of reading. Users need to configure the chunk and shard shapes accordingly.

Sharded arrays can be created by providing the shards parameter to zarr.create_array.

a = zarr.create_array('data/example-20.zarr', shape=(10000, 10000), shards=(1000, 1000), chunks=(100, 100), dtype='uint8')
a[:] = (np.arange(10000 * 10000) % 256).astype('uint8').reshape(10000, 10000)
print(a.info_complete())

Type               : Array
Zarr format        : 3
Data type          : UInt8()
Fill value         : 0
Shape              : (10000, 10000)
Shard shape        : (1000, 1000)
Chunk shape        : (100, 100)
Order              : C
Read-only          : False
Store type         : LocalStore
Filters            : ()
Serializer         : BytesCodec(endian=None)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 100000000 (95.4M)
No. bytes stored   : 3981473 (3.8M)
Storage ratio      : 25.1
Shards Initialized : 100

In this example a shard shape of (1000, 1000) and a chunk shape of (100, 100) is used. This means that 10*10 chunks are stored in each shard, and there are 10*10 shards in total. Without the shards argument, there would be 10,000 chunks stored as individual files.

Rectilinear (variable) chunk grids¶

Experimental

Rectilinear chunk grids are an experimental feature and may change in future releases. This feature is expected to stabilize in Zarr version 3.3.

Because the feature is still stabilizing, it is disabled by default and must be explicitly enabled:

import zarr
zarr.config.set({"array.rectilinear_chunks": True})

Or via the environment variable ZARR_ARRAY__RECTILINEAR_CHUNKS=True.

The examples below assume this config has been set.

By default, Zarr arrays use a regular chunk grid where every chunk along a given dimension has the same size (except possibly the final boundary chunk). Rectilinear chunk grids allow each chunk along a dimension to have a different size. This is useful when the natural partitioning of the data is not uniform — for example, satellite swaths of varying width, time series with irregular intervals, or spatial tiles of different extents.

Creating arrays with rectilinear chunks¶

To create an array with rectilinear chunks, pass a nested list to the chunks parameter where each inner list gives the chunk sizes along one dimension:

zarr.config.set({"array.rectilinear_chunks": True})
z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(60, 100),
    chunks=[[10, 20, 30], [50, 50]],
    dtype='int32',
)
print(z.info)

Type               : Array
Zarr format        : 3
Data type          : Int32(endianness='little')
Fill value         : 0
Shape              : (60, 100)
Chunk shape        : <variable>
Order              : C
Read-only          : False
Store type         : MemoryStore
Filters            : ()
Serializer         : BytesCodec(endian=<Endian.little: 'little'>)
Compressors        : (ZstdCodec(level=0, checksum=False),)
No. bytes          : 24000 (23.4K)

In this example the first dimension is split into three chunks of sizes 10, 20, and 30, while the second dimension is split into two equal chunks of size 50.

Reading and writing data¶

Rectilinear arrays support the same indexing interface as regular arrays. Reads and writes that cross chunk boundaries of different sizes are handled automatically:

import numpy as np
data = np.arange(60 * 100, dtype='int32').reshape(60, 100)
z[:] = data
# Read a slice that spans the first two chunks (sizes 10 and 20) along axis 0
print(z[5:25, 0:5])

[[ 500  501  502  503  504]
 [ 600  601  602  603  604]
 [ 700  701  702  703  704]
 [ 800  801  802  803  804]
 [ 900  901  902  903  904]
 [1000 1001 1002 1003 1004]
 [1100 1101 1102 1103 1104]
 [1200 1201 1202 1203 1204]
 [1300 1301 1302 1303 1304]
 [1400 1401 1402 1403 1404]
 [1500 1501 1502 1503 1504]
 [1600 1601 1602 1603 1604]
 [1700 1701 1702 1703 1704]
 [1800 1801 1802 1803 1804]
 [1900 1901 1902 1903 1904]
 [2000 2001 2002 2003 2004]
 [2100 2101 2102 2103 2104]
 [2200 2201 2202 2203 2204]
 [2300 2301 2302 2303 2304]
 [2400 2401 2402 2403 2404]]

Inspecting chunk sizes¶

The .write_chunk_sizes property returns the actual data size of each storage chunk along every dimension. It works for both regular and rectilinear arrays and returns a tuple of tuples (matching the dask Array.chunks convention). When sharding is used, .read_chunk_sizes returns the inner chunk sizes instead:

print(z.write_chunk_sizes)

((10, 20, 30), (50, 50))

For regular arrays, this includes the boundary chunk:

z_regular = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(100, 80),
    chunks=(30, 40),
    dtype='int32',
)
print(z_regular.write_chunk_sizes)

((30, 30, 30, 10), (40, 40))

Note that the .chunks property is only available for regular chunk grids. For rectilinear arrays, use .write_chunk_sizes (or .read_chunk_sizes) instead.

Resizing and appending¶

Rectilinear arrays can be resized. When growing past the current edge sum, a new chunk is appended covering the additional extent. When shrinking, the chunk edges are preserved and the extent is re-bound (chunks beyond the new extent simply become inactive):

z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(30,),
    chunks=[[10, 20]],
    dtype='float64',
)
z[:] = np.arange(30, dtype='float64')
print(f"Before resize: chunk_sizes={z.write_chunk_sizes}")
z.resize((50,))
print(f"After resize:  chunk_sizes={z.write_chunk_sizes}")

Before resize: chunk_sizes=((10, 20),)
After resize:  chunk_sizes=((10, 20, 20),)

The append method also works with rectilinear arrays:

z.append(np.arange(10, dtype='float64'))
print(f"After append:  shape={z.shape}, chunk_sizes={z.write_chunk_sizes}")

After append:  shape=(60,), chunk_sizes=((10, 20, 20, 10),)

Compressors and filters¶

Rectilinear arrays work with all codecs — compressors, filters, and checksums. Since each chunk may have a different size, the codec pipeline processes each chunk independently:

z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(60, 100),
    chunks=[[10, 20, 30], [50, 50]],
    dtype='float64',
    filters=[zarr.codecs.TransposeCodec(order=(1, 0))],
    compressors=[zarr.codecs.BloscCodec(cname='zstd', clevel=3)],
)
z[:] = np.arange(60 * 100, dtype='float64').reshape(60, 100)
np.testing.assert_array_equal(z[:], np.arange(60 * 100, dtype='float64').reshape(60, 100))
print("Roundtrip OK")

Roundtrip OK

Rectilinear shard boundaries¶

Rectilinear chunk grids can also be used for shard boundaries when combined with sharding. In this case, the outer grid (shards) is rectilinear while the inner chunks remain regular. Each shard dimension must be divisible by the corresponding inner chunk size:

z = zarr.create_array(
    store=zarr.storage.MemoryStore(),
    shape=(120, 100),
    chunks=(10, 10),
    shards=[[60, 40, 20], [50, 50]],
    dtype='int32',
)
z[:] = np.arange(120 * 100, dtype='int32').reshape(120, 100)
print(z[50:70, 40:60])

[[5040 5041 5042 5043 5044 5045 5046 5047 5048 5049 5050 5051 5052 5053
  5054 5055 5056 5057 5058 5059]
 [5140 5141 5142 5143 5144 5145 5146 5147 5148 5149 5150 5151 5152 5153
  5154 5155 5156 5157 5158 5159]
 [5240 5241 5242 5243 5244 5245 5246 5247 5248 5249 5250 5251 5252 5253
  5254 5255 5256 5257 5258 5259]
 [5340 5341 5342 5343 5344 5345 5346 5347 5348 5349 5350 5351 5352 5353
  5354 5355 5356 5357 5358 5359]
 [5440 5441 5442 5443 5444 5445 5446 5447 5448 5449 5450 5451 5452 5453
  5454 5455 5456 5457 5458 5459]
 [5540 5541 5542 5543 5544 5545 5546 5547 5548 5549 5550 5551 5552 5553
  5554 5555 5556 5557 5558 5559]
 [5640 5641 5642 5643 5644 5645 5646 5647 5648 5649 5650 5651 5652 5653
  5654 5655 5656 5657 5658 5659]
 [5740 5741 5742 5743 5744 5745 5746 5747 5748 5749 5750 5751 5752 5753
  5754 5755 5756 5757 5758 5759]
 [5840 5841 5842 5843 5844 5845 5846 5847 5848 5849 5850 5851 5852 5853
  5854 5855 5856 5857 5858 5859]
 [5940 5941 5942 5943 5944 5945 5946 5947 5948 5949 5950 5951 5952 5953
  5954 5955 5956 5957 5958 5959]
 [6040 6041 6042 6043 6044 6045 6046 6047 6048 6049 6050 6051 6052 6053
  6054 6055 6056 6057 6058 6059]
 [6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153
  6154 6155 6156 6157 6158 6159]
 [6240 6241 6242 6243 6244 6245 6246 6247 6248 6249 6250 6251 6252 6253
  6254 6255 6256 6257 6258 6259]
 [6340 6341 6342 6343 6344 6345 6346 6347 6348 6349 6350 6351 6352 6353
  6354 6355 6356 6357 6358 6359]
 [6440 6441 6442 6443 6444 6445 6446 6447 6448 6449 6450 6451 6452 6453
  6454 6455 6456 6457 6458 6459]
 [6540 6541 6542 6543 6544 6545 6546 6547 6548 6549 6550 6551 6552 6553
  6554 6555 6556 6557 6558 6559]
 [6640 6641 6642 6643 6644 6645 6646 6647 6648 6649 6650 6651 6652 6653
  6654 6655 6656 6657 6658 6659]
 [6740 6741 6742 6743 6744 6745 6746 6747 6748 6749 6750 6751 6752 6753
  6754 6755 6756 6757 6758 6759]
 [6840 6841 6842 6843 6844 6845 6846 6847 6848 6849 6850 6851 6852 6853
  6854 6855 6856 6857 6858 6859]
 [6940 6941 6942 6943 6944 6945 6946 6947 6948 6949 6950 6951 6952 6953
  6954 6955 6956 6957 6958 6959]]

Note that rectilinear inner chunks with sharding are not supported — only the shard boundaries can be rectilinear.

Metadata format¶

Rectilinear chunk grid metadata uses run-length encoding (RLE) for compact serialization. When reading metadata, both bare integers and [value, count] pairs are accepted:

[10, 20, 30] — three chunks with explicit sizes
[[10, 3]] — three chunks of size 10 (RLE shorthand)
[[10, 3], 5] — three chunks of size 10, then one chunk of size 5

When writing, Zarr automatically compresses repeated values into RLE format.

Missing features in 3.0¶

The following features have not been ported to 3.0 yet.

Copying and migrating data¶

See the Zarr-Python 2 documentation on Copying and migrating data for more details.