Glossary¶
This page defines key terms used throughout the zarr-python documentation and API.
Array Structure¶
Array¶
An N-dimensional typed array stored in a Zarr store. An array's metadata defines its shape, data type, chunk layout, and codecs.
Chunk¶
The fundamental unit of data in a Zarr array. An array is divided into chunks along each dimension according to the chunk grid, which is currently part of zarr-python's private API. Each chunk is independently compressed and encoded through the array's codec pipeline.
When sharding is used, "chunk" refers to the inner chunks within each shard, because those are the compressible units. The chunks are the smallest units that can be read independently.
Convention specific to zarr-python
The use of "chunk" to mean the inner sub-chunk within a shard is a convention
adopted by zarr-python's Array API. In the Zarr V3 specification and in other
Zarr implementations, "chunk" may refer to the top-level grid cells (which
zarr-python calls "shards" when the sharding codec is used). Be aware of this
distinction when working across libraries.
API: Array.chunks returns the chunk shape. When
sharding is used, this is the inner chunk shape.
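As an illustration of how an array is divided into chunks along each dimension, the following minimal sketch enumerates the element region covered by each chunk. The helper `chunk_regions` is hypothetical and written only for this example; it is not part of zarr-python and uses no zarr-python calls.

```python
from itertools import product


def chunk_regions(shape, chunk_shape):
    """Yield (grid_coords, slices) for every chunk of an array.

    Hypothetical helper for illustration; not part of zarr-python.
    """
    # Chunks per dimension, rounding up so a partial edge chunk is counted.
    counts = [-(-s // c) for s, c in zip(shape, chunk_shape)]
    for coords in product(*(range(n) for n in counts)):
        # Clip each chunk's region to the array bounds.
        slices = tuple(
            slice(i * c, min((i + 1) * c, s))
            for i, c, s in zip(coords, chunk_shape, shape)
        )
        yield coords, slices


# A 5 x 4 array with 2 x 2 chunks has 3 x 2 chunks; the last row of chunks is partial.
regions = dict(chunk_regions(shape=(5, 4), chunk_shape=(2, 2)))
```

Here `regions[(2, 1)]` covers rows 4 to 5 and columns 2 to 4, which shows that a chunk on the array edge may cover fewer elements than the nominal chunk shape.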
Chunk Grid¶
The partitioning of an array's elements into chunks. In Zarr V3, the chunk grid is defined in the array metadata and determines the boundaries of each storage object.
When sharding is used, the chunk grid defines the shard boundaries, not the inner chunk boundaries. The inner chunk shape is defined within the sharding codec.
API: The chunk_grid field in array metadata contains the storage-level
grid.
Shard¶
A storage object that contains one or more chunks. Sharding reduces the number of objects in a store by grouping chunks together, which improves performance on file systems and object stores that handle large numbers of small objects poorly.
Within each shard, chunks are compressed independently and can be read individually. However, writing requires updating the full shard for consistency, making shards the unit of writing and chunks the unit of reading.
Sharding is implemented as a codec (the sharding indexed codec). When sharding is used:
- The chunk grid in metadata defines the shard boundaries
- The sharding codec's chunk_shape defines the inner chunk size
- Each shard contains shard_shape / chunk_shape chunks per dimension
API: Array.shards returns the shard shape, or None
if sharding is not used. Array.chunks returns the inner
chunk shape.
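The relationship between shards and inner chunks can be sketched with plain arithmetic. The functions `chunks_per_shard` and `locate` below are hypothetical helpers for this example, not zarr-python API. The sketch assumes the shard shape is an exact multiple of the inner chunk shape in every dimension.

```python
def chunks_per_shard(shard_shape, chunk_shape):
    """Inner chunks per dimension in one shard (shard_shape / chunk_shape)."""
    for s, c in zip(shard_shape, chunk_shape):
        if s % c != 0:
            raise ValueError("shard shape must be a multiple of the chunk shape")
    return tuple(s // c for s, c in zip(shard_shape, chunk_shape))


def locate(index, shard_shape, chunk_shape):
    """Return (shard coords, inner chunk coords within that shard) for an element."""
    # Which shard (storage object) holds the element.
    shard = tuple(i // s for i, s in zip(index, shard_shape))
    # Which inner chunk inside that shard holds the element.
    inner = tuple((i % s) // c for i, s, c in zip(index, shard_shape, chunk_shape))
    return shard, inner


per_shard = chunks_per_shard((100, 100), (10, 10))
shard, inner = locate((250, 37), (100, 100), (10, 10))
```

With these values, each shard holds 10 x 10 inner chunks, and element (250, 37) falls in shard (2, 0), inner chunk (5, 3). Reading that element only requires the one inner chunk, while writing it means updating the whole shard.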
Storage¶
Store¶
A key-value storage backend that holds Zarr data and metadata. Stores implement
the zarr.abc.store.Store interface. Examples include local file systems,
cloud object storage (S3, GCS, Azure), zip files, and in-memory dictionaries.
Each chunk or shard is stored as a single value (object or file) in the store, addressed by a key derived from its grid coordinates.
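A minimal sketch of deriving a key from grid coordinates, assuming the "default" chunk key encoding of the Zarr V3 specification (the prefix "c" followed by the coordinates, joined by a separator). The function `chunk_key` is a hypothetical helper, and the resulting key is relative to the array's own location in the store.

```python
def chunk_key(coords, separator="/"):
    """Build a store key from grid coordinates.

    Assumes the Zarr V3 "default" chunk key encoding; hypothetical helper,
    not part of zarr-python.
    """
    return separator.join(["c", *map(str, coords)])


key_2d = chunk_key((0, 1))
key_scalar = chunk_key(())  # zero-dimensional array: no coordinates
```

When sharding is used, the coordinates are those of the shard in the chunk grid, since the shard is the stored object.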
Metadata¶
The JSON document (zarr.json) that describes an array or group. For
arrays, metadata includes the shape, data type, chunk grid, fill
value, and codec pipeline. Metadata is stored alongside the data in
the store. The internal metadata representation is not yet part of
zarr-python's public API.
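For orientation, here is a sketch of what an array metadata document might contain, written as a Python dictionary and serialized with the standard `json` module. The field names follow the Zarr V3 specification as understood here. The values are illustrative assumptions, and a document written by zarr-python may contain additional fields.

```python
import json

# Illustrative array metadata; all values are made up for this example.
metadata = {
    "zarr_format": 3,
    "node_type": "array",
    "shape": [1000, 1000],
    "data_type": "float32",
    "chunk_grid": {
        "name": "regular",
        "configuration": {"chunk_shape": [100, 100]},
    },
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0.0,
    "codecs": [
        {"name": "bytes", "configuration": {"endian": "little"}},
        {"name": "gzip", "configuration": {"level": 5}},
    ],
}

document = json.dumps(metadata, indent=2)  # text that would be saved as zarr.json
restored = json.loads(document)
grid_shape = restored["chunk_grid"]["configuration"]["chunk_shape"]
```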
Codecs¶
Codec¶
A transformation applied to array data during reading and writing. Codecs are chained into a pipeline and come in three types:
- Array-to-array: Transforms like transpose that rearrange array elements
- Array-to-bytes: Serialization that converts an array to a byte sequence (exactly one required)
- Bytes-to-bytes: Compression or checksums applied to the serialized bytes
The sharding indexed codec is a special array-to-bytes codec that groups multiple chunks into a single storage object.
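The three codec types can be illustrated with a toy pipeline built from the Python standard library. This is a conceptual sketch only: the functions below are hypothetical stand-ins (a transpose, a little-endian int32 serializer, and zlib compression) and do not use zarr-python's codec classes.

```python
import struct
import zlib


def transpose(rows):
    """Array-to-array step: swap the two axes of a 2-D nested list."""
    return [list(col) for col in zip(*rows)]


def to_bytes(rows):
    """Array-to-bytes step: flatten row by row into little-endian int32."""
    flat = [v for row in rows for v in row]
    return struct.pack(f"<{len(flat)}i", *flat)


def from_bytes(data, ncols):
    """Inverse of to_bytes for a known number of columns."""
    count = len(data) // 4
    flat = list(struct.unpack(f"<{count}i", data))
    return [flat[i:i + ncols] for i in range(0, count, ncols)]


def encode(chunk):
    # Array-to-array, then array-to-bytes, then bytes-to-bytes (compression).
    return zlib.compress(to_bytes(transpose(chunk)))


def decode(blob, nrows):
    # Undo the steps in reverse order; the transposed chunk has `nrows` columns.
    return transpose(from_bytes(zlib.decompress(blob), ncols=nrows))


chunk = [[1, 2, 3], [4, 5, 6]]
blob = encode(chunk)
roundtrip = decode(blob, nrows=len(chunk))
```

Decoding applies the inverse of each step in the opposite order, which is why exactly one array-to-bytes step sits between the array-level and byte-level transformations.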
API Properties¶
The following properties are available on zarr.Array:
| Property | Description |
|---|---|
| .chunks | Chunk shape; the inner chunk shape when sharding is used |
| .shards | Shard shape, or None if no sharding |
| .nchunks | Total number of independently compressible units across the array |
| .cdata_shape | Number of independently compressible units per dimension |
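The last two descriptions in the table amount to simple arithmetic on the array shape and the chunk shape. The sketch below uses plain functions named after the properties for readability. They are hypothetical re-derivations from the descriptions above, not zarr-python's implementation.

```python
from math import prod


def cdata_shape(shape, chunk_shape):
    """Chunks per dimension, rounding up to include partial edge chunks."""
    return tuple(-(-s // c) for s, c in zip(shape, chunk_shape))


def nchunks(shape, chunk_shape):
    """Total number of chunks across the array."""
    return prod(cdata_shape(shape, chunk_shape))


shape, chunks = (1000, 950), (100, 100)
per_dimension = cdata_shape(shape, chunks)
total = nchunks(shape, chunks)
```

For this shape, the second dimension needs 10 chunks because the final 50 elements still occupy a chunk of their own, giving 10 x 10 units per dimension and 100 in total.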