Byte2Bit™ Atlas
Storage optimization, scientific data pipelines, and Python workflows.
Goal: make large scientific and sensor archives smaller without forcing teams to abandon existing Python and Zarr-style workflows.
Atlas supports common integer and real numeric types, interleaved complex values, fixed-point decimals, byte arrays, and raw byte payloads.
The unified b2b.transform(...) API auto-detects common scientific formats and writes Byte2Bit-compressed Zarr output.
Use compression_level=4 and quantize_scale=0 for lossless workflows unless a project explicitly chooses another mode.
The product demo covers lossless levels 0, 1, 2, and 4; error-bounded compression is available for real-valued data at level 1.
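The relationship between a quantization scale and the resulting error bound can be illustrated with a pure-NumPy sketch. This is a conceptual illustration of scale-based quantization, not Atlas's internal algorithm; the quantize helper is hypothetical, and only the quantize_scale=0 "lossless" convention is taken from the examples below.

```python
import numpy as np

def quantize(x, scale):
    # Conceptual scale-based quantization: values are rounded to
    # multiples of 1/scale, bounding the absolute error by 1/(2*scale).
    # scale=0 is treated here as "no quantization" (bit-exact), mirroring
    # the quantize_scale=0 convention used throughout this document.
    if scale == 0:
        return x.copy()
    return np.round(x * scale) / scale

x = np.array([0.123456, -1.987654, 3.141592])
q = quantize(x, 100.0)                        # keep roughly 2 decimal places
assert np.all(np.abs(x - q) <= 0.5 / 100.0)   # error bound holds
assert np.array_equal(quantize(x, 0), x)      # scale 0: bit-exact
```

A larger scale tightens the error bound but leaves less redundancy for the compressor to exploit.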
Use a clean Python environment, install the wheel plus scientific format dependencies, then confirm that byte2bitZarr imports correctly.
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install --force-reinstall /dist/byte2bitzarr-...whl
export BYTE2BIT_LICENSE_PATH=/dist/COMPANYNAME-0001.license.json
python -m pip install xarray cfgrib eccodes h5py netcdf4 h5netcdf rasterio
python -c "import byte2bitZarr as b2b; print('OK', b2b.__name__)"
For in-memory workflows, encode to bytes and decode into a pre-allocated NumPy output buffer.
import numpy as np
import byte2bitZarr as b2b
x = (np.random.default_rng(0).standard_normal(100000).astype(np.float32) * 5.0)
codec = b2b.Byte2Bit(dtype=np.float32) # defaults: level 4, lossless
encoded = codec.encode(x)
out = np.empty_like(x)
codec.decode(encoded, out=out)
print('encoded bytes:', len(encoded))
print('equal:', np.array_equal(x, out))
Use single-array .b2b files for transfer or persistence, and archive helpers for multiple named arrays.
import numpy as np
import byte2bitZarr as b2b
x = np.random.default_rng(0).standard_normal(100000).astype(np.float32)
b2b.save_byte2bit("Data/numpy/example_array.b2b", x,
                  compression_level=4, quantize_scale=0)
out = np.empty_like(x)
b2b.load_byte2bit_into("Data/numpy/example_array.b2b", out)
print("equal:", np.array_equal(x, out))
info = b2b.inspect_byte2bit("Data/numpy/example_array.b2b")
print(info)
Write through Zarr with the Byte2Bit serializer to create chunked, queryable compressed stores.
import numpy as np
import zarr
import byte2bitZarr as b2b
x = np.random.default_rng(0).standard_normal(100000).astype(np.float32)
store = zarr.storage.LocalStore("example.byte2bit.b2b")
root = zarr.group(store=store, overwrite=True, zarr_format=3)
arr = root.create_array(
    "x", shape=x.shape, chunks=(10000,), dtype=x.dtype,
    serializer=b2b.Byte2BitZarrV3Codec(compression_level=4, quantize_scale=0),
    compressors=[],
)
arr[:] = x
print("written")
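The value of a chunked store is that a query only decompresses the chunks it touches. A pure-Python sketch of the chunk-index arithmetic (illustrative only, not the Zarr implementation):

```python
def chunks_for_slice(start, stop, chunk_size):
    # Which chunk indices does the half-open range [start, stop) touch?
    return list(range(start // chunk_size, (stop - 1) // chunk_size + 1))

# With 100000 elements in chunks of 10000, a query over [25000, 45000)
# only needs chunks 2, 3, and 4 -- the rest stay compressed at rest.
print(chunks_for_slice(25000, 45000, 10000))  # [2, 3, 4]
```

This is why chunk shape matters: chunks sized to the typical query window minimize wasted decompression.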
import byte2bitZarr as b2b
b2b.transform("Data/a.zarr", "Data/a.byte2bit.b2b")
b2b.transform("Data/nwp/*.grib", "Data/nwp.byte2bit.b2b")
b2b.transform("Data/dataNetcdf/*.nc", "Data/netcdf.byte2bit.b2b")
b2b.transform("Data/hdf5Data/092535.hdf5", "Data/092535.byte2bit.b2b")
b2b.transform("Data/raster/sample.tif", "Data/raster/sample.byte2bit.b2b",
              cogLayout=True)
Output suffixes are normalized to .b2b. For GeoTIFF/COG, cogLayout=True uses the COG internal block layout when available.
Use for weather and forecast files. Atlas provides cloud-native output, smaller-than-original GRIB storage, and faster decompression.
b2b.transform("*.grib", "out.byte2bit.b2b")
Use for existing chunked datasets. With Atlas you get reduced storage and egress along with faster decompression.
b2b.transform("in.zarr", "out.byte2bit.b2b")
Verify that the original and the Atlas-compressed data match.
import byte2bitZarr as b2b
b2b.verify("Data/us_20260217T0100Z.zarr", "Data/us_20260217T0100Z.byte2bit.b2b")
b2b.verify("Data/nwp/*.grib", "Data/nwp.byte2bit.b2b")
b2b.verify("Data/dataNetcdf/*.nc", "Data/dataNetcdf.byte2bit.b2b")
b2b.verify("Data/hdf5Data/092535.hdf5", "Data/092535.byte2bit.b2b")
b2b.verify_equal("Data/us_20260217T0100Z.zarr", "Data/us_20260217T0100Z.byte2bit.b2b",
                 arrays=["x", "y"], progress_every=0)
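Conceptually, verification is a comparison of decoded arrays: bit-exact for lossless mode, tolerance-based for error-bounded mode. A minimal NumPy sketch of that check (verify_roundtrip is a hypothetical helper, not part of the Atlas API):

```python
import numpy as np

def verify_roundtrip(original, decoded, abs_tol=0.0):
    # abs_tol=0.0 demands bit-exact equality (lossless mode);
    # a positive abs_tol corresponds to an error-bounded mode.
    if abs_tol == 0.0:
        return np.array_equal(original, decoded)
    return np.allclose(original, decoded, rtol=0.0, atol=abs_tol)

x = np.linspace(-1.0, 1.0, 5, dtype=np.float32)
assert verify_roundtrip(x, x.copy())                # lossless: exact match
assert verify_roundtrip(x, x + 1e-4, abs_tol=1e-3)  # within error bound
assert not verify_roundtrip(x, x + 1e-4)            # exact check fails
```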
This query selects the latest forecast satisfying the threshold rule, and can optionally attach point values by latitude/longitude.
python python/examples/query_latest_forecast.py --db Data/gfs_metadata/forecast_index.sqlite --short-name 2t --type-of-level heightAboveGround --start 2026-03-08T00:00:00Z --end 2026-03-09T00:00:00Z --threshold 6h --lat 52.52 --lon 13.40 --with-values
For each maturity M, select the latest forecast F where F ≤ M - threshold.
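The selection rule above can be sketched in plain Python (the helper name and datetimes are illustrative; the actual script queries the SQLite index):

```python
from datetime import datetime, timedelta

def latest_forecast(forecast_times, maturity, threshold):
    # For maturity M, pick the latest forecast F with F <= M - threshold.
    cutoff = maturity - threshold
    eligible = [f for f in forecast_times if f <= cutoff]
    return max(eligible) if eligible else None

runs = [datetime(2026, 3, 8, h) for h in (0, 6, 12, 18)]
m = datetime(2026, 3, 8, 15)
print(latest_forecast(runs, m, timedelta(hours=6)))  # 2026-03-08 06:00:00
```

With a 6h threshold and a 15:00 maturity, the 12:00 run is too recent, so the 06:00 run is selected.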
Packed GRIB output stores fields under messages/msg_XXXXXX. Use array attributes to find meteorological variables before decoding values.
shortName, name, paramId, typeOfLevel, level, dataDate, dataTime, stepRange
After selecting a message key, use b2b.decompress_gridsimple_message(store, msg_key) to decode a field to NumPy.
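Selecting a message key by its attributes can be sketched with plain dictionaries. The attribute values below are illustrative and find_messages is a hypothetical helper; in a real store the records come from the Zarr array attributes listed above.

```python
# Hypothetical attribute records as they might be read from
# messages/msg_XXXXXX array attributes; the values are illustrative.
messages = {
    "messages/msg_000000": {"shortName": "2t", "typeOfLevel": "heightAboveGround", "level": 2},
    "messages/msg_000001": {"shortName": "tp", "typeOfLevel": "surface", "level": 0},
}

def find_messages(messages, **criteria):
    # Return keys whose attributes match every requested criterion.
    return [key for key, attrs in messages.items()
            if all(attrs.get(k) == v for k, v in criteria.items())]

keys = find_messages(messages, shortName="2t", typeOfLevel="heightAboveGround")
print(keys)  # ['messages/msg_000000']
```

The matching key can then be passed to the decode call described above.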
Atlas can store model tensors and metadata in a .b2b.ckpt container for compact transfer and restore.
import numpy as np
import byte2bitZarr as b2b
tensors = {
    "model_weight": np.arange(12, dtype=np.float32).reshape(3, 4),
    "model_bias": np.linspace(-0.2, 0.2, 3, dtype=np.float32),
}
metadata = {"global_step": 120, "epoch": 3, "learning_rate": 1e-3}
b2b.save_ml_checkpoint("Data/checkpoints/model.b2b.ckpt", tensors,
                       metadata=metadata, compression_level=4,
                       quantize_scale=0, atomic=True, checksum=False)
loaded = b2b.load_ml_checkpoint("Data/checkpoints/model.b2b.ckpt")
print(loaded["metadata"]["global_step"])
Use cogLayout=True to keep COG internal block layout where available; use False for a simple TIFF/no chunk-layout path.
The GDAL Byte2Bit plugin is integrated with QGIS and has been tested with Sentinel imagery.
Byte2Bit™ Atlas gives teams a practical path from large scientific archives to smaller, verified, query-friendly stores.