Data Storage

# Data Warehouse

tl;dr: Use type-safe formats such as HDF5 or Parquet, which store each column's data type alongside the data.
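To illustrate what is lost without type information, here is a minimal sketch that round-trips a small pandas DataFrame through CSV, a plain-text format with no schema. The column names and values are made up for the example.

```python
import io

import pandas as pd

# Hypothetical data: a datetime column, zero-padded string IDs, and floats.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "zip_code": ["01234", "98765"],
        "value": [1.5, 2.5],
    }
)

# Round-trip through CSV. The file holds only text, so pandas has to
# guess every column type again when reading it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(df.dtypes)        # types before writing
print(restored.dtypes)  # types guessed after reading
```

After the round trip the timestamps come back as plain strings and the zip codes are parsed as integers, so the leading zero is lost. A type-safe format avoids this kind of guessing.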

  1. HDF5
  2. [BCOLZ](http://bcolz.blosc.org/en/latest/): not designed for multidimensional data.
  3. [Zarr](https://github.com/alimanfoo/zarr): works with multidimensional data and also supports parallel computing.
  4. [Blaze ecosystem](http://blaze.pydata.org/)

An article that compares HDF5, BCOLZ, and Zarr: "To HDF5 and beyond".

I also recommend pandas, a Python library for working with tabular data. It can read and write HDF5 files through its built-in `read_hdf` and `to_hdf` functions, which rely on the optional PyTables package.
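As a rough sketch of the pandas and HDF5 combination, the snippet below writes a DataFrame to an HDF5 file and reads it back. It assumes the PyTables package (`tables`) is installed; the file name `example.h5` and the key `measurements` are arbitrary choices for this example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data with a datetime column and a float column.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "value": [1.5, 2.5],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.h5")
    # Each object in an HDF5 file is stored under a key.
    df.to_hdf(path, key="measurements", mode="w")
    restored = pd.read_hdf(path, key="measurements")

print(restored.dtypes)
```

Unlike a text format, the file records the column types, so the timestamps are read back as datetimes without any extra parsing options.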

