Data Storage
tl;dr
: Use type-safe formats such as HDF5 or Parquet.
- HDF5
- BCOLZ <http://bcolz.blosc.org/en/latest/>: not designed for multidimensional data.
- Zarr <https://github.com/alimanfoo/zarr>: works with multidimensional data and also supports parallel computing.
- Blaze ecosystem <http://blaze.pydata.org/>
An article that compares HDF5, BCOLZ, and Zarr: To HDF5 and beyond
I also recommend pandas, a Python module that works very well with data. It can even load HDF5 files out of the box.
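To illustrate why a type-safe format matters, here is a minimal sketch that round-trips a small DataFrame through CSV and then through HDF5 with pandas. The column names and values are made up for illustration. It also assumes the optional PyTables package (`tables`) is installed for the HDF5 part; if it is not, that step is skipped.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, made up for this illustration.
df = pd.DataFrame(
    {
        "measured_at": pd.to_datetime(["2018-04-11", "2018-04-12"]),
        "value": [1.5, 2.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    # CSV stores only text, so the datetime column is not restored on load
    # unless we tell pandas how to parse it.
    csv_path = os.path.join(tmp, "data.csv")
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # HDF5 stores the column types together with the data.
    # pandas relies on the optional PyTables package for this step.
    from_hdf = None
    try:
        h5_path = os.path.join(tmp, "data.h5")
        df.to_hdf(h5_path, key="data", mode="w")
        from_hdf = pd.read_hdf(h5_path, key="data")
    except ImportError:
        print("PyTables is not installed; skipping the HDF5 round trip.")

print("original:\n", df.dtypes, sep="")
print("after CSV:\n", from_csv.dtypes, sep="")
if from_hdf is not None:
    print("after HDF5:\n", from_hdf.dtypes, sep="")
```

After the CSV round trip, `measured_at` comes back as plain text rather than a datetime column, while the HDF5 round trip keeps it as a datetime. Parquet behaves similarly through `DataFrame.to_parquet` and `pd.read_parquet`, which in turn need an engine such as pyarrow.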
Planted:
by L Ma;
L Ma (2018). 'Data Storage', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/wiki/data-warehouse/data-storage/.