Data storage comes in many forms. At smaller scales, we mostly deal with plain data files.
Efficiency and Compression
Parquet is fast, but:
- Don’t use JSON or lists of JSON objects as columns. If such data is really needed, convert it to strings or binary objects first.
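A minimal sketch of the conversion suggested above, using only the Python standard library. The helper name `stringify_json_columns`, the column name `payload`, and the sample records are hypothetical and only serve as illustration; the idea is simply to serialize nested values to JSON strings before the data is handed to a Parquet writer.

```python
import json


def stringify_json_columns(records, columns):
    """Return copies of the records with the given columns serialized to JSON strings.

    `records` is a list of dicts; `columns` names the fields holding
    JSON-like values (dicts or lists). Both names are illustrative.
    """
    converted = []
    for record in records:
        row = dict(record)  # shallow copy so the input records are left untouched
        for col in columns:
            if col in row and row[col] is not None:
                # sort_keys gives a stable string for identical objects
                row[col] = json.dumps(row[col], sort_keys=True)
        converted.append(row)
    return converted


# Hypothetical sample data with a nested "payload" column
records = [
    {"id": 1, "payload": {"tags": ["a", "b"], "score": 0.5}},
    {"id": 2, "payload": [{"k": "v"}]},
]

flat = stringify_json_columns(records, ["payload"])
```

After this step every `payload` value is a plain string, so the rows can be loaded into a table and written out, for example with pandas `DataFrame.to_parquet`, assuming pandas and a Parquet engine are installed. Reading the data back only requires `json.loads` on that column.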
LM (2021). 'Data File Formats', Datumorphism, 02 April. Available at: https://datumorphism.leima.is/cards/machine-learning/datatypes/data-file-formats/.