Data Types
Published:
Category: { Machine Learning }
Tags:
References:
- Level of Measurement
Summary:
Pages: 2
Data File Formats
Published:
Category: { data science }
Tags:
References:
- Parquet Logical Type Definitions
- Working with Complex Data Formats with Structured Streaming in Apache Spark 2.1: Part 2 of Scalable Data @ Databricks
Summary: Data storage is diverse. At smaller scales, we mostly work with data files.
Efficiencies and Compressions
Parquet
Parquet is fast. But
Don’t use JSON objects or lists of JSON objects as column values. If you really need to store them, convert them to strings or binary objects first.
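The conversion step can be sketched as follows. This is a minimal illustration using only the standard library, not a definitive implementation: the helper name `stringify_nested_columns`, the sample rows, and the `payload` column are all hypothetical, and the rows are assumed to be plain dictionaries.

```python
import json


def stringify_nested_columns(rows, columns):
    """Return copies of `rows` with the given columns serialized to JSON strings.

    Hypothetical helper for illustration; `rows` is assumed to be a list of dicts.
    """
    converted = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows are left untouched
        for col in columns:
            if col in new_row and new_row[col] is not None:
                # sort_keys makes the serialized string stable across runs
                new_row[col] = json.dumps(new_row[col], sort_keys=True)
        converted.append(new_row)
    return converted


# Hypothetical sample data: one dict, one list of dicts, one missing value
rows = [
    {"id": 1, "payload": {"b": 2, "a": 1}},
    {"id": 2, "payload": [{"x": 1}, {"x": 2}]},
    {"id": 3, "payload": None},
]

flat_rows = stringify_nested_columns(rows, ["payload"])

# Reading the value back requires an explicit parse
restored = json.loads(flat_rows[0]["payload"])
```

After this step every `payload` value is either a string or `None`, so the column has a single flat type when the rows are handed to a Parquet writer. The trade-off is that readers have to call `json.loads` themselves to get the nested structure back.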
Pages: 2