Data Engineering for Data Scientists: Checklist

It is always good for a data scientist to understand more about data engineering, especially the blueprint of a fully productionized data platform.

This checklist covers the following areas:

  • Connection to Data Sources
    • Connect to DB
    • Connect to Streaming Data
      • Message Queues
    • Connect to Website
    • Other Data Services
  • Data Storage: storing big data
    • Data Lake
    • Data Warehouse: take care of your data and your data will show you its power.
    • Message Queues
  • Data Processing: processing data is essential.
    • Streaming
    • Batch Processing
  • Data Buffer
    • Cache
    • Message Queues
      • Kafka
      • AWS Kinesis
  • Using Data
    • Query Data
    • Visualization
    • Analysis and Model Building
  • Scale Up: scale up your services
    • Scale up storage
    • Scale up nodes
    • Parallelism
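
A few of the checklist items can be tried out with nothing but Python's standard library. The following is a minimal sketch, not a production setup: an in-memory SQLite database stands in for a real data source, and the `events` table, its columns, and the helper names (`read_in_batches`, `total_per_user`, `cached_user_total`) are all hypothetical. It touches "Connect to DB", "Batch Processing", "Cache", and "Query Data".

```python
import sqlite3
from functools import lru_cache

# Connect to DB: an in-memory SQLite database stands in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10.0), (2, 5.5), (1, 4.5), (3, 20.0)],
)
conn.commit()


def read_in_batches(connection, query, batch_size=2):
    """Batch processing: yield rows in fixed-size chunks instead of all at once."""
    cursor = connection.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def total_per_user(connection):
    """Aggregate amounts per user, one batch at a time."""
    totals = {}
    for batch in read_in_batches(connection, "SELECT user_id, amount FROM events"):
        for user_id, amount in batch:
            totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


@lru_cache(maxsize=128)
def cached_user_total(user_id):
    """Cache + query: repeated lookups for the same user skip the database."""
    row = conn.execute(
        "SELECT SUM(amount) FROM events WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


totals = total_per_user(conn)
```

In a real platform each piece would likely be swapped for a dedicated service (a database server, a batch framework, an external cache), but the division of responsibilities stays the same.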

L Ma (2021). 'Data Engineering for Data Scientists: Checklist', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/.