Data Engineering for Data Scientists: Checklist
It is always good for a data scientist to understand more about data engineering, especially the blueprint of a fully productionized data platform.
This checklist covers the following topics:
- Connection to Data Sources
- Connect to DB
- Connect to Streaming Data
- Message Queues
- Connect to Website
- Scraping
- [[Node Crawler]]: write a crawler using Node.js
- API
- Other Data Services
- [[Data Storage]]: storing big data
- Data Lake
- [[Data Warehouse]]: take care of your data and your data will show you its power.
- Message Queues
- [[Data Processing]]: processing data is essential.
- Streaming
- Batch Processing
- Data Buffer
- Cache:
- Redis
- [[Basics of Redis]]: Redis is an in-memory NoSQL data structure server
- Message Queues
- Kafka
- AWS Kinesis
- Using Data
- Query Data
- Visualization
- Analysis and Model Building
- [[Scale Up]]: scale up your services
- Scale up storage
- Scale up node
- Parallel
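
The "Batch Processing" and "Parallel" items in the checklist can be illustrated together with a minimal sketch. It assumes the records fit in memory and that batches are independent of each other. The names `batched`, `process_batch`, and `process_in_parallel` are hypothetical helpers made up for this illustration, and the per-batch work (a simple sum) stands in for whatever transformation a real pipeline would apply.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Hypothetical per-batch work: sum the values in the batch."""
    return sum(batch)


def process_in_parallel(records, batch_size=3, max_workers=4):
    """Process batches concurrently; results keep the input batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_batch, batched(records, batch_size)))


if __name__ == "__main__":
    print(process_in_parallel(range(10)))  # [3, 12, 21, 9]
```

Threads suit I/O-bound work such as calling an API or a database. For CPU-bound work, swapping in `ProcessPoolExecutor` from the same module is the usual alternative, at the cost of pickling each batch.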
Planted by L Ma.
L Ma (2021). 'Data Engineering for Data Scientists: Checklist', Datumorphism, 05 April. Available at: https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/.