The Art of Data Science
A nice and elegant book on data science
Some Key Ideas
The Epicycle
- Question and question refining
- EDA
- Modeling
- Interpretation
- Communication
The book's example of asthma in the US is a clear and easy illustration of how these activities fit together.
Types of Questions
Leek, J. T., & Peng, R. D. (2015). What is the question? Science, 347(6228), 1314–1315.
- Descriptive
- Exploratory
- Inferential
- Predictive
- Causal
- Mechanistic
A Good Question
- of interest to your audience
- not already answered in the literature
- plausible within your knowledge framework: the question should look for relationships that domain knowledge already suggests could be real.
- answerable: the question can be answered with current technology, data, or theory.
- specific: pin down measures, population, and sampling as precisely as possible
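The criteria above can be read as a checklist to run before starting an analysis. Here is a minimal sketch of that idea. The class name `QuestionCheck` and its methods are my own assumptions for illustration and do not come from the book.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class QuestionCheck:
    """Hypothetical checklist: one flag per criterion in the list above."""
    of_interest: bool   # interesting to your audience
    unanswered: bool    # not already answered in the literature
    plausible: bool     # consistent with domain knowledge
    answerable: bool    # feasible with current technology, data, or theory
    specific: bool      # measures, population, and sampling are pinned down

    def failed(self) -> List[str]:
        """Return the names of the criteria that are not met."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_good(self) -> bool:
        """A question passes only if every criterion is met."""
        return not self.failed()


# Example: a question that is still too vague.
check = QuestionCheck(of_interest=True, unanswered=True, plausible=True,
                      answerable=True, specific=False)
print(check.is_good())  # False
print(check.failed())   # ['specific']
```

Listing the failed criteria, rather than returning only a yes or no, points at what to refine in the next round of the epicycle.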
Bias
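- recall bias: a problem with the responses, where people in the sample remember past events inaccurately
- selection bias: a problem with the sampling, where the sample does not represent the population of interest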
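A small simulation can make selection bias concrete. This is only a sketch under assumptions of my own: the population is synthetic, and the cutoff of 55 is an arbitrary stand-in for any mechanism that makes part of the population unreachable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 values of some measurement.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Random sampling: every member is equally likely to be picked.
random_sample = random.sample(population, 500)

# Biased sampling: only members above 55 can end up in the sample.
reachable = [x for x in population if x > 55]
biased_sample = random.sample(reachable, 500)

print("population mean:   ", statistics.mean(population))
print("random sample mean:", statistics.mean(random_sample))
print("biased sample mean:", statistics.mean(biased_sample))
```

The random sample should land close to the population mean, while the biased sample overestimates it no matter how many points are drawn. Collecting more data does not fix a biased sampling procedure.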
When you are asked to do something
- communicate with others to make sure that you can agree on a question to be answered
- make sure the question is a good question
- determine what type of question it is
MISC
Some random thoughts.
We need a knowledge database for the company
Going through the data analysis process, I found that it is often important to connect the work to existing knowledge. For example, checking what is already known is the key step in making sure the question has not been answered before.
In academic research, this is usually done by searching the literature. When the objective or question concerns internal data or an internal product, there is generally no public database to search.
So we need a database of data analysis questions and objectives. As the business develops, we accumulate many analyses and questions. If some questions are related to others, it is generally a good idea to record the connection.
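As a rough illustration of what such a database might look like, here is a minimal in-memory sketch. Everything in it is an assumption of mine: the `QuestionRegistry` class, its method names, and the example questions are made up, and a real system would need persistent storage and a better search.

```python
from collections import defaultdict


class QuestionRegistry:
    """Hypothetical in-memory store of analysis questions and their links."""

    def __init__(self):
        self.questions = {}            # question id -> question text
        self.links = defaultdict(set)  # question id -> ids of related questions

    def add(self, qid, text):
        """Register a question under a unique id."""
        self.questions[qid] = text

    def link(self, a, b):
        """Record that two registered questions are related (symmetric)."""
        if a not in self.questions or b not in self.questions:
            raise KeyError("both questions must be registered first")
        self.links[a].add(b)
        self.links[b].add(a)

    def related(self, qid):
        """Return the ids of questions linked to qid, sorted."""
        return sorted(self.links.get(qid, set()))

    def search(self, keyword):
        """Naive keyword lookup to check whether a question was asked before."""
        kw = keyword.lower()
        return sorted(q for q, t in self.questions.items() if kw in t.lower())


registry = QuestionRegistry()
registry.add("q1", "Which customer segments churn most often?")
registry.add("q2", "Does onboarding length predict churn?")
registry.add("q3", "What is the weekly order volume?")
registry.link("q1", "q2")

print(registry.related("q1"))    # ['q2']
print(registry.search("churn"))  # ['q1', 'q2']
```

The `search` call plays the role of the literature lookup for internal questions, and `link` records the connections between related questions.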
L Ma (2019). 'The Art of Data Science', Datumorphism, 04 April. Available at: https://datumorphism.leima.is/reading/art-of-data-science/.