Datumorphismhttps://datumorphism.leima.is/Recent content on DatumorphismHugo -- gohugo.ioen-USFri, 10 Feb 2023 00:00:00 +0000Neural ODEhttps://datumorphism.leima.is/wiki/machine-learning/neural-ode/neural-ode-basics/Sat, 13 Aug 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/neural-ode/neural-ode-basics/In [[neural networks]] Artificial Neural Networks Simple artificial neural networks using multilayer perceptron, residual connections are a popular architecture for building very deep neural networks1. Apart from residual networks, there are many other designs for deep neural networks23456. These methods share the idea that the layered structure in deep neural networks can be treated as a dynamical system, and that the different architectures are different numerical approaches to solving the dynamical system.
Figure taken from He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv [cs.CV]. 2015. Available: http://arxiv.org/abs/1512.03385
The degradation problem For deep neural networks, not all deep architectures are able to produce results that are significantly better than shallow architectures.The Time Series Forecasting Problemhttps://datumorphism.leima.is/wiki/forecasting/forecasting-problem/Sun, 24 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/forecasting-problem/There are many different types of tasks on time series data:
classification, anomaly detection, forecasting. Forecasting Problem A time series forecasting problem can be formulated as the following.
Given a dataset $\mathcal D$, with
$y^{(i)}_t$, the sequential variable to be forecast, $x^{(i)}_t$, exogenous data for the time series data, $u^{(i)}_t$, some features that can be obtained or planned in advance, where ${}^{(i)}$ indicates the $i$th variable, ${}_ t$ denotes time. In a forecasting task, we use $y^{(i)} _ {t-K:t}$, $x^{(i)} _ {t-K:t}$, and $u^{(i)} _ {t-K:t+H}$, to forecast the future $y^{(i)} _ {t+1:t+H}$.
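As an illustration of the index ranges above, here is a minimal sketch of how one training sample could be sliced out of three aligned series. The function name `make_window` and the toy series are hypothetical, and the sketch assumes that both endpoints of a range such as $t-K:t$ are inclusive, so the history holds $K+1$ points.

```python
def make_window(y, x, u, t, K, H):
    """Slice one forecasting sample at time index t (hypothetical helper).

    Assumes inclusive ranges: y and x cover t-K..t, u covers t-K..t+H,
    and the target covers t+1..t+H.
    """
    if t - K < 0 or t + H >= len(y):
        raise ValueError("window does not fit inside the series")
    history = slice(t - K, t + 1)      # t-K : t
    known = slice(t - K, t + H + 1)    # t-K : t+H
    future = slice(t + 1, t + H + 1)   # t+1 : t+H
    return {
        "y_hist": y[history],
        "x_hist": x[history],
        "u_known": u[known],
        "y_target": y[future],
    }


# Toy series of equal length, made up for the example.
y = list(range(10))
x = [v * 10 for v in y]
u = [v % 7 for v in y]

sample = make_window(y, x, u, t=5, K=2, H=3)
print(sample["y_hist"], sample["y_target"])
```

Only the features $u$ extend into the forecast horizon, which is what distinguishes them from the exogenous data $x$ in this formulation.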
A model $f$ will use $x^{(i)} _ {t-K:t}$ and $u^{(i)} _ {t-K:t+H}$ to forecast $y^{(i)} _ {t+1:t+H}$.Prediction Space in Forecastinghttps://datumorphism.leima.is/cards/forecasting/prediction-space/Fri, 08 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/cards/forecasting/prediction-space/In a forecasting problem, we have
$\mathcal P$, the priors, e.g., price and demand are negatively correlated, $\mathcal D$, available dataset, $Y$, the observations, and $F$, the forecasts. Information Set $\mathcal A$
The priors $\mathcal P$ and the available data $\mathcal D$ can be summarized together as the information set $\mathcal A$. Under a probabilistic view, a forecaster will find out or approximate a CDF $\mathcal F$ such that1
$$ \mathcal F(Y\vert \mathcal D, \mathcal P) \to F. $$
Naively speaking, once the density $\rho(F, Y)$ is determined or estimated, a probabilistic forecaster can be formed.Feasibility of Learninghttps://datumorphism.leima.is/wiki/learning-theory/feasibility-of-learning/Sun, 17 Oct 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/learning-theory/feasibility-of-learning/Why is learning from data even possible? To discuss this problem, we need a framework for learning. Operationally, we can think of learning as the following framework1.
Abu-Mostafa2012
Naive View Naively speaking, a model should have two key properties,
enough capacity to hold the necessary information embedded in the data, and a method to find the combination of parameters so that the model can generate/complete new data. Most neural networks have enough capacity to hold the necessary information in the data2. The problem is, the capacity is so large. Why does backprop even work? How did backprop find a suitable set of parameters that can generalize?What is Graphhttps://datumorphism.leima.is/wiki/graph/basics/what-is-graph/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/graph/basics/what-is-graph/Graph A graph $\mathcal G$ has nodes $\mathcal V$ and edges $\mathcal E$,
$$ \mathcal G = ( \mathcal V, \mathcal E). $$
Edges
Edges are relations between nodes. For $u\in \mathcal V$ and $v\in \mathcal V$, if there is an edge between them, then $(u, v)\in \mathcal E$. Representations of Graph There are different representations of a graph.
Adjacency Matrix An adjacency matrix of a graph represents the nodes using row and column indices and edges using elements of the matrix.
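A small sketch may make the representation concrete. The helper `adjacency_matrix` and the example edge list are hypothetical; nodes are assumed to be labeled by integers $0, \dots, \lvert \mathcal V \rvert - 1$ and the graph is assumed to be unweighted.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Build a |V| x |V| matrix with A[u][v] = 1 when (u, v) is an edge."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1  # an undirected edge is stored symmetrically
    return A


edges = [(0, 1), (0, 2), (2, 3)]  # toy edge list
A = adjacency_matrix(4, edges)
for row in A:
    print(row)

# Summing a row counts the edges attached to that node.
degrees = [sum(row) for row in A]
print(degrees)
```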
For a simple graph, the adjacency matrix is a rank-two array (two indices) of dimension $\lvert \mathcal V \rvert \times \lvert \mathcal V \rvert$.An Introduction to Generative Modelshttps://datumorphism.leima.is/wiki/machine-learning/generative-models/generative/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/generative-models/generative/Discriminative model:
The conditional probability of class label on data (posterior) $p(C_k\mid x)$ Generative models:
Likelihood $p(x\mid C_k)$ Sample from the likelihood to generate data With latent variables $z$ and some neural network parameters $\theta$: $P(x,z\mid \theta) = p(x\mid z, \theta)p(z)$Contrastive Modelhttps://datumorphism.leima.is/wiki/machine-learning/contrastive-models/contrastive/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/contrastive-models/contrastive/Contrastive models learn to compare1. Contrastive use special objective functions such as [[NCE]] Noise Contrastive Estimation: NCE Noise contrastive estimation (NCE) objective function is1 $$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$ where $x^{+}$ represents data similar to $x$, $x^{-}$ represents data dissimilar to $x$, $C(\cdot, \cdot)$ is a function to compute the similarities. For example, we can use $$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$ so that the objective function becomes $$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ … and [[Mutual Information]] Mutual Information Mutual information is defined as $$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}.GANhttps://datumorphism.leima.is/wiki/machine-learning/adversarial-models/gan/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/adversarial-models/gan/The task of GAN is to generate features $X$ from some noise $\xi$ and class labels $Y$,
$$\xi, Y \to X.$$
Many different GANs are proposed. Vanilla GAN has a simple structure with a single discriminator and a single generator. It uses the minmax game setup. However, it is not stable to use minmax game to train a GAN model. WassersteinGAN was proposed to solve the stability problem during training1. More advanced GANs like BiGAN and ALI have more complex structures.
Vanilla GAN Minmax Game Suppose we have two players $G$ and $D$, and a utility $v(D, G)$, a minmax game is maximizing the utility $v(D, G)$ for the worst case of $G=\hat G$ that minimizes $v$ then we have to find $D=\hat D$ that maximizes $v$, i.MaxEnt Modelhttps://datumorphism.leima.is/wiki/machine-learning/energy-based-model/maxent-energy-based-model/Mon, 31 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/energy-based-model/maxent-energy-based-model/The Maximum Entropy model, aka MaxEnt model, is a fascinating generative model as it is based on a very intuitive idea from statistical physics - the Principle of Maximum Entropy.
The Idea The essence of the MaxEnt model is that the underlying probability distribution $p(x)$ of the random variables $x$ should
give the whole system the largest uncertainty, while producing reasonable observables. Uncertainty The uncertainty of the whole system is described by the Shannon entropy based on the probability distribution $p(x)$,
$$ S[p] = -\operatorname{Tr} p(x) \log p(x). $$
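For a discrete distribution the trace can be read as a sum over states, which is the assumption made in the following sketch. The function name `shannon_entropy` and the two example distributions are illustrative only.

```python
import math


def shannon_entropy(p):
    """S[p] = -sum_x p(x) log p(x) for a discrete distribution (natural log)."""
    if any(q < 0 for q in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a normalized probability distribution")
    # Terms with p(x) = 0 contribute nothing, by the limit q log q -> 0.
    return -sum(q * math.log(q) for q in p if q > 0)


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]

print(shannon_entropy(uniform))  # equals log(4)
print(shannon_entropy(peaked))   # smaller: the outcome is nearly certain
```

With no further constraints, the uniform distribution has the largest entropy among distributions on the same four states, which is the sense in which maximizing $S[p]$ maximizes uncertainty.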
The [[Shannon entropy]] Coding Theory Concepts The code function produces code words.Data Engineering for Data Scientists: Checklisthttps://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/Wed, 05 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/It is always good for a data scientist to understand more about data engineering, especially the blueprint of a fully productionized data platform.
There are several things to get into:
Connection to Data Sources Connect to DB Connect to Streaming Data Message Queues Connect to Website Scraping [[Node Crawler]] Node Crawler Write a crawler using nodejs API Other Data Services [[Data Storage]] Data Storage Storing big data Data Lake [[Data Warehouse]] Data Warehouse Take care of your data and your data will show you its power.Gibbs Samplinghttps://datumorphism.leima.is/wiki/monte-carlo/gibbs-sampling/Fri, 01 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/monte-carlo/gibbs-sampling/Principles of Designhttps://datumorphism.leima.is/wiki/data-visualization/design/Fri, 20 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/design/There are many principles of designing a visual representation of data. However, before we understand how data is represented visually, it would benefit us a lot if we understand the basic principles of designing on 2D surface.
Robin’s CRAP Robin Williams proposed the four elements of design:
Contrast Repetition Alignment Proximity Contrast Use some contrast to distinguish the elements of different contents.
Repetition Repeat the design of similar elements on the same page and across pages to make sure the readers learn the meaning of the design quickly.
Alignment Find a strong line and stick to it.Model Selectionhttps://datumorphism.leima.is/wiki/model-selection/model-selection/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/model-selection/model-selection/Suppose we have a generating process that generates some numbers based on a distribution. Based on a data sample, we could reconstruct some sort of theoretical models to represent the actual generating process.
Which is a Good Model? (1) The black curve represents the generating process. The red rectangle is a very simple model that captures some major samples. The blue step-wise model captures more of the sample data but uses more parameters.
In the above example, the red model on the left is not that good in most cases while the blue model seems to be better. In reality, the choice depends on the usage of the model.Receiver Operating Characteristics: ROChttps://datumorphism.leima.is/wiki/machine-learning/performance/roc/Wed, 13 May 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/performance/roc/ROC space is the two-dimensional space spanned by True Positive Rate and False Positive Rate.
ROC Space. The color boxes are indicating the confusion matrices. Green is the fraction of true positive. Orange is the fraction of false positive. Refer to Confusion Matrix for more details.
AUC: Area under Curve TPR = TP Rate FPR = FP Rate The ROC curve is defined by the relation $f(TPR, FPR)$. Area under the ROC curve is
$$ \int TPR(FPR) d(FPR) \sim \sum_i TPR_i *\Delta FPR. $$
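The discretized sum can be sketched as follows. This version uses the trapezoidal rule, which is one common variant of the sum above rather than the only choice; the function name `auc_trapezoid` and the example points are hypothetical.

```python
def auc_trapezoid(fpr, tpr):
    """Approximate the area under the ROC curve from (FPR, TPR) points."""
    points = sorted(zip(fpr, tpr))  # order the points along the FPR axis
    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        # Average TPR on the interval times the width Delta FPR.
        area += 0.5 * (t0 + t1) * (f1 - f0)
    return area


# A perfect classifier jumps to TPR = 1 at FPR = 0.
perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])
# A classifier on the diagonal is no better than random guessing.
diagonal = auc_trapezoid([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
print(perfect, diagonal)
```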
If AUC = 1, we have TP Rate = 1 for all FP Rate.Tree-based Learninghttps://datumorphism.leima.is/wiki/machine-learning/tree-based/overview/Wed, 25 Dec 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/tree-based/overview/Decision tree is an easy-to-interpret method in supervised learning. Though simple, it is being used in some widely used algorithms such as random forest method.Embeddinghttps://datumorphism.leima.is/wiki/machine-learning/embedding/overview/Sun, 13 Oct 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/embedding/overview/Factorizationhttps://datumorphism.leima.is/wiki/machine-learning/factorization/overview/Mon, 17 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/factorization/overview/Feature Engineeringhttps://datumorphism.leima.is/wiki/machine-learning/feature-engineering/overview/Mon, 17 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/overview/Naive Bayeshttps://datumorphism.leima.is/wiki/machine-learning/bayesian/naive-bayes/Mon, 17 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/bayesian/naive-bayes/Naive Bayesian is a classifier using [[Bayes' Theorem]] Bayes' Theorem Bayes' Theorem is stated as $$ P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)} $$ $P(A\mid B)$: likelihood of A given B $P(A)$: marginal probability of A There is a nice tree diagram for the Bayes' theorem on Wikipedia. Tree diagram of Bayes' theorem with ‘naive’ assumptions.
Problems with Conditional Probability Calculation By definition, the conditional probability of event $\mathbf Y$ given features $\mathbf X$ is $$ \begin{equation} P(\mathbf Y\mid \mathbf X) = \frac{P(\mathbf Y, \mathbf X)}{ P(\mathbf X) }, \label{def-cp-y-given-x} \end{equation} $$
where
$P(\mathbf X)$ is probability of an event having the features $\mathbf X$, $P(\mathbf Y, \mathbf X)$ is the probability of the event $Y$ with features $\mathbf X$.Confusion Matrix (Contingency Table)https://datumorphism.leima.is/wiki/machine-learning/basics/confusion-matrix/Fri, 31 May 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/basics/confusion-matrix/Confusion Matrix It is much easier to understand the confusion matrix if we use a binary classification problem as an example. For example, we have a bunch of cat photos and the user labeled “cute or not” data. Now we are using the labeled data to train a cute-or-not binary classifier.
Then we apply the classifier on the test dataset and we would only find four different kinds of results.
| | Labeled as Cute | Labeled as Not Cute |
| --- | --- | --- |
| Classifier Predicted to be Cute | True Positive (TP) | False Positive (FP) |
| Classifier Predicted to be Not Cute | False Negative (FN) | True Negative (TN) |

This table is easy enough to comprehend.Normal Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/normal-distribution/Tue, 22 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/normal-distribution/Visualization Math The formula of normal distribution is
$$ \begin{equation} e^{ - \left( \frac{x - \mu}{\sqrt{2} \sigma} \right)^2 } \end{equation} $$
where $\mu$ controls the “center” or “peak” of the distribution and $\sigma$ tells us how “wide” or “dispersed” the distribution is. The normalization factor $1/(\sqrt{2\pi}\sigma)$ is omitted here.
To understand the distribution, we take some limits.
$x = \mu$ First of all, when $x = \mu$ we have
$$ e^0 = 1. $$
Notice that the exponent is the negative of a squared value and cannot be positive, so the exponential is at most 1. This condition gives us the peak value.
$x=\mu-a$ and $x=\mu + a$ For $x=\mu-a$, we haveStatistical Hypothesis Testinghttps://datumorphism.leima.is/wiki/statistical-hypothesis-testing/hypothesis-testing/Sun, 20 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-hypothesis-testing/hypothesis-testing/When we have a sample of the population, we immediately calculate the mean using the sample, say the result is $\mu_0$. Of course, the population mean $\mu_p$ is unknown and probably can never be known.
This specific sample mean $\mu_0$ is nothing more than an educated guess. Then again, how do we know if this specific sample mean $\mu_0$ is a faithful representation of the population mean? In fact, this question is not limited to the mean. It applies to any statistical measurement.
The Statistical Estimation Theory Solution to This Problem
In statistical estimation theory, we would tell the sample value with some indication of the degrees of faithfulness.Why Estimation Theoryhttps://datumorphism.leima.is/wiki/statistical-estimation/why-estimation-theory/Sun, 20 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-estimation/why-estimation-theory/In statistics, we work with samples. For example, the sample mean is easily calculated. However, it is the population mean that is more valuable.
Suppose we have one sample $S_i$, which is used to calculate the mean of the sample $\mu_i$. We have two key problems to solve at this moment.
Can we use this sample mean $\mu_i$ to represent the population mean $\mu_p$? How good are our estimates? To answer these questions, we need to work out the properties of the samples themselves and work out a theory that tells us how to infer population statistics from sample statistics.What is Statisticshttps://datumorphism.leima.is/wiki/statistics/what-is-statistics/Fri, 18 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/what-is-statistics/A Case Study We have a problem.
In our lab, we found a huge number of similar robots on a planet (physical population). To know more about the weight of these robots (statistical population), we first need to choose some of them (physical sample), then obtain their weights (statistical sample).
To describe the data, we could calculate the mean of the weight. We found that the mean weight is 93kg (descriptive statistics).
We could simply give a number to stand for the mean weight of all the robots (point estimate). We could give a number as the mean weight of all the robots together with a range that tells us how dispersed our measurement is.Association Ruleshttps://datumorphism.leima.is/wiki/pattern-mining/association-rules/Sun, 06 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/pattern-mining/association-rules/Association rule is a method for pattern mining. In this article, we perform an association rule analysis of some demo data.
The Problem Defined Suppose we own a store called KIOSK. Here at KIOSK, we sell 4 different things.
Milk Croissant Coffee Fries We need to know what items are associated with each other when the customers are buying.
We have collected the following data. Beware that this small amount of data might not be enough for a real-world problem.
INDEX Items 1 croissant, milk 2 coffee, croissant 3 coffee, croissant 4 coffee, croissant, milk 5 coffee, milk 6 fries, milk 7 coffee, croissant, fries 8 croissant, fries 9 croissant, milk 10 croissant, fries, milk 11 coffee, croissant, milk The Rule, Support, and Confidence The Association rule has three components: the rule, the support, and the confidence.Some Concepts about Data Warehousehttps://datumorphism.leima.is/wiki/data-warehouse/data-warehouse-concepts/Fri, 23 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/data-warehouse/data-warehouse-concepts/The Three Key Ideas about Warehouse The purpose of the data warehouse should be clear. In most cases, it is for the analysis of data, not for data production.1
Subject-oriented: since data warehouses are for decision-makers, arranging the data into subjects makes it much easier to access.
Integrated: many sources are integrated for easy analysis.
Time-variant: observation time should be recorded since the data is also used to analyze the time evolution.
Nonvolatile: simply for analysis.
OLTP and OLAP OLTP: online transaction processing OLAP: online analytical processing

| | OLTP | OLAP |
| --- | --- | --- |
| user | customer | data scientist, managers |
| purpose | production | analysis |
| content | everything | cleaner data |
| database | entity relation model, application-oriented | star/snowflake model, subject-oriented |
| history | usually no need to record the history | history is crucial |
| query | short and frequent read and write | read-only but complicated analysis |

Scope of Data Warehouse Enterprise warehouse: targeting the whole organization Data mart: for a specific group of people Virtual warehouse: views not tables Fact and Dimension Fact is the value of something specified by the dimension.Artificial Neural Networkshttps://datumorphism.leima.is/wiki/machine-learning/neural-networks/artificial-neural-networks/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/neural-networks/artificial-neural-networks/Artificial neural networks work pretty well for solving some differential equations.
Universal Approximators Maxwell Stinchcombe and Halbert White proved that there are no theoretical constraints that prevent feedforward networks from approximating any measurable function. In principle, one can use feedforward networks to approximate measurable functions to any accuracy.
However, the convergence slows down if we have a lot of hidden units. There is a balance between accuracy and convergence rate. More hidden units lead to slow convergence but more accuracy.
Here is a quick review of the history of this topic.
Kolmogorov’s Theorem
Kolmogorov’s theorem shows that one can use a finite number of carefully chosen continuous functions to mix up by sums and multiplication with weights to a continuous multivariable function on a compact set.Ordinary Differential Equationshttps://datumorphism.leima.is/wiki/dynamical-system/ordinary-differential-method/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/dynamical-system/ordinary-differential-method/For a first order differentiation $\frac{\partial f}{\partial t}$, we might have many finite differencing methods.
Euler Method For a first-order ODE,
$$ \frac{dy}{dx} = f(x, y), $$
we can discretize the equation using a step size $\delta x$ so that the differential equation becomes
$$ \frac{y_{n+1} - y_n }{ \delta x } = f(x_n, y_n), $$
which is also written as
$$ y_{n+1} = y_n + \delta x \cdot f(x_n, y_n). \label{euler-method-discretized-form-y-n-plus-1} $$
This is also called forward Euler differencing. It is first order accurate in $\delta x$.
Generally speaking, a simple iteration will do the work.
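The iteration can be sketched in a few lines. The function name `forward_euler` and the test problem $dy/dx = -y$ with $y(0) = 1$ are chosen here for illustration only; the exact solution $e^{-x}$ lets us check the error.

```python
import math


def forward_euler(f, x0, y0, dx, n_steps):
    """Iterate y_{n+1} = y_n + dx * f(x_n, y_n) for n_steps steps."""
    xs, ys = [x0], [y0]
    for _ in range(n_steps):
        y_next = ys[-1] + dx * f(xs[-1], ys[-1])
        xs.append(xs[-1] + dx)
        ys.append(y_next)
    return xs, ys


def decay(x, y):
    """Right-hand side of the illustrative problem dy/dx = -y."""
    return -y


xs, ys = forward_euler(decay, x0=0.0, y0=1.0, dx=0.01, n_steps=100)
error = abs(ys[-1] - math.exp(-1.0))

# Halving the step size should roughly halve the error for a
# first-order method.
_, ys_half = forward_euler(decay, x0=0.0, y0=1.0, dx=0.005, n_steps=200)
error_half = abs(ys_half[-1] - math.exp(-1.0))
print(error, error_half)
```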
Adams' Method Taylor Expansion of FunctionsBasics of Computationhttps://datumorphism.leima.is/wiki/computation/basics-of-computation/Thu, 13 Sep 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-computation/Storage, Precision, Error, etc To have some understanding of how the numbers are processed in computers, we have to understand how the numbers are stored first.
Computers store everything in binary form 1. Suppose we randomly get some segments in the memory, we have no idea what they stand for since we do not know the type of data they represent.
Some of the most used data types in data science are
integer, float, string, time. Integers Integers can occupy $2^0$, $2^1$, $2^2$, $2^3$ bytes in memory.
1 byte : | | | | | | | | |
2 bytes: | | | | | | | | | | | | | | | | |
4 bytes: | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
.Introduction to Node Crawler Serieshttps://datumorphism.leima.is/wiki/nodecrawler/node-crawler-introduction/Sun, 15 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/nodecrawler/node-crawler-introduction/This is a set of tutorials that will help you with your very first crawler with node.js.
The plan of this tutorial is as follows. First of all, we will write a functional crawler using node.js and dump the data into files or simply print it on screen. In the following article, we will use MongoDB as our data management system and organize our data. Then we will optimize and attack some of the pitfalls.
As mentioned, Node.js and MongoDB are required for the full tutorial. Here we link to the articles that talks about the installation and configuration of them.Jupyter Notebookhttps://datumorphism.leima.is/wiki/tools/jupyter/Wed, 20 Jun 2018 15:58:49 -0400https://datumorphism.leima.is/wiki/tools/jupyter/Magics %lsmagic will show all the magics, including line magics and cell magics.
Line magics are magics that start with a single %; cell magics are magics that apply to the whole cell, even across line breaks, and the cell should start with %%. %env can be used to set environment variables inside the notebook.

    %env MONGO_URI=localhost:27072

%%bash is a cell magic that allows bash commands in the cell.

    %%bash
    ls
    pip install datahubxyz

%%time times the execution of the cell.

    %%time
    for i in range(1000):
        i*i

%timeit is the corresponding line magic which times the function of the corresponding line.Regular Expression Basicshttps://datumorphism.leima.is/wiki/sugar/regular-experssions/Wed, 20 Jun 2018 15:58:49 -0400https://datumorphism.leima.is/wiki/sugar/regular-experssions/List of Keys Anchors at the beginning of line ^

    import re
    p = re.compile('^T', re.I)
    line = "The email address is this this the do you see"
    result = p.findall(line)
    print(result) # ['T']

at the end of the line $

    import re
    p = re.compile('e$', re.I)
    line = "The email address is this this the do you see"
    result = p.findall(line)
    print(result) # ['e']

Character Classes Printable Characters any character .

    import re
    p = re.compile('^T.', re.I)
    line = "The email address is this this the do you see"
    result = p.findall(line)
    print(result) # ['Th']

single character of digit \d word character \w (including alphanumeric character and underscore):

    import re
    line = "The email address is this this the do you see"
    result = re.

Short-Time-Fourier-Transformhttps://datumorphism.leima.is/wiki/time-series/short-time-fourier-transform/Wed, 20 Jun 2018 15:58:49 -0400https://datumorphism.leima.is/wiki/time-series/short-time-fourier-transform/Short-Time-Fourier-Transform We transform the time series data using a Fourier transform with some window function
\begin{equation} \tilde Y[n,k] = \sum_m Y[n+m] W[m] e^{-i \lambda_k m}, \end{equation}
where $\lambda_k=2\pi k/N$ and $W[m]$ is the window function at $m$.
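The sum can be evaluated directly, as in the following sketch. It assumes that the window has length $N$ so that $m$ runs over $0, \dots, N-1$, and that the window slides one sample at a time; the function name `stft`, the Hann window, and the test signal are illustrative choices.

```python
import cmath
import math


def stft(Y, window):
    """Direct evaluation of Y~[n, k] = sum_m Y[n+m] W[m] exp(-i lambda_k m)."""
    N = len(window)
    frames = []
    for n in range(len(Y) - N + 1):  # hop size of one sample (assumption)
        row = []
        for k in range(N):
            lam = 2 * math.pi * k / N  # lambda_k = 2 pi k / N
            row.append(sum(Y[n + m] * window[m] * cmath.exp(-1j * lam * m)
                           for m in range(N)))
        frames.append(row)
    return frames


N = 8
hann = [0.5 - 0.5 * math.cos(2 * math.pi * m / N) for m in range(N)]
# A cosine that completes exactly two cycles per window of length N.
signal = [math.cos(2 * math.pi * 2 * t / N) for t in range(32)]

spec = stft(signal, hann)
# The strongest non-negative frequency bin of the first frame.
peak_bin = max(range(N // 2 + 1), key=lambda k: abs(spec[0][k]))
print(len(spec), peak_bin)
```

The direct sum costs $O(N^2)$ per frame and is meant only to mirror the formula; a practical implementation would use a fast Fourier transform for each windowed frame.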
References and Notes CourseraLinear Methodshttps://datumorphism.leima.is/wiki/machine-learning/linear/linear-methods/Fri, 25 May 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/linear/linear-methods/Solving Classification Problems with Linear Models One simple idea behind classification is to calculate the posterior probability of each class given the variables.
Suppose a dataset has features $F_\alpha$ where $\alpha = 1, 2, \cdots, K$, with corresponding class labels $G_\alpha$. The dataset provides $N$ data points, each denoted as $X_i$. The posterior of the classification is $P(G = G_\alpha \vert X = X_i)$.
A naive idea is to classify the data into two classes $G_\alpha$ and $G_\beta$ using the boundary of a linear model
$$ P(G = G_\alpha \vert X = X_i) = P(G = G_\beta \vert X = X_i).Machine Learning Overviewhttps://datumorphism.leima.is/wiki/machine-learning/overview/Fri, 25 May 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/overview/What is Machine Learning Abu-Mostafa, Magdon-Ismail, and Lin summarized the machine learning problem using the following chart 1 2. Ultimately, we need to find an approximation $g$ of the true map $f$ from features $\mathcal X$ to targets $\mathcal Y$ on a specific probability distribution of features $P$. This process is done by using an algorithm to select some hypothesis that works.
From the book Learning From Data by Abu-Mostafa, Magdon-Ismail, and Lin. I am using a version by Deckert.
In the core of machine learning models, we have three components3:
Representation: encode data and problem representation, i.Unsupervised Learninghttps://datumorphism.leima.is/wiki/machine-learning/unsupervised/overview/Fri, 25 May 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/unsupervised/overview/Unsupervised Learning!
Principal component analysis Clustering K-means Clustering Algorithm:
Assign data points to a group. Iterate until no change: find the centroid of each group; for each data point, find the closest centroid and assign the data point to the group of that centroid. How Many Groups
The art of choosing K. Hierarchical Clustering Bottom-up hierarchical groups can be read out from the dendrogram.Some Basic Ideas of Algorithmshttps://datumorphism.leima.is/wiki/algorithms/algorithms-basics/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/algorithms-basics/This set of notes on algorithms is not meant to be comprehensive or complete. These notes are being used as a skeleton framework. There are many useful books to learn about algorithms from a utilitarian point of view. I have listed a few in the references section.
Numerical recipes1 is a very comprehensive book that I used during my PhD. It covers almost all the algorithms you need for scientific computing.
Grokking Algorithms2 is another good book to learn the basics of algorithms. It is barely entry level but is fun to read.
An Outline Data Structure mind the data structure Basics of MapReduce mapreduce numrec Press, W.The C++ Languagehttps://datumorphism.leima.is/wiki/programming-languages/cpp/references/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/cpp/references/C++!
Books The C++ Programming Language Programming Principles and Practice Using C++ The C++ Primer Lectures C++ Beginners Tutorial 1 (For Absolute Beginners) C++ Programming Introduction to C++ Coursera Course: C++ For C Programmers, Part A Top C++ Courses and Tutorials On SoloLearn: C++ Tutorial Practice SoloLearn provides this code playground that we can use to test C++ code. There is also repl.it
Libraries For solving differential equations:
http://headmyshoulder.github.io/odeint-v2/ http://www.mcs.anl.gov/petsc/ https://github.com/trilinos/Trilinos http://homepage.math.uiowa.edu/~dstewart/meschach/meschach.html http://www.boost.org/ http://www.feynarts.de/cuba/ https://www.gnu.org/software/gsl/ Linear algebra
http://www.simunova.com/mtl4 Comparison of libs
Boost is faster than GSL in terms of random numbers and odeint.The Python Language: Basicshttps://datumorphism.leima.is/wiki/programming-languages/python/basics/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/python/basics/Numbers, Arithmetics Two types of numbers exist,
int
float, 15 digits, other digits are float error

It is worth noting that in Python 2, we have

    print(1.0/3) # will give us float numbers
    # 0.333333333333

while

    print(1/3) # will only give us int
    # 0

However, this was changed in Python 3.
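For comparison, here is a short sketch of the Python 3 behavior, where `/` always performs true division and `//` performs floor division. The variable names are arbitrary.

```python
import sys

true_div = 1 / 3    # true division: the result is a float
floor_div = 1 // 3  # floor division: the result is an int

print(type(true_div).__name__, true_div)
print(type(floor_div).__name__, floor_div)

# Number of decimal digits a float can represent faithfully
# on this platform.
print(sys.float_info.dig)

# Digits beyond that precision carry rounding error.
print(0.1 + 0.2 == 0.3)
```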
Variables, Functions, Conditions A variable name should start with either a letter or an underscore.
Variables defined inside a function is local and there is no way to find it or use it outside the function. It is even possible to reuse an already used global variable inside a function.Curriculumhttps://datumorphism.leima.is/awesome/curriculum/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/awesome/curriculum/Prerequisites Programming Bash: [[all posts with the bash tag]] Bash Python: [[The Python Language]] The Python Language Python as a programming language [[all posts with the python tag]] Python C++: [[C/C++]] C/C++ [[all posts with the C++ tag]] C++ alternatives:
R Matlab Python Some essential libraries:
Data numpy [[Articles with numpy tag]] numpy scipy [[Articles with scipy tag]] scipy pandas [[Articles with pandas tag]] pandas dask [[PySpark]] Data Processing - (Py)Spark Processing Data using (Py)Spark Visualization matplotlib [[Articles with matplotlib tag]] Matplotlib seaborn plotly and your machine learning libraries Use virtual environments:Time Series Forecasting with Deep Learninghttps://datumorphism.leima.is/wiki/forecasting/forecasting-with-deep-learning/Sun, 24 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/forecasting-with-deep-learning/The Encoder-Decoder Framework Many of the models for [[time series forecasting]] The Time Series Forecasting Problem Forecasting time series using deep learning are following some sort of encoder-decoder architecture.
Encoder: $g_{\text{enc}}(x^{(i)} _ {t-K:t}, u^{(i)} _ {t-K:t}) \to z_t$, Decoder: $g_{\text{dec}}(z_t, u^{(i)} _ {t+1: t+H}) \to y_{t+1:t+H}$.Statistics of Graphshttps://datumorphism.leima.is/wiki/graph/basics/statistics-of-graphs/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/graph/basics/statistics-of-graphs/Local Statistics Node Degree Node Degree Node degree of a node $u$ $$ d_u = \sum_{v\in \mathcal V} A[u,v], $$ where $A$ is the adjacency matrix. Node Centrality Importance of a node on a graph:
Eigenvector Centrality of a Graph Given a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is $$ \mathbf e_u = \frac{1}{\lambda} \sum_{v\in\mathcal V} \mathbf A[u,v] \mathbf e_v, \qquad \forall u \in \mathcal V. $$ Why is it called Eigenvector Centrality The definition is equivalent to $$ \lambda \mathbf e = \mathbf A\mathbf e. $$ Power Iteration The solution to $\mathbf e$ is the eigenvector that corresponds to the largest eigenvalue $\lambda_1$.Contrastive Model: Context-Instancehttps://datumorphism.leima.is/wiki/machine-learning/contrastive-models/context-instance/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/contrastive-models/context-instance/In contrastive methods, we can manipulate the data to create data entries and infer the changes using a model. These methods are models that “predict relative position”1. Common tricks are
shuffling image sections like jigsaw, and rotate the image. We can also adjust the model to discriminate the similarities and differences. For example, to generate contrast, we can also use [[Mutual Information]] Mutual Information Mutual information is defined as $$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$ In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$.f-GANhttps://datumorphism.leima.is/wiki/machine-learning/adversarial-models/f-gan/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/adversarial-models/f-gan/The essence of [[GAN]] GAN The task of GAN is to generate features $X$ from some noise $\xi$ and class labels $Y$, $$\xi, Y \to X.$$ Many different GANs are proposed. Vanilla GAN has a simple structure with a single discriminator and a single generator. It uses the minmax game setup. However, it is not stable to use minmax game to train a GAN model. WassersteinGAN was proposed to solve the stability problem during training1. More advanced GANs like BiGAN and ALI have more complex structures. Vanilla GAN Minmax Game … is comparing the generated distribution $p_G$ and the data distribution $p_\text{data}$.Generative Model: Autoregressive Modelhttps://datumorphism.leima.is/wiki/machine-learning/generative-models/autoregressive-model/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/generative-models/autoregressive-model/An autoregressive (AR) model is autoregressive,
$$ \begin{equation} \log p_\theta (x) = \sum_{t=1}^T \log p_\theta ( x_{t} \mid {x_{<t}} ). \end{equation} $$
In the above example, the likelihood is modeled as
$$ \begin{align} p_\theta (x) &= \prod_{t=1}^T p_\theta (x_t \mid x_{1:t-1}) \\ &= p_\theta(x_1) p_\theta(x_2 \mid x_{1:1}) p_\theta(x_3 \mid x_{1:2}) \cdots p_\theta(x_T \mid x_{1:T-1}) \end{align} $$
Taking the logarithm, we have
$$ \ln p_\theta (x) = \sum_{t=1}^T \ln p_\theta (x_t \mid x_{1:t-1}) $$
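The sum of conditional log-probabilities can be sketched in a few lines of Python. This is only an illustration: `toy_conditional` is a hypothetical binary model invented for the example, not a model from this article.

```python
import math

def ar_log_likelihood(sequence, conditional_prob):
    """Sum ln p(x_t | x_{<t}) over all positions t of the sequence."""
    total = 0.0
    for t, x_t in enumerate(sequence):
        history = sequence[:t]  # x_{<t}; empty for the first element
        total += math.log(conditional_prob(x_t, history))
    return total

def toy_conditional(x_t, history):
    # Hypothetical model: the first symbol is uniform over {0, 1};
    # afterwards the previous symbol is repeated with probability 0.8.
    if not history:
        return 0.5
    return 0.8 if x_t == history[-1] else 0.2

sequence = [1, 1, 0, 0]
log_p = ar_log_likelihood(sequence, toy_conditional)
```

Exponentiating `log_p` recovers the product form of the likelihood, here $0.5 \times 0.8 \times 0.2 \times 0.8$.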
Notations and Conventions
In AR models, we have to mention the preceding nodes (${x_{<t}}$) of a specific node ($x_{t}$). For $t=5$, the relation between ${x_{<5}}$ and $x_5$ is shown in the following illustration.Poisson Regressionhttps://datumorphism.leima.is/wiki/machine-learning/linear/poisson-regression/Fri, 07 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/linear/poisson-regression/Poisson regression is a generalized linear model for count data.
To model a dataset that is generated from a [[Poisson distribution]], we only need to model the mean $\mu$, as it is the only parameter. The simplest model we can have for some given features $X$ is a linear model. However, for count data, the effects of the predictors are often multiplicative. The next simplest model we can have is
$$ \mu = \exp\left(\beta X\right). $$
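As a minimal sketch of this log-linear mean, the snippet below evaluates $\mu$ for one feature vector and the Poisson log-probability of a count. The coefficients and features are made-up values chosen only for illustration.

```python
import math

def poisson_mean(beta, x):
    """mu = exp(beta . x) for a single feature vector x."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def poisson_log_pmf(k, mu):
    """ln P(K = k) for a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

beta = [0.5, -1.0]  # hypothetical coefficients
x = [2.0, 1.0]      # hypothetical feature vector
mu = poisson_mean(beta, x)

# Increasing the first feature by one multiplies the mean by exp(0.5),
# which is the multiplicative effect mentioned above.
ratio = poisson_mean(beta, [3.0, 1.0]) / mu
```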
The $\exp$ makes sure that the mean is positive as this is required for count data.Principles of Colorshttps://datumorphism.leima.is/wiki/data-visualization/colors/Fri, 20 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/colors/ColorTeller
I wrote a python package called colorteller to help us manage and benchmark colors. Basic Concepts of Colors Color Wheel and Color Sphere There are two dimensions in the color wheel:
Hue Saturation When we add another dimension, lightness, to the wheel, we have a color sphere (1, 2).
Many color systems have been invented. Color wheel and color sphere are two examples of them.Goodness-of-fithttps://datumorphism.leima.is/wiki/model-selection/goodness-of-fit/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/model-selection/goodness-of-fit/Does the data agree with the model?
Calculate the distance between data and model predictions. Apply Bayesian methods such as likelihood estimation: likelihood of observing the data if we assume the model; the results will be a set of fitting parameters. … Why don’t we always use goodness-of-fit as a measure of the goodness of a model?
We may experience overfitting. The model may not be intuitive. This is why we would like to balance it with parsimony using some measures of generalizability.
K-means and overfitting
The overfitting problem is easily demonstrated using the K-means model.Data Types and Level of Measurement in Machine Learninghttps://datumorphism.leima.is/wiki/machine-learning/feature-engineering/data-types/Wed, 15 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/data-types/Types of Data There are several debatable categorization methods of data.
The first widely spread theory, or level of measurement, is by S. Stevens. The theory categorizes data into four types, nominal, ordinal, interval, and ratio.
Other methods are proposed for other fields of research. For example, N. R. Chrisman proposed a different method for cartography. However, these are not generic enough for data science. They are more general than a specific field of research.
For machine learning, many statistical data types have been proposed. Some examples of data types and their relations with the level of measurement are shown in the following chart.Decision Treehttps://datumorphism.leima.is/wiki/machine-learning/tree-based/decision-tree/Wed, 25 Dec 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/tree-based/decision-tree/In this article, we will explain how decision trees work and build a tree by hand.
The code used in this article can be found in this repo. Definition of the problem We will decide whether one should go to work today. In this demo project, we consider the following features.
| feature | possible values |
| --- | --- |
| health | 0: feeling bad, 1: feeling good |
| weather | 0: bad weather, 1: good weather |
| holiday | 1: holiday, 0: not holiday |

For more compact notations, we use the abstract notation $\{0,1\}^3$ to describe a set of three features each with 0 and 1 as possible values.Bayesian Linear Regressionhttps://datumorphism.leima.is/wiki/machine-learning/bayesian/bayesian-linear-regression/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/bayesian/bayesian-linear-regression/Linear Regression and Likelihood The linear estimator $y$ is
$$ \begin{equation} y^n = \beta^m X_m^{\phantom{m}n}. \label{eq-linear-model} \end{equation} $$
As usual, we have redefined our data to get rid of the intercept $\beta^0$.
In ordinary linear models, we find the error being the difference between the target $\hat y$ and the estimator $y$
$$ \epsilon = \hat y - y, $$
which is required to have a minimum absolute value.
In linear regressions, we use least squares to solve the problem. In Bayesian linear regression, instead of using a deterministic estimator $\beta^m X_m^{\phantom{m}n}$, we assume a Gaussian random estimator
$$ \begin{equation} \mathcal{N}(\mu, \sigma^2) = \mathcal{N}(\beta^m X_m^{\phantom{m}n}, \sigma^2), \end{equation} $$NMF: Nonnegative Matrix Factorizationhttps://datumorphism.leima.is/wiki/machine-learning/factorization/nmf/Thu, 13 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/factorization/nmf/Decomposition For simplicity, we start with a data point $\mathbf P$ in a $k$-dimensional space spanned by $k$ basis vectors $\mathbf V^k$. Naturally, we could write down the component decomposition of the point using the basis vectors $\mathbf V^k$,
$$ \mathbf P = P_k \mathbf V^k. $$
This is immediately obvious to us since we have been dealing with rank 2 $(k, 1)$ basis vectors and we are talking about the $k$ coordinates for a point.
This point is represented by a matrix of rank 2 $(k, 1)$ given this basis.
$$ \mathbf P \to \begin{pmatrix} P_1, P_2, \cdots, P_k \end{pmatrix} $$Word2vechttps://datumorphism.leima.is/wiki/machine-learning/embedding/word2vec/Thu, 13 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/embedding/word2vec/Word2vec is a word embedding model that learns the probability of some words being neighbours in a sentence $p_{neighbours}(w_i, w_o)$.
Build a dataset of adjacent words. CBOW; skipgram; negative sampling; Encode the words using vectors. Build a model $f(\{\theta_i\})$ to calculate the probability of the words being neighbours and improve the parameters $\{\theta_i\}$ using the dataset.Bias-Variancehttps://datumorphism.leima.is/wiki/machine-learning/basics/bias-variance/Fri, 07 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/basics/bias-variance/Bias and Variance Suppose $f(X)$ is a perfect model that represents a “tight” model of the dataset $(X,Y)$ up to some irreducible error $\epsilon$,
$$ \begin{equation} Y = f(X) + \epsilon. \label{dataset-using-true-model} \end{equation} $$
On the other hand, we build another model using a specific method such as k-nearest neighbors, which is denoted as $k(X)$.
Why the two models?
Why are we talking about the perfect model and a model using a specific method?
The perfect model $f(X)$ is our ultimate goal, while the model using a specific method $k(X)$ is our effort of approaching the ultimate model.
The bias measures the deficit between $k(X)$ and the perfect model $f(X)$,Types of Errors in Statistical Hypothesis Testinghttps://datumorphism.leima.is/wiki/statistical-hypothesis-testing/type-1-error-and-type-2-error/Fri, 31 May 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-hypothesis-testing/type-1-error-and-type-2-error/Type I and Type II Errors In statistical hypothesis testing, we always have a null hypothesis $H_0$ which refers to the statement to be tested. We have two possible conclusions from a hypothesis testing,
to accept the hypothesis, that is concluding that $H_0$ is true, to reject the hypothesis, that is concluding that $H_0$ is false. However, it is possible that our conclusion is not correct. There are four possible results.
| | $H_0$ is True (Ground Truth) | $H_0$ is False (Ground Truth) |
| --- | --- | --- |
| Accept $H_0$ (after hypothesis testing) | Correct | Type II Error |
| Reject $H_0$ (after hypothesis testing) | Type I Error | Correct |

We could tell that there are two types of errors:Amazon CloudWatch Logshttps://datumorphism.leima.is/wiki/tools/awslogs/Mon, 11 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/awslogs/Why Suppose we have all kinds of pipelines written in different languages, using different tools, and located in different places. It would be frustrating to pull out the logs.
This is why we need a centralized log service, for example cloudwatch.
Sending logs to CloudWatch First of all, send your logs to awslogs. The easiest way is to use boto.
Retrieving and Analyzing Logs First of all, we need this: awslogs. With the logs sent to cloudwatch, we then could read out the logs using the following command:
    awslogs get etl-tools --start='1d ago' --timestamp --output text | grep error

This prints the log lines from the past day that contain the string error.t Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/t-distribution/Tue, 22 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/t-distribution/VisualizationConfidence Intervalhttps://datumorphism.leima.is/wiki/statistical-estimation/confidence-interval/Sun, 20 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-estimation/confidence-interval/We will use upper cases for the abstract variable and lower cases for the actual numbers.
Why is Confidence Interval Needed? Suppose I sample the population multiple times, the mean value $\mu_i$ of the sample is calculated for each sample. It is a good question to ask how different these $\mu_i$ are compared to the true mean $\mu_p$ of the population.
In this article, we would need to specify several notations.
$X$ is the quantity we are measuring. $\bar X$ is the mean of the quantity $X$. Confidence Interval This theorem states that the probability for the true mean $\mu_p$ to fall into a specific range can be calculated usingJargonshttps://datumorphism.leima.is/wiki/statistics/jargons/Sat, 24 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/jargons/Accuracy and Precision Accuracy: the measurement compared to the truth Precision: variability of repeated measurements; the more precise, the less variations during each measurement. Accurate Inaccurate Precise Close to true value, small variations in each measurement Far from true value, small variations in each measurement Imprecise Close to true value, large variations in each measurement Far from true value, large variations in each measurement Here is an example. Suppose we have a huge population (with true mean $M_0$) and we draw samples from it. For the first time, we have sample $S_1$.Extract, Transform and Loadhttps://datumorphism.leima.is/wiki/data-warehouse/extract-transform-load/Fri, 23 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/data-warehouse/extract-transform-load/ETL Process ETL
ETL
Extract: extract data from sources Transform: transform it to proper format Load: load it to data storage infrastructure E for Extract Should not affect the source system. T for Transform Cleaning Filtering Enriching Splitting Joining L for Load Deal with sync and waitingPartial Differential Equationshttps://datumorphism.leima.is/wiki/dynamical-system/partial-difference-method/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/dynamical-system/partial-difference-method/Forward Time Centered Space For $\frac{d f}{d t} = - v \frac{ d f }{ dx }$, we write down the finite difference form 1
$$ \frac{f(t_{n+1}, x_i ) - f(t_n, x_i)}{ \Delta t } = - v \frac{ f(t_n, x_{i+1}) - f(t_n, x_{i-1}) }{ 2\Delta x }. $$
FTCS is an explicit method and is not stable.
Lax Method Change the term $f(t_n, x_i)$ in FTCS to $( f(t_n, x_{i+1}) + f(t_n, x_{i-1}) )/2$ 1.
Stability condition is
$$ \frac{ \lvert v \rvert \Delta t }{ \Delta x } \leq 1, $$
which is the Courant-Friedrichs-Lewy stability criterion.
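A minimal sketch of one Lax update, which also checks the stability condition above before stepping. The periodic boundary and the function name `lax_step` are assumptions made for this example.

```python
def lax_step(f, v, dt, dx):
    """One Lax update for df/dt = -v df/dx on a periodic grid (assumed boundary)."""
    courant = abs(v) * dt / dx
    if courant > 1:
        raise ValueError("CFL condition violated: |v| dt / dx = %g > 1" % courant)
    n = len(f)
    new_f = []
    for i in range(n):
        left, right = f[(i - 1) % n], f[(i + 1) % n]
        # FTCS with f(t_n, x_i) replaced by the average of its two neighbours
        new_f.append(0.5 * (right + left) - v * dt / (2 * dx) * (right - left))
    return new_f

pulse = [0.0, 1.0, 0.0, 0.0, 0.0]
shifted = lax_step(pulse, v=1.0, dt=1.0, dx=1.0)
```

With $\lvert v \rvert \Delta t / \Delta x = 1$ the update reduces to copying the left neighbour, so the pulse moves exactly one cell to the right.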
Staggered Leapfrog $$ \frac{f(t_{n+1}, x_i) - f(t_{n-1}, x_i)}{2 \Delta t} = -v \frac{ f(t_n, x_{i+1} ) - f(t_n, x_{i-1} ) }{ 2\Delta x} $$Basics of Programminghttps://datumorphism.leima.is/wiki/computation/basics-of-programming/Sun, 23 Sep 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-programming/Recursive and Iterative Solving problems with iterative and recursive methods are two quite different approaches, somehow, to the same kind of problems.
Here we will calculate the factorial of $n$. We define two functions using the iterative method and the recursive method.
Run the program on Repl.it.
    def recursiveFactorial(n):
        if n == 0:
            return 1
        else:
            return n * recursiveFactorial(n - 1)

    def iterativeFactorial(n):
        ans = 1
        i = 1
        while i <= n:
            ans = ans * i
            i = i + 1
        return ans

    print(recursiveFactorial(0))
    print(iterativeFactorial(0))

Basic Node Crawlerhttps://datumorphism.leima.is/wiki/nodecrawler/basic-crawler/Sun, 15 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/nodecrawler/basic-crawler/Prerequisites Nodejs >= 8.9 Overview A model for a crawler is as follows.
A crawler requests data from the server, while the server responds with some data. Here is a graphic illustration
    +----------+                   +-----------+
    |          |  HTTP Request     |           |
    |          +---------------->  |           |
    |  Nodejs  |                   |  Servers  |
    |          <----------------+  |           |
    |          |  HTTP Response    |           |
    +----------+                   +-----------+

HTTP Requests For a good introduction of HTTP requests, please refer to this video on youtube: Explained HTTP, HTTPS, SSL/TLS API As for the first step, we need to find which url to request.Autoregressive Modelhttps://datumorphism.leima.is/wiki/time-series/autoregressive-model/Wed, 20 Jun 2018 15:58:49 -0400https://datumorphism.leima.is/wiki/time-series/autoregressive-model/Autoregressive Given a time series ${T^i}$, a simple predictive model can be constructed using an autoregressive model.
$$ \begin{equation} T^t = \sum_{i=1}^p \beta_i T^{t - i} + \beta^t + \beta^0. \end{equation} $$
Such a model is usually called an AR(p) model due to the fact that we are using data back in $p$ steps.
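A one-step AR(p) prediction can be sketched as follows. The function name and the example coefficients are hypothetical, and the time-dependent term $\beta^t$ is simply passed in as a number that defaults to zero.

```python
def ar_predict(history, betas, beta_0=0.0, beta_t=0.0):
    """One-step AR(p) prediction: sum_i betas[i-1] * T^{t-i} + beta_t + beta_0."""
    p = len(betas)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # history[-i] is the value i steps back, i.e. T^{t-i}
    lagged = sum(betas[i - 1] * history[-i] for i in range(1, p + 1))
    return lagged + beta_t + beta_0

series = [1.0, 2.0, 3.0]
# AR(2) with made-up coefficients: 0.5 * 3.0 + 0.25 * 2.0 + 1.0
prediction = ar_predict(series, betas=[0.5, 0.25], beta_0=1.0)
```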
Differential Equation For simplicity we will look at an AR(1) model. Assume the time series has a step size of $dt$, our model can be rewritten as
$$ T^t = \beta_1 T^{t - 1} + \beta^t + \beta^0 $$
which can be rewritten in the following wayUnsupervised Learning: PCAhttps://datumorphism.leima.is/wiki/machine-learning/unsupervised/pca/Fri, 25 May 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/unsupervised/pca/We use the Einstein summation notation in this article. Principal Component Analysis (PCA) is a commonly used trick for dimensionality reduction so that the new features represent most of the variance of the data.
Representations of Dataset In theory, a dataset can be represented by a matrix if we specify the basis. However, the initially given basis is not always the most convenient one. If we find a new basis for the dataset, the matrix representation may be simpler and easier to use.
For convenience, we do not distinguish the representation and the abstract dataset in this article.Data Structurehttps://datumorphism.leima.is/wiki/algorithms/data-structure/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/data-structure/Dealing with data structure is like dealing with your clothes. Some people randomly drop their clothes somewhere without thinking. But it takes time to retrieve a specific T-shirt. Some people spend more time folding and arranging their clothes. This process makes it easy to find a specific T-shirt. Similar to retrieving clothes, there is always a balance between the computation time (retrieving clothes) and the coding time (folding clothes).
Some Useful Data Structures This section serves as some kind of flashcard keywords. I am using this section to remind myself of the important concepts.
Binary Tree [[Tree]] Data Structure: Tree mind the data structure: here comes the tree ; Binary tree Traverse a tree: Pre-order traversal: parent->left->right In-order traversal: left->parent->right Post-order traversal: left->right->parent Level-order traversal: top->bottom, by each level from left to right of the whole tree Array Suppose I bought 5 movie tickets for the movie Tenet.The C++ Language: Basicshttps://datumorphism.leima.is/wiki/programming-languages/cpp/basics/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/cpp/basics/Make it Work Apart from the traditional way of running C++ code, Jupyter notebook has a clingkernel that makes it possible to run C++ in a Jupyter notebook. Here is the post: Interactive C++ for HPC.
Concepts Namespace Operators: assignment operators (=,+=,-=,*=,/=,%=), increment/decrement operator (++x,x++,--x,x--), relational operators (>,<,>=,<=,==,!=), logical operators (&&,||,!), left shift (<<), extraction operator (>>, or right shift), understand the operator precedence Variables: variable name starts with underscore or Latin letters, Pascal case (PascalCase), Camel case (camelCase) if/else and Loops: if (condition is true ){ then something } Data Types: string (double quote), character (char, 1 byte ASCII character, using single quote), float (4 bytes, always signed), double (8 bytes, always signed), long double (8 or 16 bytes, always signed), signed or unsigned short or long int (signed long int, unsigned int) Pointers: ampersand (&) accesses the address, pointer is variable thus needs to be declared using asterisk (*), can be declared to be int or double or float or char (int \*pt; int\* pt; int * pt;) Functions: overload, recursion Class: identity, attributes, method/behavior, access specifiers (private or public or protected, by default it is set to private), instantiation of object (creating object), constructor, destructor, encapsulation, scope resolution operator (TheClassYouNeed::somefunction()), selection operator (dot member selection .The Python Language: Decoratorshttps://datumorphism.leima.is/wiki/programming-languages/python/decorators/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/python/decorators/Functions: first-class objects; can be passed around as arguments.
This tells us that functions can be passed into a function or even returned by a function. For example,
    def a_decoration_function(yet_another_function):
        def wrapper():
            print('Before yet_another_function')
            yet_another_function()
            print('After yet_another_function')
        return wrapper

    def yet_another_function():
        print('This is yet_another_function')

When we call the function returned by a_decoration_function(yet_another_function), we will have

    Before yet_another_function
    This is yet_another_function
    After yet_another_function

So a decorator is simply a function that takes a function as an argument, adds some salt to it, and returns the salted function.
To use the decorator, we simply use @
    @a_decoration_function
    def my_function():
        print('This is my function')

    my_function()

This piece of code calls the decorated function.A Physicist's Crash Course on Artificial Neural Networkhttps://datumorphism.leima.is/wiki/machine-learning/neural-networks/physicists-crash-course-neural-network/Sat, 02 May 2015 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/neural-networks/physicists-crash-course-neural-network/What is a Neuron What a neuron does is to respond when a stimulation is given. This response could be strong or weak or even null. If I were to draw a figure of this behavior, it would look like this.
Neuron response Using simple single neuron responses, we could compose complicated responses. To achieve that, we study the transformations of the response first.
transformations Artificial Neural Network A simple network is a collection of neurons that respond to stimulations, which could be the responses of other neurons.
neural network A given input signal is spread onto three different neurons.Workflowshttps://datumorphism.leima.is/awesome/workflows/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/awesome/workflows/The scope of exploratory data analysis is not universally defined. Some of the contents discussed here may have crossed the line. The whole modeling process is never decoupled anyway. Data wrangling is mostly guided by the exploratory data analysis (EDA). In other words, the data cleaning process should be mostly guided by questions from business and stakeholders or out of curiosity.
There are three key components in EDA.
Clearly state the purpose of this EDA. Are we asking the right question? Does the dataset fit in memory or shall I use distributed preprocessing? Is the dataset good enough to solve the problem?Time Convolutionhttps://datumorphism.leima.is/cards/forecasting/time-convolution/Mon, 28 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/cards/forecasting/time-convolution/The temporal convolution is responsible for capturing temporal patterns in a sequence.
Dilated Temporal Convolution Unit8 has a nice blog about temporal convolution and dilated temporal convolution1. In this
Convolutions Using Fourier Transform Convolution and Fourier transform Dilated Convolution For a convolution $$ f*h(x) = \sum_{s+t=x} f(s) h(t), $$ the dilated version of it is1 $$ f*_l h(x) = \sum_{s+t*l=x} f(s) h(t), $$ where $l$ is the dilation factor. Yu2015 Yu F, Koltun V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv [cs.CV]. 2015. Available: http://arxiv.org/abs/1511.07122  ↩︎ Inception A good convolutional network should capture both short-term and long-term patterns in the time series data.Evaluating Time Series Modelshttps://datumorphism.leima.is/wiki/forecasting/evalutate-time-series-models/Fri, 08 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/evalutate-time-series-models/Evaluating time series models is usually different from most other machine learning tasks as we usually don’t have i.i.d. data.
Out-of-sample Out-of-Sample with Sliding Window
If the sliding window size is 1, then we have the simplest out-of-sample holdout scenario.
Prequential Prequential with Gap
Prequential with Growing Train
Prequential with Sliding Blocks
Cross-validation Cross-validation
Cross-validation with Neighbor removedConformal Predictionhttps://datumorphism.leima.is/wiki/statistical-estimation/conformal-prediction/Fri, 01 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-estimation/conformal-prediction/Conformal prediction is a method to predict a consistent confidence interval in an on-line setting. The algorithms is following the [[Neyman-Pearson hypothesis testing]] Neyman-Pearson Theory The Neyman-Pearson hypothesis testing tests two hypothesis, hypothesis $H$, and an alternative hypothesis $H_A$. Neyman-Pearson Lemma The Neyman-Pearson Lemma is an very intuitive lemma to understand how to choose a hypothesis. The lecture notes from PennState is a very good read on this topic1. An example For simplicity, we assume that there exists a test statistic $T$ and $T$ can be used to measure how likely the hypothesis $H$ is true, e.g., the hypothesis $H$ is false, corresponds to $T$ … framework, thus providing solid theoretical support for the predicted region.Graphs Spectral Methodshttps://datumorphism.leima.is/wiki/graph/basics/graph-spectral-methods/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/graph/basics/graph-spectral-methods/The [[Ratio Cut]] Graph Cuts Cut For a subset of nodes $\mathcal A\subset \mathcal V$, the rest of nodes can be denoted as $\bar {\mathcal A} = \mathcal V \setminus \mathcal A$. In other words, $\mathcal A \cup \bar {\mathcal A} = \mathcal V$ and $\mathcal A \cap \bar {\mathcal A} = \emptyset$. That being said, the nodes can be partitioned into two subsets, $\mathcal A$ and $\bar {\mathcal A}$. 
The cut of this partition is defined as the total number of edges between them, $$ \operatorname{Cut} \left( \mathcal A, … is closely related to the [[Graph Laplacians]] Graph Laplacians Laplacian is a useful representation of graphs.Contrastive Model: Instance-Instancehttps://datumorphism.leima.is/wiki/machine-learning/contrastive-models/instance-instance/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/contrastive-models/instance-instance/It was discovered that the success of [[mutual information based contrastive learning]] Contrastive Model: Context-Instance In contrastive methods, we can manipulate the data to create data entries and infer the changes using a model. These methods are models that “predict relative position”1. Common tricks are shuffling image sections like jigsaw, and rotate the image. We can also adjust the model to discriminate the similarities and differences. For example, to generate contrast, we can also use [[Mutual Information]] Mutual Information Mutual information is defined as $$ I(X;Y) = \mathbb E_{p_{XY}} … is more related to the encoder architecture and the negative sampling strategy1.Generative Model: Normalizing Flowhttps://datumorphism.leima.is/wiki/machine-learning/generative-models/flow/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/generative-models/flow/Normalizing flow is a method to convert a complicated distribution $p(x)$ to a simpler distribution $\tilde p(z)$ by building up a map $z=f(x)$ that takes the variable $x$ to $z$. The relation between the two distributions is established using the conservation law for distributions, $\int p(x) \mathrm d x = \int \tilde p (z) \mathrm d z = 1$. One could imagine that changing the variable also brings in the Jacobian.
Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218
Normalizing Flows: An Introduction and Review of Current Methods To generate complicated distributions step by step from a simple and interpretable distribution.infoGANhttps://datumorphism.leima.is/wiki/machine-learning/adversarial-models/infogan/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/adversarial-models/infogan/In GAN, the latent space input is usually random noise, e.g., Gaussian noise. The objective of [[GAN]] GAN The task of GAN is to generate features $X$ from some noise $\xi$ and class labels $Y$, $$\xi, Y \to X.$$ Many different GANs are proposed. Vanilla GAN has a simple structure with a single discriminator and a single generator. It uses the minmax game setup. However, it is not stable to use minmax game to train a GAN model. WassersteinGAN was proposed to solve the stability problem during training1. More advanced GANs like BiGAN and ALI have more complex structures. Vanilla GAN Minmax Game … is a very generic one.Logistic Regressionhttps://datumorphism.leima.is/wiki/machine-learning/linear/logistic-regression/Thu, 27 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/linear/logistic-regression/In a classification problem, given a list of feature values $x$ and their corresponding classes $\{c_i\}$, the posterior of the classes, aka conditional probability of the classes, is
$$ p(C=c_i\mid X=x). $$
Likelihood
The likelihood of the data is
$$ p(X=x\mid C=c_i). $$
Logistic Regression for Two Classes For two classes, the simplest model for the posterior is a linear model,
$$ \log \frac{p(C=c_1\mid X=x) }{p(C=c_2\mid X=x)} = \beta_0 + \beta_1 \cdot x, $$
which is equivalent to
$$ p(C=c_1\mid X=x) = \exp\left(\beta_0 + \beta_1 \cdot x\right) p(C=c_2\mid X=x) . $$
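Since the two posteriors must sum to one, the relation above fixes both of them. A minimal sketch, with made-up coefficients and a hypothetical function name:

```python
import math

def two_class_posteriors(x, beta_0, beta_1):
    """Return (p(c_1 | x), p(c_2 | x)) from the linear log-odds model."""
    log_odds = beta_0 + sum(b * xi for b, xi in zip(beta_1, x))
    # p1 = exp(log_odds) * p2 together with p1 + p2 = 1 gives the sigmoid
    p1 = 1.0 / (1.0 + math.exp(-log_odds))
    return p1, 1.0 - p1

# Hypothetical coefficients: log-odds = -1.0 + 1.0 * 1.7 + 0.0 * 2.0 = 0.7
p1, p2 = two_class_posteriors([1.7, 2.0], beta_0=-1.0, beta_1=[1.0, 0.0])
```

The sketch does not guard against overflow in `math.exp` for very negative log-odds.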
Why
The reason that we are proposing a linear model for the quantityVC Dimensionhttps://datumorphism.leima.is/wiki/learning-theory/vc-dimension/Sun, 21 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/learning-theory/vc-dimension/Two of the key elements in a learning problem are:
a set of hypotheses $\mathcal H$, and a set of data samples $\mathcal S$. $\mathcal H$
Inside $\mathcal H$, we have a lot of hypotheses, for example, $h$. Given some input, e.g., $x_1$ and $x_2$, we can produce some outputs, e.g., $h(x_1)$ and $h(x_2)$. $\mathcal S$
A sample $\mathcal S$ is a fair sample drawn from all the possible inputs $\mathcal X$, where $\mathcal X$ is called the input space. A dataset $\mathcal S$ can be used as a probe of the hypothesis set $\mathcal H$.Measures of Generalizabilityhttps://datumorphism.leima.is/wiki/model-selection/measures-of-generalizability/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/model-selection/measures-of-generalizability/To measure the generalization, we define a generalization error,
$$ \begin{align} \mathcal G = \mathcal L_{P}(\hat f) - \mathcal L_E(\hat f), \end{align} $$
where $\mathcal L_{P}$ is the population loss, $\mathcal L_E$ is the empirical loss, and $\hat f$ is our model by minimizing the empirical loss.
However, we do not know the actual joint probability $p(x, y)$ of our dataset $\{x_i, y_i\}$. Thus the population loss is not known. In machine learning, we usually use [[cross validation]] Cross Validation Cross validation is a method to estimate the [[risk]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.Random Foresthttps://datumorphism.leima.is/wiki/machine-learning/tree-based/random-forest/Wed, 25 Dec 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/tree-based/random-forest/Random forest is an ensemble method based on decision trees. Instead of using one decision tree and model on all the features, the decision tree method can model on a random set of features (feature subspace) using many decision trees and make decisions by democratizing the trees.
Given a proper dataset $\mathscr D(\mathbf X, \mathbf y)$, the ensemble of trees is denoted as ${f_i(\mathbf X)}$, will predict an ensemble of results. There are several key ideas in random forests.
Are the predicted results representative? How to democratize the ensemble of results? What determines the quality of the predictions? Margin, Strength, and Correlations The margin of the model, the strength of the trees, and the correlation between the trees are crucial to answer the questions.Predictions Using Time Series Datahttps://datumorphism.leima.is/wiki/time-series/predictions-time-series-data/Fri, 21 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/predictions-time-series-data/General Phenological Model for Seasonality In business, time series data $f(t)$ usually carries information about trend $g(t)$ ($g$ is used since trend is usually growth), seasonalities (periodical effects) $p(t)$, holiday effects (structural effects) $s(t)$, etc. We will decompose a time series $f(t)$ into four components
$$ \begin{equation} f(t) = g(t) + p(t) + s(t) + \epsilon(t). \end{equation} $$
To train a model for the predictions, we need to write down the exact models of these three predictable components.Tensor Factorizationhttps://datumorphism.leima.is/wiki/machine-learning/factorization/tensor-factorization/Mon, 17 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/factorization/tensor-factorization/Tensors We will be talking about tensors but we will skip the introduction to tensor for now.
In this article, we follow a commonly used convention for tensors in physics, the abstract index notation. We will denote tensors as $T^{ab\cdots}_ {\phantom{ab\cdots}cd\cdots}$, where the Latin indices such as $^{a}$ are simply placeholders for the slots of this “tensor machine”. For a given basis (coordinate system), we can write down the components of this tensor $T^{\alpha\beta\cdots} _ {\phantom{\alpha\beta\cdots}\gamma\delta\cdots}$.
Okay, But Why
What is usually seen in blog posts is the use of component forms of tensors, $T^{\alpha\beta\cdots}_{\phantom{\alpha\beta\cdots}\gamma\delta\cdots}$. Those are the numbers for a given basis.Anscombe's quartethttps://datumorphism.leima.is/wiki/data-visualization/anscombes-quartet/Mon, 18 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/anscombes-quartet/Anscombe’s Quartet Anscombe’s quartet is a brilliant idea that shows the importance and convenience of visual representation of data.
Anscombe’s quartet has four datasets. The values of each dataset are shown below.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5] y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68] x2 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0] y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74] x3 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0], y3 = [7.46, 6.OLAP Operationshttps://datumorphism.leima.is/wiki/data-warehouse/olap-operations/Fri, 23 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/data-warehouse/olap-operations/Roll-up or Drill-up The word ‘up’ in the names refers to going up in concept hierarchies.
For example, we would like to know the revenue of the whole year. However, the record of data is
Date Revenue 2018-01-01 1023 2018-01-02 934 … … 2018-12-30 1244 2018-12-31 1302 Roll-up is performed by summing up everything of the column revenue. It gives us the revenue of the whole year. Monthly and quarterly roll-up is also straightforward.Finite Element Methodhttps://datumorphism.leima.is/wiki/dynamical-system/finite-element-method/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/dynamical-system/finite-element-method/Differential Equations and Boundary Conditions Two Types of Boundary Conditions As an example, we have a partial differential equation
$$ \frac{d^2u}{dx^2} + f = 0, $$
which describes a 1D problem.
Dirichlet boundary condition: specify values for $u$, such as $u(0)=u_0$ and $u(L)=u_L$; Neumann boundary condition: specify values for $u_{,x}$. If we have only Neumann boundary conditions, the solution is not unique. One example is a tossed bar, which can have Neumann BCs at both ends while still moving.
Example Problems Elasticity Problem We consider the displacement $u(x)$ at each space coordinate $x$ of an elastic bar under some external force.Chi-square Correlation Test for Nominal Datahttps://datumorphism.leima.is/wiki/statistics/correlation-analysis-chi-square/Sun, 18 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/correlation-analysis-chi-square/In this article, we will discuss the chi-square correlation test for detecting correlations between two series.
Steps Find out all the possible values of the two nominal series A and B; Count the co-occurrences of the combinations (A, B); Calculate the expected co-occurrences of the combinations (A, B); Calculate chi-square; Determine whether the hypothesis can be rejected. Define the Series Suppose we are analyzing two series A and B. Series A can take values $a_1$ and $a_2$, while series B can take values $b_1$ , $b_2$ and $b_3$.
$$ \begin{align} A &:= \{a_1, a_2\} \\ B &:= \{b_1, b_2, b_3\} \end{align} $$Basics of Networkhttps://datumorphism.leima.is/wiki/computation/basics-of-network/Sun, 23 Sep 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-network/HTTP Keywords Hyper Text Transfer Protocol: deliver hyper text from server to local browser etc. Based on TCP/IP Current version: HTTP/2 Server - Client Client can request through GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS, CONNECT, PATCH. Transfer anything defined by Content-Type Connectionless Protocol: doesn’t maintain the connection all the time Stateless protocol: A very nice explanation URL Keywords Uniform Resource Locator Interpret each part of this URL: http://abc.com:8000/folder/file.html#title1location?param1=123&param2=234 No limits on length of URL by HTTP itself. However, some servers or clients do set limits. Difference between URI and URL and URN: The Difference Between URLs and URIs check out the Venn diagram.Unsupervised Learning: SVMhttps://datumorphism.leima.is/wiki/machine-learning/unsupervised/svm/Fri, 17 Aug 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/unsupervised/svm/SVM calculates a hyperplane to separate the data points into groups according to the label.
Hyperplane A hyperplane is defined to be of the following form
$$ \begin{equation} \boldsymbol{\beta} \cdot \mathbf x = \beta_0. \end{equation} $$
where $\boldsymbol\beta$ is the normal vector to the plane and is required to be constant.
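It is straightforward to show that the signed distance $d$ from an arbitrary point $\mathbf x'$ to the hyperplane is
$$ \begin{equation} d = \frac{\boldsymbol\beta \cdot \mathbf x' - \beta_0}{\lVert \boldsymbol\beta \rVert}, \end{equation} $$ which reduces to $\boldsymbol\beta \cdot \mathbf x' - \beta_0$ when $\boldsymbol\beta$ is a unit vector.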
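As a minimal sketch of the distance formula, the following Python snippet computes the signed distance from a point to the hyperplane $\boldsymbol\beta \cdot \mathbf x = \beta_0$. It assumes that $\boldsymbol\beta$ is not necessarily of unit length and therefore divides by its norm; the function name `signed_distance` and the numbers are made up for illustration.

```python
import math


def signed_distance(beta, beta0, x):
    """Signed distance from point x to the hyperplane beta . x = beta0."""
    dot = sum(b * xi for b, xi in zip(beta, x))
    norm = math.sqrt(sum(b * b for b in beta))
    # Dividing by the norm makes the result independent of the scale of beta.
    return (dot - beta0) / norm


beta = [3.0, 4.0]   # illustrative normal vector with norm 5
beta0 = 5.0

d_on = signed_distance(beta, beta0, [3.0, -1.0])   # point on the hyperplane
d_off = signed_distance(beta, beta0, [3.0, 4.0])   # point off the hyperplane
print(d_on, d_off)
```

The sign tells us on which side of the hyperplane the point lies, which is what a classifier based on the hyperplane would use.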
A Few Key Concepts in SVM Though the concept of SVM is simple, one might find the algorithm to be quite complicated at first glance.Manage Data Using MongoDBhttps://datumorphism.leima.is/wiki/nodecrawler/manage-data-using-mongodb/Wed, 18 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/nodecrawler/manage-data-using-mongodb/In most cases, databases make the management of data quite convenient. In this article, we will scrape data using the code we discussed before but write the data into MongoDB.
For installation of MongoDB, please refer to the official documentation.
The Code To write data to MongoDB using Node.js, we choose the package mongojs, which provides almost exactly the standard MongoDB syntax.
To install mongojs,
npm i mongojs --save Here is a module that can write data to MongoDB. We create a file named dao.js and copy/paste the following code into it.
// use mongojs const mongojs = require('mongojs') // connect to the database 'simple_spider' in MongoDB and use collection 'test' const localdb = mongojs('simple_spider', ['test']) // a function that saves data to MongoDB const saveData = (data,cb) => { localdb.The Python Language: Multi-Processinghttps://datumorphism.leima.is/wiki/programming-languages/python/multiprocessing/Thu, 10 May 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/python/multiprocessing/Python has built-in multiprocessing module in its standard library.
One simple example of using the Pool class is the following.
def myfunc(myfuncargs): 'some thing here' with Pool(10) as p: records = p.map(myfunc, myfuncargs) However, there are limitations on this, especially on pickles. Another approach.
from multiprocessing import Pool from multiprocessing.dummy import Pool as ThreadPool with ThreadPool(1) as p: records = p.map(myfunc, myfuncargs) Beware that map function will feed in a list of args to the function. So I have to use p.map(myfunc, [arg]) for one arg.Data Structure: Treehttps://datumorphism.leima.is/wiki/algorithms/data-structure-tree/Tue, 27 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/data-structure-tree/mind the data structure: here comes the treeThe C++ Language: Numerical Methodshttps://datumorphism.leima.is/wiki/programming-languages/cpp/numerical/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/cpp/numerical/Modularize The code should be designed to separate physics or model from numerical methods. Speed vectors are convenient but slow. 1 Do not copy arrays if not necessary. The example would be for a function return. Most of the time, we can pass the pointer of an array to the function and update the array itself without copying anything and no return is needed at all. inline function. Use namespace instead of class if no data structure is stored in it. Refs http://en.cppreference.com/w/cpp/container/vector ↩︎GNUPlothttps://datumorphism.leima.is/wiki/tools/gnuplot/Mon, 04 Sep 2017 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/gnuplot/Examples Plot .csv data. Suppose we have data of such.
-0.00999983, 0.99995 -0.0199987, 0.9998 -0.0299955, 0.99955 -0.0399893, 0.9992 -0.0499792, 0.99875 -0.059964, 0.998201 To plot the second column against the first column, we use the using parameter in gnuplot.
gnuplot -e "set terminal png; set datafile separator ',' ; plot 'complex.txt' using 1:2" | imgcat # datafile separator is not always necessary # imgcat is a script in iterm2 on macBoltzmann Machinehttps://datumorphism.leima.is/wiki/machine-learning/energy-based-model/boltzmann-machine/Sun, 27 Aug 2017 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/energy-based-model/boltzmann-machine/Boltzmann machine is much like a spin glass model in physics. In short, a Boltzmann machine is a machine that has nodes that can take values, and the nodes are connected through some weights. It is just like any other neural net but with complications and theoretical implications.
Boltzmann machine is usually used as a generative model.
Boltzmann Machine and Physics For a physicist, the Ising model is a good starting point for understanding the Boltzmann machine. We construct a system of neurons $\{ s_i\}$ which can take values of 1 or -1, where each pair of them $s_i$ and $s_j$ is connected by weight $J_{ij}$.Mix-hop Propagation in GNNhttps://datumorphism.leima.is/cards/forecasting/gnn-mix-hop-propagation/Mon, 28 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/cards/forecasting/gnn-mix-hop-propagation/The mix-hop propagation layer has two steps1:
information propagation step: $$ \mathbf H^{(k)} = \beta \mathbf H_{in} + (1-\beta)\mathbf L \mathbf H^{(k-1)}, $$
where $\mathbf L = \tilde{\mathbf D}^{-1} (\mathbf A + \mathbf I)$ with $\tilde D_{ii} = 1 + \sum_j A_{ij}$. This convolution step tries to disentangle the correlation between the nodes. information selection step: $$ \mathbf H_{out} = \sum_k \mathbf H^{(k)} \mathbf W^{(k)}. $$
See Fig 4 in the paper1.
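The two steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: we assume $\mathbf H^{(0)} = \mathbf H_{in}$, a row-normalised $\mathbf L$ with degrees $1 + \sum_j A_{ij}$, and a sum over $k = 0, \dots, K$; the function name `mix_hop` and the random inputs are hypothetical.

```python
import numpy as np


def mix_hop(H_in, A, weights, beta=0.05):
    """Propagate K = len(weights) - 1 hops, then mix the hops with weights W^(k)."""
    n = A.shape[0]
    # Assumed normalisation: L = D^{-1} (A + I) with D_ii = 1 + sum_j A_ij.
    L = (A + np.eye(n)) / (1.0 + A.sum(axis=1, keepdims=True))
    H = H_in                      # assumed starting point H^(0) = H_in
    H_out = H_in @ weights[0]     # k = 0 term of the selection step
    for W in weights[1:]:
        # Information propagation step.
        H = beta * H_in + (1.0 - beta) * (L @ H)
        # Information selection step accumulates H^(k) W^(k).
        H_out = H_out + H @ W
    return H_out


rng = np.random.default_rng(0)
A = rng.random((4, 4))                                  # illustrative adjacency matrix
H_in = rng.random((4, 3))                               # 4 nodes, 3 input features
weights = [rng.random((3, 2)) for _ in range(3)]        # K = 2 hops, 2 output features
H_out = mix_hop(H_in, A, weights)
print(H_out.shape)
```

Keeping a fraction $\beta$ of $\mathbf H_{in}$ in every hop means that the original node states are never fully washed out by repeated propagation.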
Wu2020 Wu Z, Pan S, Long G, Jiang J, Chang X, Zhang C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2005.11650  ↩︎Neyman-Pearson Theoryhttps://datumorphism.leima.is/wiki/statistical-hypothesis-testing/neyman-pearson-theory/Sat, 02 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-hypothesis-testing/neyman-pearson-theory/The Neyman-Pearson hypothesis testing tests two hypotheses: a hypothesis $H$ and an alternative hypothesis $H_A$.
Neyman-Pearson Lemma The Neyman-Pearson Lemma is a very intuitive lemma for understanding how to choose a hypothesis. The lecture notes from PennState are a very good read on this topic1.
An example For simplicity, we assume that there exists a test statistic $T$ that measures how likely the hypothesis $H$ is to be true, e.g., $H$ being false corresponds to $T$ being small.
The reference from Shafer2007 assumes a random variable $T$ to be large if the hypothesis $H$ is false. One example is the ratio of likelihood2,Solving Problems on Graphhttps://datumorphism.leima.is/wiki/graph/basics/ml-problems-on-graph/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/graph/basics/ml-problems-on-graph/Graphs can be used in many problems and there are many possible problems on graphs. We will mention a few popular problems on graphs12.
Node Classification Is the user in black a bot or a normal user?Created based on the text in Hamilton2020
Given a graph that has incomplete attribute labeling of the nodes, predict the attributes on the nodes.
The following concepts can be used to classify nodes.
[[Homophily]] Homophily on Graph Homophily is the principle that a contact between similar people occurs at a higher rate than among dissimilar people – McPherson20011 McPherson2001 McPherson M, Smith-Lovin L, Cook JM.Contrastive Predictive Codinghttps://datumorphism.leima.is/wiki/machine-learning/contrastive-models/contrastive-predictive-codeing/Wed, 08 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/contrastive-models/contrastive-predictive-codeing/Contrastive Predictive Coding, aka CPC, is an autoregressive model combined with InfoNCE loss1.
There are two key ideas in CPC:
Autoregressive models in latent space, and InfoNCE loss that combines mutual information and [[NCE]] Noise Contrastive Estimation: NCE Noise contrastive estimation (NCE) objective function is1 $$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$ where $x^{+}$ represents data similar to $x$, $x^{-}$ represents data dissimilar to $x$, $C(\cdot, \cdot)$ is a function to compute the similarities. For example, we can use $$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$ so that the objective function becomes $$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ … .Deep Infomaxhttps://datumorphism.leima.is/wiki/machine-learning/contrastive-models/deep-infomax/Wed, 08 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/contrastive-models/deep-infomax/Max Global Mutual Information
Why not just use the global mutual information of the input and encoder output as the objective?
… maximizing MI between the complete input and the encoder output (i.e., global MI) is often insufficient for learning useful representations.
– Devon et al[^Devon2018]
[[Mutual information]] Mutual Information Mutual information is defined as $$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$ In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$. This makes sense as there would be no “mutual” information if the two variables are independent of each other.Generative Model: Auto-Encoderhttps://datumorphism.leima.is/wiki/machine-learning/generative-models/autoencoder/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/generative-models/autoencoder/Autoencoders (AE) are machines that encode inputs into a compact latent space.
The simplest auto-encoder is rather easy to understand.
The loss can be chosen based on the demand, e.g., cross entropy for binary labels.
Notation: dot ($\cdot$)
We use a single vertically centered dot, i.e., $\cdot$, to indicate that the function or machine can take in arguments. A simple autoencoder can be achieved using two neural nets, e.g.,
$$ \begin{align} {\color{green}h} &= {\color{blue}g}{\color{blue}(}{\color{blue}b} + {\color{blue}w} x{\color{blue})} \\ \hat x &= {\color{red}\sigma}{\color{red}(c} + {\color{red}v} {\color{green}h}{\color{red})}, \end{align} $$
where in this simple example,
${\color{blue}g(b + w \cdot )}$ is the encoder, and ${\color{red}\sigma(c + v \cdot )}$ is the decoder.Restricted Boltzmann Machinehttps://datumorphism.leima.is/wiki/machine-learning/energy-based-model/restricted-boltzmann-machine/Fri, 11 Jun 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/energy-based-model/restricted-boltzmann-machine/Latent variables introduce extra correlations between the nodes in a network. Introducing hidden units can also help us remove the direct connection between some nodes in a Boltzmann machine and create a restricted Boltzmann machine. A restricted Boltzmann machine requires less computation while having some expressing power.
Given Ising like interactions between the nodes, flipping node V1 is likely to also flip node V2 as they are connected through hidden unit H1. They are correlated. Removing the hidden unit leaves us two uncorrelated units.
The Ising Model Given an Ising-like energy function (cf. [[MaxEnt Model]] MaxEnt Model Maximum Entropy models make the least assumptions about the data )Deep Autoregressive Networkhttps://datumorphism.leima.is/wiki/machine-learning/neural-networks/deep-autoregressive-networks/Mon, 15 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/neural-networks/deep-autoregressive-networks/There are two levels of autoregressiveness in the DARN network:
Inlayer autoregressive connections of the nodes, Intralayer autoregressive connections of nodes. The network is trained on MDL loss.Wavelet Transformhttps://datumorphism.leima.is/wiki/time-series/wavelets/Mon, 07 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/wavelets/In general, given a complete set of functions $\psi(x; \tilde x)$, we can decompose a function $F(\tilde x)$
$$ F(\tilde x) = \int f(x) \psi(x;\tilde x) dx. $$
The choice of $\psi(x;\tilde x)$ gives us different properties.
Fourier Transform Fourier transform is good for stationary analysis since time is not involved in $F(\omega)$.
$$ F(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i \omega t} dt $$
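As a rough numerical illustration, the integral can be approximated by a discrete Fourier transform of a sampled signal. The sketch below uses NumPy's `np.fft.rfft`; the sampling rate, duration, and the 5 Hz test signal are arbitrary choices made for this example.

```python
import numpy as np

fs = 100.0                           # sampling rate in Hz (illustrative)
t = np.arange(200) / fs              # 2 seconds of samples
f = np.sin(2 * np.pi * 5.0 * t)      # a stationary 5 Hz sine wave

F = np.fft.rfft(f)                               # discrete counterpart of F(omega)
freqs = np.fft.rfftfreq(len(f), d=1.0 / fs)      # frequency of each bin in Hz
peak_freq = freqs[np.argmax(np.abs(F))]          # frequency with the largest magnitude
print(peak_freq)
```

The spectrum `F` is indexed by frequency only, so it tells us which frequencies are present but not when they occur.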
Short-time Fourier Transform STFT is a Fourier transform with a moving time window $\tau$,
$$ F(\tau,\omega) = \int_{-\infty}^{\infty} f(t) w(t - \tau) e^{-i\omega t} dt. $$
Moving $\tau$ gives us the ability to investigate Fourier components at different time segments (assuming the window function $w(t-\tau)$ is a step function).Parsimony of Modelshttps://datumorphism.leima.is/wiki/model-selection/parsimony-of-models/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/wiki/model-selection/parsimony-of-models/For models with a lot of parameters, the goodness-of-fit is very likely to be very high. However, such models are also likely to generalize badly, so we need a measure of generalizability.
Here parsimony gives us a few advantages.
easy to perceive better generalizationsTerminalhttps://datumorphism.leima.is/wiki/tools/terminal/Tue, 31 Dec 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/terminal/Navigating Some tips to help data scientist navigate faster in terminal.
pushd, popd and dirs pushd to register and change directories: pushd folder_name will change current directory to folder_name and register the folder folder_name in our stack. If no folder name is passed to the command, the behaviour depends on the shell: bash swaps the top two directories in the stack, while zsh goes to the $HOME folder when the PUSHD_TO_HOME option is set. popd to go to the last directory in the stack and remove it from the stack. In this example, popd will change the current working directory to folder_name. dirs will list all working directories registered in the stack. The current working directory will always be in the stack.Data Storagehttps://datumorphism.leima.is/wiki/data-warehouse/data-storage/Fri, 23 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/data-warehouse/data-storage/tl;dr: Use type safe formats such as HDF5 or parquet
HDF5 BCOLZ (http://bcolz.blosc.org/en/latest/): not designed for multidimensional data. Zarr (https://github.com/alimanfoo/zarr): works with multidimensional data and also parallel computing. Blaze ecosystem (http://blaze.pydata.org/) An article that compares HDF5, BCOLZ, and Zarr: To HDF5 and beyond
I also recommend pandas. It is a python module that works very well with data. It even loads HDF5 out of the box.Bin Size of Histogramhttps://datumorphism.leima.is/wiki/data-visualization/histogram-bin-size/Thu, 22 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/histogram-bin-size/[[Histograms]] Histogram Suppose we check out the burger prices at the stores of Han im Glück, we get a list of numbers. We can arrange the numbers into bins of prices. For example, we can count the number of stores that have a price between 10 and 11 euros. are good for understanding the distribution of your data.
The Bin Size Problem As an example, we will use the following series.
[1.45,2.20,0.75,1.23,1.25,1.25,3.09,1.99,2.00,0.78,1.32,2.25,3.15,3.85,0.52,0.99,1.38,1.75,1.21,1.75] If we use bin size 1, we get this spiky chart and it is not very informative.
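The effect of the bin size can also be checked without plotting. The following is a minimal sketch in plain Python that counts how many values of the series fall into each bin; the helper `histogram` and the half-open bin convention $[k \cdot \text{size}, (k+1) \cdot \text{size})$ are assumptions made for this illustration.

```python
import math
from collections import Counter

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]


def histogram(values, bin_size):
    """Map the left edge of each non-empty bin to the number of values in it."""
    counts = Counter(math.floor(v / bin_size) for v in values)
    return {k * bin_size: counts[k] for k in sorted(counts)}


hist_1 = histogram(data, 1)   # bins [0, 1), [1, 2), [2, 3), [3, 4)
hist_2 = histogram(data, 2)   # bins [0, 2), [2, 4)
print(hist_1)
print(hist_2)
```

Comparing the two dictionaries shows how much detail is lost or gained when the bin size changes, even though the underlying data is the same.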
We could also set bin size to 2.Correlation Coefficient and Covariance for Numeric Datahttps://datumorphism.leima.is/wiki/statistics/correlation-coefficient/Sun, 18 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/correlation-coefficient/Covariances Correlation coefficient is also known as the Pearson’s product moment coefficient. Review of Standard Deviation For a series of data A, we have the standard deviations
$$ \sigma_A = \sqrt{ \frac{ \sum (a_i - \bar A)^2 }{ n } }, $$
where $n$ is the number of elements in series A.
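As a small worked example of this definition, the sketch below computes the standard deviation with the divisor $n$ used above (the population form, not the $n-1$ sample form). The function name `population_std` and the series are made up for illustration.

```python
import math


def population_std(series):
    """Standard deviation with divisor n, matching the formula in the text."""
    n = len(series)
    mean = sum(series) / n
    return math.sqrt(sum((a - mean) ** 2 for a in series) / n)


A = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative series with mean 5
sigma_A = population_std(A)
print(sigma_A)
```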
The standard deviation is very easy to understand. It is basically the root-mean-square Euclidean distance between the data points and the average value. In this article, we will take another point of view.
Now imagine we have two series $(a_i - \bar A)$ and $(a_j - \bar A)$.Basics of Databasehttps://datumorphism.leima.is/wiki/computation/basics-of-database/Wed, 03 Oct 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-database/NoSQL NoSQL = Not only SQL. The four main types of NoSQL databases are
Key-value store: Amazon Dynamo, memcached, Amazon SimpleDB Column-oriented store: Google BigTable, Cassandra Graph database: Neo4j, VertexDB Document database: [[MongoDB]] Basics of MongoDB MongoDB is a document-based database Object database: ZODB Database Operations Relations Union: $A\cup B$ Intersection: $A\cap B$ Difference: $A - B$ Cartesian Product: $A \times B$ Query Union in database: will combine the data with matching common columns.
NaturalJoin and EquiJoin:
--EquiJoin where we specify what condition is used to join SELECT * FROM table1 JOIN table2 ON (table1.Restrictions of Websiteshttps://datumorphism.leima.is/wiki/nodecrawler/restrictions/Thu, 19 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/nodecrawler/restrictions/Beware that scraping data off websites is neither always allowed nor as easy as a few lines of code. The preceding articles enable you to scrape a lot of data; however, many websites have countermeasures. In this article, we will be dealing with some of the common ones.
Request Frequency Some websites have limitations on the frequency of API requests. The solution to this is simply a brief pause after each request. In Node.js, the function setInterval enables this.
// ... require packages here // define the function fetch to get data const fetch = (aid) => superagent .get('https://api.bilibili.com/x/web-interface/archive/stat') .query({ aid:aid }) .Data Structure: Graphhttps://datumorphism.leima.is/wiki/algorithms/data-structure-graph/Tue, 27 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/data-structure-graph/mind the data structure: here comes the graphThe Python Language: Performancehttps://datumorphism.leima.is/wiki/programming-languages/python/performance/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/python/performance/Read the references for performance.
The message:
Use comprehensions Use generatorsGraph Structure Learning in GNNhttps://datumorphism.leima.is/cards/forecasting/gnn-graph-structure-learning/Mon, 28 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/cards/forecasting/gnn-graph-structure-learning/We extract the definitions in Wu et al., 20201. Given node embeddings $\mathbf E_i$1,
$$ \begin{align} \mathbf M_i &= \tanh(\alpha \mathbf E_i \Theta_i) \\ \mathbf A &= \operatorname{ReLU}(\tanh(\alpha (\mathbf M_1 \mathbf M_2^T - \mathbf M_2\mathbf M_1^T))), \end{align} $$
The authors also impose a sparsity requirement and keep only the top-$k$ largest elements in $\mathbf A$.
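The definitions above can be sketched with NumPy as follows. This is a hedged illustration rather than the authors' code: the function name `learn_adjacency`, the random embeddings, and the choice to apply the top-$k$ selection per row (that is, per node) are assumptions made for this example.

```python
import numpy as np


def learn_adjacency(E1, E2, Theta1, Theta2, alpha=3.0, k=2):
    """Build a sparse, non-negative adjacency matrix from node embeddings."""
    M1 = np.tanh(alpha * (E1 @ Theta1))
    M2 = np.tanh(alpha * (E2 @ Theta2))
    # The antisymmetric argument makes the learned relations uni-directional.
    A = np.maximum(np.tanh(alpha * (M1 @ M2.T - M2 @ M1.T)), 0.0)  # ReLU
    # Assumed sparsification: keep the k largest entries in each row.
    mask = np.zeros_like(A)
    top_k = np.argsort(-A, axis=1)[:, :k]
    np.put_along_axis(mask, top_k, 1.0, axis=1)
    return A * mask


rng = np.random.default_rng(42)
n_nodes, dim = 5, 4                               # illustrative sizes
E1, E2 = rng.normal(size=(2, n_nodes, dim))       # two sets of node embeddings
Theta1, Theta2 = rng.normal(size=(2, dim, dim))   # two parameter matrices
A = learn_adjacency(E1, E2, Theta1, Theta2)
print(A.round(2))
```

Because the argument of the outer $\tanh$ is antisymmetric, a positive entry $A_{ij}$ implies $A_{ji} = 0$ after the ReLU, so the learned graph is directed.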
Wu2020 Wu Z, Pan S, Long G, Jiang J, Chang X, Zhang C. Connecting the Dots: Multivariate Time Series Forecasting with Graph Neural Networks. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2005.11650  ↩︎DeepARhttps://datumorphism.leima.is/wiki/forecasting/deepar/Sat, 09 Jul 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/deepar/Focus onInformation Bottleneckhttps://datumorphism.leima.is/wiki/learning-theory/information-bottleneck/Sat, 30 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/learning-theory/information-bottleneck/Information Bottleneck In a [[induction-deduction framework]] Induction, Deduction, and Transduction , for a given training dataset
$$ \{X, Y\}, $$
a prediction Markov chain1
$$ X \to \hat X \to Y, $$
where $\hat X$ is supposed to be the minimal sufficient statistics of $X$. $\hat X$ is the minimal data that can still represent the relation between $X$ and $Y$, i.e., $I(X;Y)$, the [[mutual information]] Mutual Information Mutual information is defined as $$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$ In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$.State Space Modelshttps://datumorphism.leima.is/wiki/time-series/state-space-models/Sun, 27 Feb 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/state-space-models/State space model is an important category of model for sequential data. Through simple assumptions, state space models can achieve quite complicated distributions.
To model a sequence, we can use the joint probability of all the nodes,
$$ p(x_1, x_2, \cdots, x_N), $$
where $x_i$ are the nodes in the sequence.
Orders We can introduce different order of dependencies on the past.
The simplest model for the sequence is assuming i.i.d.
Zeroth Order: Each node is independent of the others.
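Written out, and only as a sketch of what the independence assumption implies, the joint probability of the zeroth-order model factorizes into a product over the nodes,

$$ p(x_1, x_2, \cdots, x_N) = \prod_{i=1}^{N} p(x_i). $$

No factor on the right-hand side refers to any other node, which is why this model cannot capture dependencies on the past.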
To model the dependencies in the sequence, we can assume a node depends on the previous nodes. The first-order model assumes that node $x_{i+1}$ only depends on node $x_i$.The Python Language: Packaginghttps://datumorphism.leima.is/wiki/programming-languages/python/packaging/Fri, 27 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/programming-languages/python/packaging/The official documentation has pages about building python packages1. Torborg also compiled a series of pages and examples about building a package2. In this note, I only provide some tips.
Private Python Packages We can easily set up a private pypi service (e.g., pypicloud).
Install Packages from Private Pypi To install packages inline, use
pip install -i https://$PYPI_USER:$PYPI_PWD@your.pypi.url/simple/ durst==0.0.5 To install packages from requirements.txt use,
pip install -r requirements.txt --trusted-host https://$PYPI_USER:$PYPI_PWD@your.pypi.url/simple --extra-index-url https://$PYPI_USER:$PYPI_PWD@your.pypi.url/simple Publishing Packages Using GitHub Actions We can easily publish python packages using GitHub Actions. Here is an example.
name: Publish Package to Private Pypi on: [push] jobs: build-n-publish: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v2 - name: Set up Python uses: actions/setup-python@v2 with: python-version: 3.Variational Auto-Encoderhttps://datumorphism.leima.is/wiki/machine-learning/generative-models/variational-autoencoder/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/generative-models/variational-autoencoder/In an inference problem, we need the posterior $p(z\vert x)$, which is used to infer $z$ from $x$.
$$ p(z\vert x) = \frac{p(x, z)}{p(x)}. $$
For example, we have an observable $x$ and a latent space $z$, and we would like to find a good latent space for the observable $x$. However, $p(x)$ is something we don’t really know. We would like to use some simpler quantities to help us infer $z$ from $x$ or generate $x$ from $z$.
Now we introduce a simple distribution $q(z\vert x)$. We want to make sure this $q(z\vert x)$ is doing a good job of replacing $p(z\vert x)$, i.e., minimizing the [[KL divergence]] KL Divergence Kullback–Leibler divergence indicates the differences between two distributions ,MDL and Neural Networkshttps://datumorphism.leima.is/wiki/model-selection/mdl-and-neural-networks/Sun, 14 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/model-selection/mdl-and-neural-networks/Minimum Description Length ( [[MDL]] Minimum Description Length MDL is a measure of how well a model compresses data by minimizing the combined cost of the description of the model and the misfit. ) can be used to construct a concise network. A fully connected network has great expressive power but overfits easily.
One strategy is to apply constraints to the networks:
Limit the connections; Shared weights in subgroups of the network; Constrain the weights using some probability distributions. By minimizing the MDL of the network and the misfits on the data, we can build a concise network.Histogramhttps://datumorphism.leima.is/wiki/data-visualization/histogram/Tue, 20 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/histogram/Suppose we check out the burger prices at the stores of Han im Glück, we get a list of numbers. We can arrange the numbers into bins of prices. For example, we can count the number stores that have a price between 10 to 11 euros.Basics of SQLhttps://datumorphism.leima.is/wiki/computation/basics-of-sql/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-sql/Adding a new field to data:
Relational: requires a new column Non-Relational: just add the field to one single document, thus can be easily decentralized. Basics and Background SQL: Structured Query Language
Relational Database:
usually in tables rows are called records columns are certain types of data. Data types of rows are specified: INTEGER TEXT DATE REAL, real numbers NULL … RDBMS: Relational Database Management System, most RDBMS use SQL as the query language. SQLite is one of the RDBMS.
SQLite: open source and minimal MySQL: powerful and popular, also open source, controlled by Oracle, not really scalable.Normalization Methods for Numeric Datahttps://datumorphism.leima.is/wiki/statistics/normalization-methods/Sun, 18 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/normalization-methods/Normalization of data is critical for statistical analysis and feature engineering.
Min-max Normalization This method is linear and straightforward.
Suppose we are analyzing series A, with elements $a_i$. We already know the min and max of the series, $a_{min}$ and $a_{max}$.
Now we would like to normalize the series to be within the range $[a_{min}', a_{max}']$. We simply solve the value of $a' _ i$ in $$ \frac{a'_i - a'_{min}}{a'_{max} - a'_{min}} = \frac{a_i - a_{min}}{a_{max} - a_{min}}, $$ where everything on the right hand side is known and $a_{min}'$ and $a_{max}'$ are chosen as the new min and max to be scaled to.Optimizationhttps://datumorphism.leima.is/wiki/nodecrawler/optimization/Thu, 19 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/nodecrawler/optimization/In this article, we will be optimizing the crawler to get better performance.
Batch Jobs In the article about using MongoDB as data storage, we wrote the data to the database whenever we got it. In practice, this is not efficient at all. This is where batch jobs come in: it is much better to write to the database in batches.
If you recall, the code we used to write to database is
// ...other code localdb.test.save(data, (err, res)=>{ // do something }) The function save takes in not only one entry of document but an array of documents:
const array = [] for(let i = INI_ID ; i < MAX_ID; i++){ // fetch data from website const data = fetchData(i) array.Githttps://datumorphism.leima.is/wiki/tools/git/Wed, 22 Jun 2016 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/git/Git Services GitHub Bitbucket GitLab Using Git with GUI There are huge amounts of git commands! There are also a lot of GUIs if you don’t like command line.
GitHub Desktop GitKraken SourceTree … Useful Commands To check all the commits related to a file, use git log -u. Try out git log -g before determining which reflog to deal with. To compare the changes with the last commit, use git diff --cached HEAD~1. Useful Alias A very useful article here: Must Have Git Aliases: Advanced Examples.
git config --global alias.unstage 'reset HEAD --' git config --global alias.Hidden Markov Modelhttps://datumorphism.leima.is/wiki/time-series/hidden-markov-model/Sun, 27 Feb 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/hidden-markov-model/The hidden Markov model, HMM, is a type of [[State Space Models]] State Space Models The state space model is an important category of models for sequential data such as time series 1.
HMM Bishop2006 Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag New York; 2006.  ↩︎Some ML Workflow Frameworkshttps://datumorphism.leima.is/wiki/tools/ml-flow-frameworks/Wed, 13 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/ml-flow-frameworks/Metaflow Docs
A framework for jupyter notebook data scientists.
Work locally on notebooks. Python environment management using conda. Work in the cloud with Sagemaker. Tasks Methods Comments Code Scripts/Jupyter Notebook Datastore local + S3 metaflow.S3 Compute local + AWS Batch Metadata metaflow service Metadata specifies flow executions: Flows, Runs, Steps, Tasks, and Artifacts. Scheduling AWS Step Functions Deployment AWS Demo from metaflow import FlowSpec, step class BranchFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.Boxplothttps://datumorphism.leima.is/wiki/data-visualization/boxplots/Tue, 20 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/boxplots/Example The Whiskers in Boxplot They are the outlier data points.
Outliers are determined using the interquartile range (IQR, i.e., the range from the 25th percentile to the 75th percentile). We usually take the lowest data point within 1.5 IQR below the 25th percentile and the highest data point within 1.5 IQR above the 75th percentile as the ends of the whiskers; points beyond them are outliers.Linear Regressionhttps://datumorphism.leima.is/wiki/statistics/linear-regression/Tue, 01 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/linear-regression/In this article, we will use the Einstein summation convention. For example, $$ X_{ij}\beta_ j $$ is equivalent to $$ \sum_j X_{ij}\beta_ j $$ In statistics, we have at least three categories of quantities:
data and labels abstract theoretical quantities parameters and predictions of models The convention is that quantities with $\hat {}$ are the model quantities. Sometimes we do not distinguish the abstract theoretical quantities and model quantities.
If it is necessary to use different notations for the abstract theoretical quantities and the model quantities, we would use bold symbols ($\mathbf Y$) or latin sub/super indices ( $Y_a$ ) for theoretical quantities and greek letters ( $Y_\alpha$ ) for model quantities.Basics of MongoDBhttps://datumorphism.leima.is/wiki/computation/basics-of-mongodb/Wed, 03 Oct 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-mongodb/This MongoDB Cheatsheet is my best friend.
MongoDB Concepts Documents Collections: just like tables in SQL. Database MongoShell Some examples:
// show the databases show dbs // show collections show collections // set any database to current database use database_name // insert entry db.database_name.insert( an_object_2_be_the_entry ) // read document db.database_name.findOne({'some_field':'value_of_field'}) db.database_name.find() // prettify db.database_name.find().pretty()Time Series Data Augmentationhttps://datumorphism.leima.is/wiki/time-series/data-augmentation/Mon, 27 Jun 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/data-augmentation/Cookiecutterhttps://datumorphism.leima.is/wiki/tools/cookiecutter/Fri, 27 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/cookiecutter/Cookiecutter is a good tool to set up a scaffold for a data science project. cookiecutter-data-science is a very good template to use.
If some specific (internal) packages are needed for almost every package, fork cookiecutter-data-science and make some changes for future use. For example, one might use [[mkdocs]] Documentation Documenting my data science project using sphinx or mkdocs-material instead of [[sphinx]] Documentation Documenting my data science project using sphinx or mkdocs-material . Swap out sphinx for mkdocs if needed.Statistical Sign Testhttps://datumorphism.leima.is/wiki/statistical-hypothesis-testing/sign-test/Sun, 20 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-hypothesis-testing/sign-test/We have a small dataset, but it doesn’t satisfy the t-test conditions. Then we would use as little assumptions as possible.
Wine Taste Suppose we have two bottles of wine, one of them is 300 euros while the other is 100 euros.
Now we ask the question:
Does expensive wine taste better?
We find 10 experts and give them some experiments. The result is recorded then processed into the following table.
expert # expensive is better 1 yes 2 no 3 yes 4 yes 5 yes 6 no 7 yes 8 yes Naively, we could simply count the number of yes and find the probability of yes in this sample, i.Documentationhttps://datumorphism.leima.is/wiki/tools/documentation/Sat, 28 Aug 2021 00:03:10 +0200https://datumorphism.leima.is/wiki/tools/documentation/I would vote for two very different documentation tools for a data science project,
sphinx docs, and squidfunk/mkdocs-material. Sphinx docs Sphinx docs is mature and stable. I love reStructuredText as the syntax because it is very versatile. It supports math, figures with captions, admonitions, cross reference, auto doc from docstrings, cross project cross referencing, pdf generation, etc.
reStructuredText is not the only choice
We can also use markdown by choosing the markdown parser. squidfunk/mkdocs-material squidfunk’s mkdocs-material is rather lightweight but is also growing to be versatile. The engine is mkdocs. mkdocs-material is a theme but provides a lot of useful and easy-to-use elements.Dashboardshttps://datumorphism.leima.is/wiki/data-visualization/dashboards/Fri, 27 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/data-visualization/dashboards/Building interactive dashboards is not an easy task. However, with the right tool, we can build a prototype fast.
Theories Dashboard building seems to be a task to build whatever charts the business would like to add.
However, theories are required to build quality dashboards1.
AmNeumarkt/253
I wrote a comment about this: AmNeumarkt/253.
Creating visualizations seems to be a creative task. At least for entry-level visualization tasks, we follow our hearts and build whatever is needed. However, visualizations are made for different purposes. Some visualizations are simply explorations and for us to get some feelings on the data. Some others are built for the validation of hypotheses.Mann-Whitney U Testhttps://datumorphism.leima.is/wiki/statistical-hypothesis-testing/mann-whitney-u-test/Sun, 20 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/statistical-hypothesis-testing/mann-whitney-u-test/Mann-Whitney U is good at [[hypothesis testing]] Statistical Hypothesis Testing hypothesis testing is about the probability of alternative hypothesis if the null hypothesis is true, or even more general heavy-tailed data.Describing Multi-dimensional Datahttps://datumorphism.leima.is/wiki/statistics/multidimensional-data/Mon, 03 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/multidimensional-data/Descriptions of Multidimensional Data Dispersion Matrix As defined in Correlation Coefficient and Covariance for Numeric Data, covariance is about the variance of two series. This property makes it easy to generalize it to multidimensional data.
The generalized quantity is named as dispersion matrix. Suppose we have a $p$ dimensional dataset $X$,
index $x_1$ $x_2$ … $x_p$ 1 2.3 12.3 83.2 9.3 … … … … … N 3.1 5.6 23.6 8.2 We could then calculate the pairwise covariance between the different dimensions.
$x_1$ $x_2$ … $x_p$ $x_1$ $x_2$ … $x_p$Data Storagehttps://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/data-storage/Wed, 05 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/data-storage/ Many of the examples are from the book by Andreas Kretz. Find the link to the book in the references section. Types of Storage and Data Here is a list 1
Files S3 Message Queues Kinesis Relational DB MySQL Postgres Non-relational DB Document Store MongoDB DocumentDB Key-Value Store HBase Redis Kretz2019 The Data Engineering Cookbook  ↩︎Comparison of MLOps Frameworkshttps://datumorphism.leima.is/wiki/mlops/comparison-of-frameworks/Wed, 05 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/mlops/comparison-of-frameworks/Metaflow Docs
A framework for jupyter notebook data scientists.
Work locally on notebooks. Python environment management using conda. Work in the cloud with Sagemaker. Tasks Methods Comments Code Scripts/Jupyter Notebook Datastore local + S3 metaflow.S3 Compute local + AWS Batch Metadata metaflow service Metadata specifies flow executions: Flows, Runs, Steps, Tasks, and Artifacts. Scheduling AWS Step Functions Deployment AWS Demo
from metaflow import FlowSpec, step class BranchFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.Data Processinghttps://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/data-processing/Wed, 05 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/data-processing/Many of the examples are from the book by Andreas Kretz. Find the link to the book in the references section. Batch Process Kretz recommends starting from batch processing and moving to streaming if needed 1.
Stream Process Three methods to stream data
At Least Once: message gets processed once or multiple times never dropped e.g., time-based GPS data in fleet management, if the stream data has the same timestamp, then we just override the existing data, we do not care how many times the data is being processed or streamed. At Most Once: okay to drop a message only processed once at max e.Signal Processinghttps://datumorphism.leima.is/wiki/algorithms/singal-processing/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/singal-processing/There are many fascinating ideas in signal processing.Data Processing - (Py)Sparkhttps://datumorphism.leima.is/wiki/tools/data-processing-spark/Mon, 31 Jan 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/tools/data-processing-spark/Spark uses Resilient Distributed Dataset (RDD).
Spark Clusters In one cluster, we have a driver that is responsible for managing the tasks, result consolidation, and also shared data access1.
PySpark PySpark provides
SparkContext SparkSession A spark dataframe is immutable. This makes it tricky to update a dataframe.
Useful Commands Get Session In pyspark, we can also get or create the session using the following method.
pyspark.sql.SparkSession.builder.getOrCreate() List all Tables A pyspark.sql.SparkSession has property catalog and can be used to list the tables.
pyspark.sql.SparkSession.catalog.listTables() Run SQL Query Given a SQL query query, we can query the table using .Signal Processing: Audio Basicshttps://datumorphism.leima.is/wiki/algorithms/signal-processing-audio/Thu, 29 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/signal-processing-audio/Keywords Harmonic structure of sound Parson code of music Linear time-invariant theory Autocorrelation Noise Chirps DCT compression Discrete Fourier transform filtering convolution Linear Time-Invariant System We describe the system with $Y(t) = f(X(t))$, where $X(t)$ is the input, and $Y(t)$ is the output.
Linear: $f(a X_1(t) + b X_2(t)) = a f(X_1(t)) + b f(X_2(t))$ Time-invariant: input $X(t+\Delta t)$ will produce the shifted signal $Y(t+\Delta t)$. LTI systems are memory systems, causal, real, and stable. Stable means the output stays bounded whenever the input is bounded.
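The two defining properties can be checked numerically on a small discrete system. The following is only a sketch: the two-point moving average, the helper names `moving_average`, `delay`, and `is_close`, and the sample signals are all illustrative assumptions, not something taken from the original notes.

```python
def moving_average(signal):
    """Two-point causal moving average; the input is taken to be zero before t = 0."""
    return [(x + (signal[t - 1] if t > 0 else 0.0)) / 2 for t, x in enumerate(signal)]


def delay(signal, steps):
    """Shift a signal later in time by `steps` samples, padding the start with zeros."""
    return [0.0] * steps + list(signal)


def is_close(u, v, tol=1e-12):
    """Element-wise comparison of two equally long sequences."""
    return len(u) == len(v) and all(abs(p - q) <= tol for p, q in zip(u, v))


x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, -1.0, 0.0, 2.0]
a, b = 2.0, -3.0

# Linearity: f(a X_1 + b X_2) == a f(X_1) + b f(X_2)
combined = [a * p + b * q for p, q in zip(x1, x2)]
lhs = moving_average(combined)
rhs = [a * p + b * q for p, q in zip(moving_average(x1), moving_average(x2))]
linear = is_close(lhs, rhs)

# Time invariance: filtering a delayed input equals delaying the filtered output
time_invariant = is_close(moving_average(delay(x1, 2)), delay(moving_average(x1), 2))

print(linear, time_invariant)
```

This particular filter is also stable in the sense used above, since each output sample is an average of two input samples and can never exceed the largest input magnitude.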
Impulse Response Suppose we have an impulse $X(t) = I(t)$, and output $h(t)$.Basics of MapReducehttps://datumorphism.leima.is/wiki/algorithms/map-reduce/Wed, 03 Oct 2018 00:00:00 +0000https://datumorphism.leima.is/wiki/algorithms/map-reduce/Centralized servers are not efficient for big data. Querying and processing data on centralized servers would hit the bottleneck of the servers.
MapReduce is used to solve these problems of big data. The two videos are .
Map: take a series of key-value pairs and divide them into groups. Reduce: recombine the key-value pairs. Check out the code challenges of MapReduce on HackerRank.Scale Uphttps://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/scale-up/Wed, 05 May 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/scale-up/Many of the examples are from the book by Andreas Kretz. Find the link to the book in the references section. Scaling Up Storage Scaling Up SQL DB SAN: Storage Area Network Use multiple servers on the DB storage to make the query faster.
Good for Read-only DB Not convenient to update DB Hadoop Hadoop:
Distributed storage Analysis 4 core modules:
Hadoop common background functionalities HDFS Divide into blocks Distribute [[MapReduce]] Basics of MapReduce mapreduce Old tech YARN Resource management The Hadoop Ecosystem:Autoregressive Denoising Diffusion Modelhttps://datumorphism.leima.is/wiki/time-series/autoregressive-denoising-diffusion-model/Fri, 10 Feb 2023 00:00:00 +0000https://datumorphism.leima.is/wiki/time-series/autoregressive-denoising-diffusion-model/Autoregressive In a multivariate [[forecasting problem]] The Time Series Forecasting Problem Forecasting time series , given an input sequence $\mathbf x_{t-K: t}$, we forecast $\mathbf x_{t+1:t+H}$.
To apply the [[denoising diffusion model]] Diffusion Models for Forecasting Objective In a denoising diffusion model, given an input $\mathbf x^0$ drawn from a complicated and unknown distribution $q(\mathbf x^0)$, we find a latent space with a simple and manageable distribution, e.g., normal distribution, and the transformations from $\mathbf x^0$ to $\mathbf x^n$, as well as the transformations from $\mathbf x^n$ to $\mathbf x^0$. An Example For example, with $N=5$, the forward process is flowchart LR x0 -- x1 -- x2 -- x3 -- x4 -- x5 and the reverse process is … in our multivariate forecasting problem, we define our forecasting task as an autoregressive problemDiffusion Models for Forecastinghttps://datumorphism.leima.is/wiki/machine-learning/energy-based-model/diffusion-model/Fri, 10 Feb 2023 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/energy-based-model/diffusion-model/Objective In a denoising diffusion model, given
an input $\mathbf x^0$ drawn from a complicated and unknown distribution $q(\mathbf x^0)$, we find
a latent space with a simple and manageable distribution, e.g., normal distribution, and the transformations from $\mathbf x^0$ to $\mathbf x^n$, as well as the transformations from $\mathbf x^n$ to $\mathbf x^0$. An Example For example, with $N=5$, the forward process is
flowchart LR x0 -- x1 -- x2 -- x3 -- x4 -- x5 and the reverse process is
flowchart LR x5 -- x4 -- x3 -- x2 -- x1 -- x0 The joint distribution we are searching for isCopulahttps://datumorphism.leima.is/cards/statistics/copula/Sun, 01 Jan 2023 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/copula/Given two uniform marginals, we can apply the inverse cdf of a continuous distribution to form a new joint distribution.
Some examples in this notebook.
Uniform marginals [[Gaussian]] Multivariate Normal Distribution Multivariate Gaussian distribution copula:
Normal, Normal Some other examples:
[[Normal]] Normal Distribution Gaussian distribution and [[Beta]] Beta Distribution Beta Distribution Interact Alpha Beta mode ((beta_mode)) median ((beta_median)) mean ((beta_mean)) ((makeGraph)) : Normal, Beta Gumbel and [[Beta]] Beta Distribution Beta Distribution Interact Alpha Beta mode ((beta_mode)) median ((beta_median)) mean ((beta_mean)) ((makeGraph)) : Gumbel, Beta [[t distribution]] t Distribution t distribution : t, tMultivariate Normal Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/multivariate-normal-distribution/Fri, 23 Dec 2022 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/multivariate-normal-distribution/2 Variables 2-Variable [[Gaussian]] Normal Distribution Gaussian distributionMTGNNhttps://datumorphism.leima.is/reading/mtgnn/Mon, 28 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/reading/mtgnn/Key Components Time Convolution (TC) Module Time Convolution The temporal convolution is responsible for capturing temporal patterns in a sequence. Graph Convolution Module Mix-hop Propagation in GNN Mix-hop is a strategy to avoid oversmoothing in GNN Graph Structure Learning Layer Graph Structure Learning in GNN We can learn a graph structure without prior knowledge Architecture Wu et al., 2020Level Sethttps://datumorphism.leima.is/cards/math/level-set/Sat, 12 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/cards/math/level-set/Given a real function $f(x_1, \cdots, x_n)$, its level set for $f(x_1, \cdots, x_n) =c$ are the corresponding arguments $x_1, \cdots, x_n$1,
$$ L_c(f) = \{ (x_1, \cdots, x_n) \vert f(x_1, \cdots, x_n) =c \} $$
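As a rough illustration of the definition, the sketch below enumerates the level set of a function restricted to a finite grid. The helper name `level_set`, the tolerance `tol`, and the integer grid are assumptions made for this example; on a continuous domain the level set is generally a curve or surface, and a grid only picks out the lattice points lying on it.

```python
from itertools import product


def level_set(f, grids, c, tol=1e-9):
    """Return the grid points (x_1, ..., x_n) where f(x_1, ..., x_n) equals c up to tol."""
    return [point for point in product(*grids) if abs(f(*point) - c) <= tol]


# Integer lattice points on the circle x^2 + y^2 = 25
grid = range(-5, 6)
circle = level_set(lambda x, y: x**2 + y**2, [grid, grid], c=25)

for point in circle:
    print(point)
```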
wiki-ls Contributors to Wikimedia projects. Level set. In: Wikipedia [Internet]. 4 Nov 2022 [cited 12 Nov 2022]. Available: https://en.wikipedia.org/wiki/Level_set  ↩︎Level Set Forecasterhttps://datumorphism.leima.is/wiki/forecasting/probablistic/level-set-forecaster/Sat, 12 Nov 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/probablistic/level-set-forecaster/This note is a more detailed version of Algorithm 1 in:
Hasson H, Wang B, Januschowski T, Gasthaus J. Probabilistic forecasting: A level-set approach. Adv Neural Inf Process Syst. 2021;34: 6404–6416. Available: https://proceedings.neurips.cc/paper/2021/hash/32b127307a606effdcc8e51f60a45922-Abstract.html.
It may be hard to comprehend without reading the text before Algorithm 1.
A level set forecaster converts point forecaster to probabilistic forecasters by constructing the [[level set]] Level Set Level set can be used in ML of the forecaster1.
Given a point forecaster $f(x_1, \cdots, x_d)$ trained on dataset $\mathcal D = \{(\mathbf x_i, y_i)\}$, we collect the predictions and true values and build a map, $f(x_i) \to [y_{i_1}, y_{i_2}, \cdots, y_{i_m}]$.MTGNNhttps://datumorphism.leima.is/wiki/forecasting/interactions/mtgnn/Thu, 20 Oct 2022 21:15:28 +0200https://datumorphism.leima.is/wiki/forecasting/interactions/mtgnn/Pytorch Data Parallelismhttps://datumorphism.leima.is/cards/machine-learning/practice/pytorch-data-parallelism/Wed, 19 Oct 2022 14:54:29 +0200https://datumorphism.leima.is/cards/machine-learning/practice/pytorch-data-parallelism/To train large models using PyTorch, we need to go parallel. There are two commonly used strategies123:
model parallelism, data parallelism, data-model parallelism. Model Parallelism Model parallelism splits the model on different nodes14. We will focus on data parallelism but the key idea is shown in the following illustration.
Model parallelLi X, Zhang G, Li K, Zheng W. Chapter 4 - Deep Learning and Its Parallelization. In: Buyya R, Calheiros RN, Dastjerdi AV, editors. Big Data. Morgan Kaufmann; 2016. pp. 95–118. doi:10.1016/B978-0-12-805394-2.00004-0
Data Parallelism Data parallelism creates replicas of the model on each device and uses different subsets of training data14.CUDA Memoryhttps://datumorphism.leima.is/cards/machine-learning/practice/cuda-memory/Wed, 19 Oct 2022 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/practice/cuda-memory/CUDA is widely used in deep learning. Though many deep learning professionals are not exposed to CUDA directly, most people are already using CUDA, as frameworks like PyTorch provide GPU support through CUDA.
To optimize the computational efficiency of our models, knowledge about the data transfer inside the devices is crucial. In this note, we build up the fundamentals of memory transfer for CUDA.
Segmented Memory and Paged Memory CUDA Can not Use Paged Memory A CPU host uses paged memory. However, GPU can not directly take data from paged memory on the host1. Before accessing the data, CUDA has to pin the memory so that the memory is page-locked2.Empirical Correlation Coefficient (CORR)https://datumorphism.leima.is/cards/time-series/ts-corr/Sun, 21 Aug 2022 20:43:36 +0200https://datumorphism.leima.is/cards/time-series/ts-corr/The Empirical Correlation Coefficient (CORR) is an evaluation metric in time series forecasting,1
$$ \mathrm{CORR} = \frac{1}{N} \sum_{i=1}^N \frac{ \sum_t (y^{(i)}_t - \bar y^{(i)} ) ( \hat y^{(i)}_t -\bar{ \hat y}^{(i)} ) }{ \sqrt{ \sum_t (y^{(i)}_t - \bar y^{(i)} )^2 ( \hat y^{(i)}_t -\bar{\hat y}^{(i)} )^2 } } $$
where $y^{(i)}$ is the $i$th time series, ${} _ t$ denotes the time step $t$, and $\bar y^{(i)}$ is the mean of the $i$th forecasted series, i.e., $\bar y^{(i)} = \operatorname{mean}( y^{(i)} _ { t \in \{T _ f, T _ {f+1}, \cdots T _ {f+H}\} } )$.
Lai2017 Lai G, Chang W-C, Yang Y, Liu H.Root Relative Squared Error (RSE)https://datumorphism.leima.is/cards/time-series/ts-rse/Sun, 21 Aug 2022 20:43:36 +0200https://datumorphism.leima.is/cards/time-series/ts-rse/The Root Relative Squared Error (RSE) is an evaluation metric in time series forecasting,1
$$ \mathrm{RSE} = \frac{ \sqrt{ \sum_{i, t} ( y^{(i)}_t - \hat y^{(i)}_t )^2 } }{ \sqrt{ \sum_{i, t} ( y^{(i)}_t - \bar y )^2 } } $$
where $y^{(i)}$ is the $i$th time series, ${} _ t$ denotes the time step $t$, and $\bar y$ is the mean of the forecasted series, i.e., $\bar y = \operatorname{mean}(y^{(i\in\{0, 1, \cdots, N\})} _ { t\in \{T _ f, T _ {f+1}, \cdots T _ {f+H}\} })$.
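The formula translates almost directly into code. In this sketch the function name `root_relative_squared_error` and the nested-list layout (one inner list per series, covering only the forecast horizon) are assumptions, and $\bar y$ is taken to be the mean of the true values over all series and time steps, which is one reading of the definition above.

```python
import math


def root_relative_squared_error(y_true, y_pred):
    """RSE for lists of series; y_true[i][t] and y_pred[i][t] cover the forecast horizon."""
    values = [v for series in y_true for v in series]
    # Assumption: y_bar is the mean of the true values over all series and steps.
    y_bar = sum(values) / len(values)
    numerator = sum(
        (yt - yp) ** 2
        for true_series, pred_series in zip(y_true, y_pred)
        for yt, yp in zip(true_series, pred_series)
    )
    denominator = sum((v - y_bar) ** 2 for v in values)
    return math.sqrt(numerator) / math.sqrt(denominator)


y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_pred = [[1.0, 2.0, 4.0], [2.0, 4.0, 5.0]]
rse = root_relative_squared_error(y_true, y_pred)
print(rse)
```

With this reading, a forecast that always predicts $\bar y$ scores exactly 1, so values below 1 indicate a forecast that beats this naive baseline.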
Lai2017 Lai G, Chang W-C, Yang Y, Liu H.Dilated Convolutionhttps://datumorphism.leima.is/cards/math/convolution-dilated/Sun, 21 Aug 2022 16:03:14 +0200https://datumorphism.leima.is/cards/math/convolution-dilated/For a convolution
$$ f*h(x) = \sum_{s+t=x} f(s) h(t), $$
the dilated version of it is1
$$ f*_l h(x) = \sum_{s+t*l=x} f(s) h(t), $$
where $l$ is the dilation factor.
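A direct, unoptimized reading of the two sums for finite sequences might look like the following. The function name `dilated_convolution`, the "full" output length, and the example inputs are assumptions of this sketch; deep learning frameworks typically implement a cross-correlation variant with padding and stride options instead.

```python
def dilated_convolution(f, h, dilation=1):
    """Full discrete dilated convolution: out[x] = sum of f[s] * h[t] over s + t * dilation == x."""
    if dilation < 1:
        raise ValueError("dilation must be a positive integer")
    if not f or not h:
        return []
    # The largest reachable index is (len(f) - 1) + (len(h) - 1) * dilation.
    out = [0] * (len(f) + (len(h) - 1) * dilation)
    for s, f_s in enumerate(f):
        for t, h_t in enumerate(h):
            out[s + t * dilation] += f_s * h_t
    return out


signal = [1, 2, 3, 4]
kernel = [1, 1]

plain = dilated_convolution(signal, kernel, dilation=1)    # ordinary convolution
dilated = dilated_convolution(signal, kernel, dilation=2)  # kernel taps two steps apart
print(plain)
print(dilated)
```

Setting `dilation=1` recovers the ordinary convolution, while larger values spread the kernel taps apart without adding parameters.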
Yu2015 Yu F, Koltun V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv [cs.CV]. 2015. Available: http://arxiv.org/abs/1511.07122  ↩︎Over-Smoothing in Graph Neural Networkshttps://datumorphism.leima.is/cards/graph/graph-neural-networks-over-smoothing/Sun, 21 Aug 2022 13:45:33 +0200https://datumorphism.leima.is/cards/graph/graph-neural-networks-over-smoothing/Over-smoothing is the problem that the representations on the nodes of a graph neural network become too similar to each other.1 In Chapter 7 of Hamilton2020, the author interprets this phenomenon using low-pass filter theory in signal processing, i.e., multiplying a signal by $\mathbf A^n$ is similar to a low-pass filter when $n$ is large, with $\mathbf A$ being the adjacency matrix.
Hamilton2020 Hamilton WL. Graph Representation Learning. Morgan & Claypool Publishers; 2020. pp. 1–159. doi:10.2200/S01045ED1V01Y202009AIM046  ↩︎Data Generating Processes for Time Series Datahttps://datumorphism.leima.is/cards/time-series/generating-process/Mon, 04 Jul 2022 00:00:00 +0000https://datumorphism.leima.is/cards/time-series/generating-process/PySpark: Beware of Python Mutable Objectshttps://datumorphism.leima.is/til/data/pyspark.beware-of-mutable-objects/Wed, 20 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/til/data/pyspark.beware-of-mutable-objects/.file { display: block } I created this example notebook to demonstrate the potential danger when dealing with mutable objects in pyspark udfs.
https://gist.github.com/emptymalei/07ba6716d0e2d815ebb64adce25dee72
In the above notebook, we can see that Python lists in udfs behave like pointers. For each group in the aggregation, the lists belonging to the same value in column b behave like one and the same list, i.e., like pointers to a shared object.
To solve this problem, we can do a few things.
Cache the dataframe after aggregation.
sdf_2 = sdf.groupby("language", "b").agg(F.max("b").alias("combined")).cache() Make a copy of the mutable object.Conformal Time Series Forecastinghttps://datumorphism.leima.is/wiki/forecasting/probablistic/conformal-time-series-forecasting/Tue, 19 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/wiki/forecasting/probablistic/conformal-time-series-forecasting/Conformal time series forecasting is a probabilistic forecasting method using [[Conformal Prediction]] Conformal Prediction Conformal prediction is a method to sequentially predict consistent confidence intervals using nonconformity measures. .
For any given model $\mathcal M$, conformal time series forecasting trains on a training dataset $\mathcal D_{\text{Train}}$ then calculates a [[Confidence Interval]] Confidence Interval Estimates from a sample can be entitled a confidence interval using a calibration dataset $\mathcal D_{\text{Calibration}}$. The confidence interval is directly used for inference. This framework is called the inductive conformal prediction (ICP).
Induction, Deduction, and Transduction How to Forecast the Confidence Interval For a dataset $\mathcal D$, we split it, e.Induction, Deduction, and Transductionhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/induction-deduction-transduction/Tue, 19 Apr 2022 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/induction-deduction-transduction/Graph Edge Samplinghttps://datumorphism.leima.is/cards/graph/graph-edge-sampling/Sun, 20 Mar 2022 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-edge-sampling/Edge sampling is a technique to deal with weighted edges in large [[graph]] What is Graph Graph A graph $\mathcal G$ has nodes $\mathcal V$ and edges $\mathcal E$, $$ \mathcal G = ( \mathcal V, \mathcal E). $$ Edges Edges are relations between nodes. For $u\in \mathcal V$ and $v\in \mathcal V$, if there is an edge between them, then $(u, v)\in \mathcal E$. Representations of Graph There are different representations of a graph. Adjacency Matrix A adjacency matrix of a graph represents the nodes using row and column indices and edges using elements of the matrix. For simple … .Continuous Ranked Probability Score - CRPShttps://datumorphism.leima.is/cards/time-series/crps/Fri, 18 Mar 2022 00:00:00 +0000https://datumorphism.leima.is/cards/time-series/crps/The Continuous Ranked Probability Score, known as CRPS, is a score to measure how a proposed distribution approximates the data, without knowledge about the true distributions of the data.
Definition CRPS is defined as1
$$ CRPS(P, x_a) = \int_{-\infty}^\infty \lVert P(x) - H(x - x_a) \rVert_2 dx, $$
where
$x_a$ is the true value of $x$, $P(x)$ is our proposed cumulative distribution for $x$, $H(x)$ is the Heaviside step function $$ H(x) = \begin{cases} 1, &\qquad x\geq 0\\ 0, &\qquad x < 0\\ \end{cases} $$
$\lVert \cdot \rVert_2$ is the L2 norm. Explain it The formula looks abstract on first sight, but it becomes crystal clear once we understand it.PySpark: Compare Two Schemashttps://datumorphism.leima.is/til/data/pyspark-schema-comparison/Mon, 14 Feb 2022 00:00:00 +0000https://datumorphism.leima.is/til/data/pyspark-schema-comparison/To compare two dataframe schemas in [[PySpark]] Data Processing - (Py)Spark Processing Data using (Py)Spark , we can utilize the set operations in python.
def schema_diff(schema1, schema2): return { 'fields_in_1_not_2': set(schema1) - set(schema2), 'fields_in_2_not_1': set(schema2) - set(schema1) }StemGNNhttps://datumorphism.leima.is/reading/stemgnn/Sat, 01 Jan 2022 00:00:00 +0000https://datumorphism.leima.is/reading/stemgnn/What problem is StemGNN solving: intra-series temporal pattern: DFT Each series inter-series correlations At each step, the interactions between nodes reversible operator Example problem Covid cases: DE, AT, NL, … Predicting each country without considering the interactions between them Or introduce the people flow between them GFT: Completes DFT as it takes care of the inter-series correlations one extra slide for this topic Convolutions on Graphs one extra slide for this topic Graph Basics How to build the graph “self-attention”: outer product of key, query, as the adjacency matrix key, query are of length # of ndoes Weights and Biases LC 1DConv GLU FC Experiments Traffic adjacency matrix neighbouring sensors have higher correlations Covid Neighbouring countries have higher correlation Spetral analysis: Some eigenvectors have clear meanings Week spots Paper and code are not consistent https://github.Convolutions Using Fourier Transformhttps://datumorphism.leima.is/cards/math/convolution-and-fourier-transform/Sat, 04 Dec 2021 00:00:00 +0000https://datumorphism.leima.is/cards/math/convolution-and-fourier-transform/Convolution
$$ (f*h)(x) = \int \mathrm d y f(y) h(x-y), $$
is equivalent to
$$ \mathcal F^{-1}\left[ \mathcal F[ f(x) ] \circ \mathcal F[h(x)] \right], $$
with $\mathcal F$ being the Fourier transform, i.e.,
$$ \mathcal F[f(x)] = \int \mathrm d x f(x) e^{-2\pi i x s}. $$
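A discrete analogue of this statement can be verified numerically. The sketch below makes several assumptions that are not in the text above: it replaces the continuous transform with a naive $O(n^2)$ discrete Fourier transform, uses circular (periodic) convolution, which is the form the discrete theorem holds for, and introduces the helper names `dft`, `circular_convolution`, and `convolution_via_dft`.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform; the inverse carries the 1/n normalization."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for s in range(n):
        total = sum(
            values[x] * cmath.exp(sign * 2j * cmath.pi * x * s / n) for x in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolution(f, h):
    """Direct sum over y of f[y] * h[(x - y) mod n]."""
    n = len(f)
    return [sum(f[y] * h[(x - y) % n] for y in range(n)) for x in range(n)]


def convolution_via_dft(f, h):
    """Inverse DFT of the element-wise product of the two DFTs."""
    product = [a * b for a, b in zip(dft(f), dft(h))]
    return [value.real for value in dft(product, inverse=True)]


f = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.0, -1.0, 2.0]

direct = circular_convolution(f, h)
spectral = convolution_via_dft(f, h)
print(direct)
print(spectral)
```

The two printed lists should agree up to floating-point rounding. In practice one would use a fast Fourier transform rather than this quadratic-time version.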
Proof
One could prove it using the Fourier integral theorem,
$$ f(x) = \iint dy d\xi f(y)e^{2\pi i (x-y)\xi}. $$
Derivation Given
$$ \begin{align} \mathcal F_s \left(f(y)\right) &= \int dy f(y) e^{-2\pi i y s}, \\ \mathcal F_s \left(h(z)\right) &= \int dz h(z) e^{-2\pi i z s}, \end{align} $$
we haveGraph Convolution Operatorhttps://datumorphism.leima.is/cards/graph/graph-convolution-operator/Thu, 25 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-convolution-operator/For a given graph $\mathcal G$, we have an attribute on each node, denoted as $f_v$. All the node attributes put together can be written as a list $\mathbf f\to (f_{v_1}, f_{v_2}, \cdots, f_{v_N})$.
Convolution on a graph combines the attributes on nodes with those of their neighbors. The [[adjacency matrix]] Graph Adjacency Matrix A graph $\mathcal G$ can be represented with an adjacency matrix $\mathbf A$. There are some nice and clear examples on wikipedia1, for example, $$ \begin{pmatrix} 2 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} $$ for the graph Public Domain, Link $\mathbf A$ applied on all node attributes $\mathbf f$ is such an operation, i.Centered Kernel Alignment (CKA)https://datumorphism.leima.is/cards/machine-learning/measurement/centered-kernel-alignment/Mon, 08 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/centered-kernel-alignment/Centered Kernel Alignment (CKA) is a similarity metric designed to measure the similarity between representations of features in neural networks1.
Definition of CKA CKA is based on the [[Hilbert-Schmidt Independence Criterion (HSIC)]] Hilbert-Schmidt Independence Criterion (HSIC) Given two kernels of the feature representations $K=k(x,x)$ and $L=l(y,y)$, HSIC is defined as12 $$ \operatorname{HSIC}(K, L) = \frac{1}{(n-1)^2} \operatorname{tr}( K H L H ), $$ where $x$, $y$ are the representations of features, $n$ is the dimension of the representation of the features, $H$ is the so-called [[centering matrix]] Centering Matrix Useful when centering a vector around its mean . We can choose different kernel functions $k$ and $l$.Centering Matrixhttps://datumorphism.leima.is/cards/math/statistics-centering-matrix/Mon, 08 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/cards/math/statistics-centering-matrix/Given a vector $v$, with mean value of its elements $m$, we can center the vector by subtracting the mean $m$ from each element,
import numpy as np n = 10 v = np.random.randn(n) v_c = v - v.mean() This operation is easy and obvious. However, the formalism is not elegant. In some cases, we would like to formulate the process of centering the elements as operators,
$$ v_c = \operatorname{\hat H}v. $$
In this case, the operator $\operatorname{\hat H}$ is simply a matrix
$$ \operatorname{\hat H} \to I_n - \frac{1}{n} J_n, $$
where $n$ is the dimension of the vector $v$, $I_n$ is the $n\times n$ identity matrix, and $J_n$ is the $n\times n$ matrix of all $1$s.Hilbert-Schmidt Independence Criterion (HSIC)https://datumorphism.leima.is/cards/machine-learning/measurement/hilbert-schmidt-independence-criterion/Mon, 08 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/hilbert-schmidt-independence-criterion/Given two kernels of the feature representations $K=k(x,x)$ and $L=l(y,y)$, HSIC is defined as12
$$ \operatorname{HSIC}(K, L) = \frac{1}{(n-1)^2} \operatorname{tr}( K H L H ), $$
where
$x$, $y$ are the representations of features, $n$ is the dimension of the representation of the features, $H$ is the so-called [[centering matrix]] Centering Matrix Useful when centering a vector around its mean . We can choose different kernel functions $k$ and $l$. For example, if $k$ and $l$ are linear kernels, we have $k(x, y) = l(x, y) = x \cdot y$. In this linear case, HSIC is simply $\parallel\operatorname{cov}(x^T,y^T) \parallel^2_{\text{Frobenius}}$.Differential Learning Rates in PyTorchhttps://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-differential-learning-rates/Mon, 01 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-differential-learning-rates/Using different learning rates in different layers of our artificial neural network.Learning Ratehttps://datumorphism.leima.is/cards/machine-learning/practice/learning-rate/Mon, 01 Nov 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/practice/learning-rate/Finding a suitable learning rate for our model training is crucial.
A safe but time-consuming option is a grid search over candidate learning rates. However, there are smarter moves.
Karpathy’s Constant
An empirical learning rate of $3\times 10^{-4}$ for Adam, aka Karpathy's constant, started as a tweet by Andrej Karpathy. 3e-4 is the best learning rate for Adam, hands down.
— Andrej Karpathy (@karpathy) November 24, 2016 (i just wanted to make sure that people understand that this is a joke...)
— Andrej Karpathy (@karpathy) November 24, 2016 Smarter Method A smarter method is to start with small learning rate and increase it on each mini-batch, then observe the loss vs learning rate (mini-batch in this case).Shatterhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/set-shatter/Wed, 27 Oct 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/set-shatter/Given a set $\mathcal S$, and a class (collection of sets) $\mathcal H$.
Suppose that for every subset of $\mathcal S$, denoted as $\mathcal s$, there is an element of the class $\mathcal H$, denoted as $\mathcal h$, such that1
$$ \mathcal h \cap \mathcal S = \mathcal s. $$
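For finite sets, this condition can be checked by brute force. The following is a minimal sketch, not an efficient algorithm: it collects all intersections $\mathcal h \cap \mathcal S$ and compares them with the power set of $\mathcal S$. The helper names `power_set` and `shatters`, and the threshold class used as an example, are illustrative assumptions.

```python
from itertools import combinations


def power_set(s):
    """Return all subsets of s as a set of frozensets."""
    items = list(s)
    return {
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    }


def shatters(H, S):
    """True if every subset of S equals (h & S) for some h in H."""
    S = frozenset(S)
    traces = {frozenset(h) & S for h in H}
    return traces == power_set(S)


# Hypothetical class: "threshold" sets {0, 1, ..., t}; t = -1 gives the empty set.
thresholds = [set(range(t + 1)) for t in range(-1, 4)]

print(shatters(thresholds, {1}))     # a single point
print(shatters(thresholds, {1, 2}))  # two points: the subset {2} cannot be produced
```

In this example the threshold class shatters a single point but not the pair, because no threshold set contains 2 without also containing 1.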
Since the power set of $\mathcal S$ ($P(\mathcal S)$) contains all the possible subsets of $\mathcal S$, we can also rephrase the concept using power set. If we can find the power set $P(\mathcal S)$ by looking into intersections of elements $\mathcal h$ of $\mathcal H$ ($\mathcal h\in \mathcal H$), then we say $\mathcal H$ shatters $\mathcal S$ 1.Graph Global Overlap Measure: Katz Indexhttps://datumorphism.leima.is/cards/graph/graph-global-overlap-katz-index/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-global-overlap-katz-index/The Katz index is
$$ \mathbf S_{\text{Katz}}[u,v] = \sum_{i=1}^\infty \beta^i \mathbf A^i[u, v], $$
where $\mathbf A^i[u, v]$ is the $[u, v]$ element of the matrix $\mathbf A$ raised to the $i$th power. The same holds for $\beta^i$. The Katz index describes the similarity between node $u$ and node $v$.
Do not confuse power with contravariant indices
For readers familiar with tensor notation, this might be confusing. We sometimes use contravariant indices on the top right of a tensor symbol.
But here ${}^{i}$ means to the $i$th power.
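The power series can be evaluated numerically for a small graph. The sketch below sums a truncated series and compares it with the matrix-inverse form $(\mathbf I - \beta \mathbf A)^{-1} - \mathbf I$. The three-node path graph, the number of terms, and the choice of $\beta$ (half of the inverse spectral radius, so that the series converges) are assumptions made for illustration.

```python
import numpy as np

# Adjacency matrix of a small undirected path graph 0 - 1 - 2 (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# The series converges only if beta < 1 / (largest |eigenvalue| of A).
beta = 0.5 / np.max(np.abs(np.linalg.eigvals(A)))


def katz_series(A, beta, n_terms=200):
    """Truncated sum over i = 1..n_terms of beta^i A^i."""
    S = np.zeros_like(A)
    term = np.eye(len(A))
    for _ in range(n_terms):
        term = (beta * term) @ A  # term is now beta^i A^i
        S += term
    return S


def katz_closed_form(A, beta):
    """Geometric-series form (I - beta A)^{-1} - I."""
    I = np.eye(len(A))
    return np.linalg.inv(I - beta * A) - I


S_series = katz_series(A, beta)
S_closed = katz_closed_form(A, beta)
print(np.allclose(S_series, S_closed))
```

Directly connected nodes (0 and 1) get a larger index than nodes connected only through a longer path (0 and 2), since longer paths are weighted by higher powers of $\beta$.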
The series can be shown to sum to the following
$$ \mathbf S_{\text{Katz}} = (\mathbf I - \beta \mathbf A)^{-1} - \mathbf I.Graph Global Overlap Measure: Leicht-Holme-Newman Indexhttps://datumorphism.leima.is/cards/graph/graph-global-overlap-leicht-holme-newman-similarity/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-global-overlap-leicht-holme-newman-similarity/The LHN index is a normalized similarity index.
From Katz Index to LHN Index [[Katz Index]] Graph Global Overlap Measure: Katz Index The Katz index is $$ \mathbf S_{\text{Katz}}[u,v] = \sum_{i=1}^\infty \beta^i \mathbf A^i[u, v], $$ where $\mathbf A^i[u, v]$ is the matrix $\mathbf A$ to the $i$th power. Some for $\beta^i$. The Katz index describes the similarity between of node $u$ and node $v$. Do not confuse power with contravariant indices For readers familiar with tensor notations, it might be confusing. We some times use contravariant indices on the top right of the tensor notation. But here ${}^{i}$ means to the $i$th … has a knob to tune the punishment towards longer paths.Graph Global Overlap Measure: Random Walk Similarityhttps://datumorphism.leima.is/cards/graph/graph-global-overlap-random-walk-similarity/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-global-overlap-random-walk-similarity/Random Walk Construct a stochastic transfer matrix $P$ by normalizing the adjacency matrix $\mathbf A$ using the node degrees of the target nodes,
$$ \mathbf P = \mathbf A \mathbf D^{-1}, $$
where $\mathbf A$ is the [[adjacency matrix]] Graph Adjacency Matrix A graph $\mathcal G$ can be represented with an adjacency matrix $\mathbf A$. There are some nice and clear examples on wikipedia1, for example, $$ \begin{pmatrix} 2 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} $$ for the graph Public Domain, Link and $\mathbf D$ is a diagonal matrix with the diagonal elements being the node degrees.Graph Isomorphismhttps://datumorphism.leima.is/cards/graph/graph-isomorphism/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-isomorphism/Two graphs, $\mathcal G$ and $\mathcal H$, are isomorphic if there is a bijection $f$ between their node sets such that, for all nodes $u$ and $v$ of $\mathcal G$,
$$ u, v \text{ adjacent in } \mathcal G \iff f(u), f(v) \text{ adjacent in } \mathcal H. $$
An algorithm to find approximate isomorphism is the [[Weisfeiler Lehman Method]] Weisfeiler-Lehman Kernel The Weisfeiler-Lehman kernel is an iterative integration of neighborhood information. We initialize the labels for each node using its own node degree. At each step, we take the neighboring node degrees to form a [[multiset]] Multiset, mset or bag A bag is a set in which duplicate elements are allowed. An ordered bag is a list that we use in programming.Graph Local Overlap Measure: Adamic Adar Indexhttps://datumorphism.leima.is/cards/graph/graph-local-overlap-adamic-adar-index/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-local-overlap-adamic-adar-index/The Adamic Adar (AA) index is1
$$ \mathbf S_{\text{AA}}[v_1,v_2] = \sum_{u\in\mathcal N(v_1) \cap \mathcal N(v_2)} \frac{1}{\log d_u}, $$
where $d_u$ is the node degree of node $u$ and $\mathcal N(v)$ is the set of neighbors of node $v$.
If two nodes share a neighbor, the degree of that neighbor is at least 2, so $\log d_u > 0$ and it is safe to use $1/\log d_u$.
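As a small illustration, the index can be computed directly from neighbor sets. This is a minimal sketch assuming an undirected graph stored as a dictionary from each node to the set of its neighbors; the graph and the function name `adamic_adar` are made up for the example.

```python
import math

# Hypothetical undirected graph: node -> set of neighbors.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}


def adamic_adar(graph, v1, v2):
    """Sum of 1 / log(degree) over the common neighbors of v1 and v2."""
    common = graph[v1] & graph[v2]
    return sum(1.0 / math.log(len(graph[u])) for u in common)


# "a" and "b" share the neighbors "c" (degree 3) and "d" (degree 2).
score = adamic_adar(graph, "a", "b")
print(score)
```

A shared neighbor with a low degree contributes more than a shared hub, which is the point of the $1/\log d_u$ weighting.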
Adamic2003 Adamic LA, Adar E. Friends and neighbors on the Web. Soc Networks. 2003;25: 211–230. doi:10.1016/S0378-8733(03)00009-1  ↩︎Graph Local Overlap Measure: Resource Allocation Indexhttps://datumorphism.leima.is/cards/graph/graph-local-overlap-resource-allocation-index/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-local-overlap-resource-allocation-index/The Resource Allocation (RA) index is
$$ \mathbf S_{\text{RA}}[v_1,v_2] = \sum_{u\in\mathcal N(v_1) \cap \mathcal N(v_2)} \frac{1}{d_u}, $$
where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$.Graph Local Overlap Measure: Salton Indexhttps://datumorphism.leima.is/cards/graph/graph-local-overlap-salton-index/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-local-overlap-salton-index/The Salton index is
$$ \mathbf S_{\text{Salton}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ \sqrt{d_u d_v}}, $$
where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the neighbor nodes of $u$.Graph Local Overlap Measure: Sorensen Indexhttps://datumorphism.leima.is/cards/graph/graph-local-overlap-sorensen-index/Sun, 26 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-local-overlap-sorensen-index/The Sorensen index is
$$ \mathbf S_{\text{Sorensen}}[u,v] = \frac{ 2\lvert \mathcal N (u) \cap \mathcal N(v) \rvert }{ d_u + d_v}, $$
where $d_u$ is the node degree of node $u$ and $\mathcal N(u)$ is the set of neighbors of $u$.Betweenness Centrality of a Graphhttps://datumorphism.leima.is/cards/graph/graph-betweenness-centrality/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-betweenness-centrality/The betweenness centrality of a node $v$ measures how likely a shortest path between two nodes $u_s$ and $u_t$ is to pass through node $v$,
$$ c(v) = \sum_{v\neq u_s\neq u_t} \frac{\sigma_{u_su_t}(v) }{\sigma_{u_su_t}}, $$
where $\sigma_{u_su_t}(v)$ is the number of shortest paths between $u_s$ and $u_t$ that pass through $v$, while $\sigma_{u_su_t}$ is the total number of shortest paths between $u_s$ and $u_t$.
A figure from wikipedia demonstrates this idea well. The nodes on the outreach have smaller betweenness centrality, while the nodes in the core have higher betweenness centrality.
Source: Wikipedia
Outreach and Core
It is almost like cheating to use the words “outreach” and “core” here.Eigenvector Centrality of a Graphhttps://datumorphism.leima.is/cards/graph/graph-eigenvector-centrality/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-eigenvector-centrality/Given a graph with adjacency matrix $\mathbf A$, the eigenvector centrality is
$$ \mathbf e_u = \frac{1}{\lambda} \sum_{v\in\mathcal V} \mathbf A[u,v] \mathbf e_v, \qquad \forall u \in \mathcal V. $$
Why is it called Eigenvector Centrality
The definition is equivalent to
$$ \lambda \mathbf e = \mathbf A\mathbf e. $$
Power Iteration The solution to $\mathbf e$ is the eigenvector that corresponds to the largest eigenvalue $\lambda_1$. Power iteration method can help us get this eigenvector, i.e., the $^{(t+1)}$ iteration is related to the previous iteration $^{(t)}$, through the following relation,
$$ \mathbf e^{(t+1)} = \mathbf A \mathbf e^{(t)}.Graph Adjacency Matrixhttps://datumorphism.leima.is/cards/graph/graph-adjacency-matrix/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-adjacency-matrix/A graph $\mathcal G$ can be represented with an adjacency matrix $\mathbf A$. There are some nice and clear examples on wikipedia1, for example,
$$ \begin{pmatrix} 2 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} $$
for the graph
Public Domain, LinkGraph Clustering Coefficienthttps://datumorphism.leima.is/cards/graph/graph-local-clustering-coefficient/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-local-clustering-coefficient/Local Clustering Coefficient $$ c_u = \frac{ \lvert (v_1,v_2)\in \mathcal E: v_1, v_2 \in \mathcal N(u) \rvert}{ \color{red}{d_u \choose 2} }, $$
where $\color{red}{d_u \choose 2}$ is the number of possible pairs of neighbors of $u$, $d_u$ is the node degree of $u$, and $\mathcal N(u)$ is the set of nodes that are neighbors of $u$.
Closed Triangles Ego Graph
Counting the closed triangles of the ego graph of a node and normalizing the count by the total possible number of triangles also gives a measure of the clustering coefficient.
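The counting can be sketched in a few lines: among all pairs of neighbors of a node, count the pairs that are themselves connected. The adjacency-dictionary representation, the function name `local_clustering`, and the convention of returning 0 for nodes with fewer than two neighbors are assumptions of this example.

```python
from itertools import combinations


def local_clustering(graph, u):
    """Fraction of neighbor pairs of u that are themselves connected."""
    neighbors = graph[u]
    d = len(neighbors)
    if d < 2:
        return 0.0  # assumed convention: no neighbor pairs, no triangles
    linked = sum(1 for v1, v2 in combinations(neighbors, 2) if v2 in graph[v1])
    return linked / (d * (d - 1) / 2)


# Two hypothetical ego graphs: a triangle and a star centered at node 0.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(local_clustering(triangle, 0))
print(local_clustering(star, 0))
```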
If the ego graph of $u$ is fully connected, we have $c_u=1$; If the ego graph of $u$ is a star, we have $c_u=0$.Graph Cutshttps://datumorphism.leima.is/cards/graph/graph-cuts/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-cuts/Cut For a subset of nodes $\mathcal A\subset \mathcal V$, the rest of nodes can be denoted as $\bar {\mathcal A} = \mathcal V \setminus \mathcal A$. In other words, $\mathcal A \cup \bar {\mathcal A} = \mathcal V$ and $\mathcal A \cap \bar {\mathcal A} = \emptyset$. That being said, the nodes can be partitioned into two subsets, $\mathcal A$ and $\bar {\mathcal A}$. The cut of this partition is defined as the total number of edges between them,
$$ \operatorname{Cut} \left( \mathcal A, \bar{\mathcal A} \right) = \frac{1}{2} \left( \lvert (u, v)\in \mathcal E: u\in \mathcal A, v\in \bar{\mathcal A} \rvert + \lvert (u, v)\in \mathcal E: u\in \bar{\mathcal A}, v\in {\mathcal A} \rvert \right).Graph Laplacianshttps://datumorphism.leima.is/cards/graph/graph-laplacians/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-laplacians/Laplacian is a useful representation of graphs. The unnormalized Laplacian is
$$ \mathbf L = \mathbf D - \mathbf A, $$
where $\mathbf A$ is the [[adjacency matrix]] Graph Adjacency Matrix A graph $\mathcal G$ can be represented with an adjacency matrix $\mathbf A$. There are some nice and clear examples on wikipedia1, for example, $$ \begin{pmatrix} 2 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix} $$ for the graph Public Domain, Link and $\mathbf D$ is the degree matrix, i.Heterophily on Graphhttps://datumorphism.leima.is/cards/graph/heterophily/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/heterophily/Heterophily is the tendency to differ from others. Heterophily on a graph is the tendency to connect to nodes that are different from itself, e.g., nodes with different attributes have a higher probability of sharing an edge.Homophily on Graphhttps://datumorphism.leima.is/cards/graph/homophily/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/homophily/ Homophily is the principle that a contact between similar people occurs at a higher rate than among dissimilar people – McPherson20011
McPherson2001 McPherson M, Smith-Lovin L, Cook JM. Birds of a Feather: Homophily in Social Networks. Annu Rev Sociol. 2001;27: 415–444. doi:10.1146/annurev.soc.27.1.415  ↩︎Node Degreehttps://datumorphism.leima.is/cards/graph/node-degree/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/node-degree/Node degree of a node $u$
$$ d_u = \sum_{v\in \mathcal V} A[u,v], $$
where $A$ is the adjacency matrix.Structural Equivalence on Graphhttps://datumorphism.leima.is/cards/graph/structural-equivalence/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/structural-equivalence/Structural Equivalence means that nodes with similar neighborhood structures will share similar attributes.Weisfeiler-Lehman Kernelhttps://datumorphism.leima.is/cards/graph/graph-weisfeiler-lehman-kernel/Sat, 25 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/graph/graph-weisfeiler-lehman-kernel/The Weisfeiler-Lehman kernel is an iterative integration of neighborhood information.
We initialize the labels for each node using its own node degree. At each step, we take the neighboring node degrees to form a [[multiset]] Multiset, mset or bag A bag is a set in which duplicate elements are allowed. An ordered bag is a list that we use in programming. . At step $K$, we have the multisets for each node. Those multisets at each node can be processed to form a representation of the graph, which is in turn used to calculate statistics of the graph.Initialize Artificial Neural Networkshttps://datumorphism.leima.is/cards/machine-learning/neural-networks/neural-networks-initialization/Thu, 23 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/neural-networks-initialization/The weights are better if they1
are zero centered, and have similar variance across layers. Why
If we have very different variances across layers, we will need a different learning rate for each layer for our optimization. Setting the variances to be on the same scale, we can use a global learning rate for the whole network. Variance is related to the input size of the layer Suppose we are using a simple linear activation, $\sigma(x) = \alpha x$. For a series of inputs $x_j$, the outputs $y_i$ are
$$ y_i = \sum_{j} w_{ij} x_j. $$Alignment and Uniformityhttps://datumorphism.leima.is/cards/machine-learning/embedding/alignment-and-uniformity/Sat, 11 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/embedding/alignment-and-uniformity/A good representation should be able to
separate different instances, and cluster similar instances. Wang et al proposed two concepts that match the above two ideas, alignment and uniformity, on a hypersphere1.
From Wang et al
Wang T, Isola P. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2005.10242 ↩︎Cross Entropyhttps://datumorphism.leima.is/cards/information/cross-entropy/Sat, 04 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/cross-entropy/Cross entropy is1
$$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$
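For discrete distributions the expectation becomes a sum over outcomes. The sketch below evaluates the cross entropy for two made-up distributions `p` and `q` and, for comparison, also computes the entropy of `p` and the KL divergence from `q` to `p`; all values are illustrative assumptions.

```python
import numpy as np

# Two hypothetical discrete distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

cross_entropy = -np.sum(p * np.log(q))  # E_p[-log q]
entropy = -np.sum(p * np.log(p))        # H(p) = E_p[-log p]
kl = np.sum(p * np.log(p / q))          # D_KL(p || q)

print(cross_entropy, entropy + kl)
```

The two printed numbers agree up to floating point error, and the cross entropy is never smaller than the entropy of `p` because the KL divergence is non-negative.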
Cross entropy $H(p, q)$ can also be decomposed,
$$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$
where $H(p)$ is the [[entropy of $P$]] Shannon Entropy Shannon entropy $S$ is the expectation of information content $I(X)=-\log \left(p\right)$1, \begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation} shannon_entropy_wiki Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Entropy_(information_theory)  ↩︎ and $\operatorname{D}_{\mathrm{KL}}$ is the [[KL Divergence]] KL Divergence Kullback–Leibler divergence indicates the differences between two distributions .f-Divergencehttps://datumorphism.leima.is/cards/information/f-divergence/Sat, 04 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/f-divergence/The f-divergence is defined as1
$$ \operatorname{D}_f = \int f\left(\frac{p}{q}\right) q\mathrm d\mu, $$
where $p$ and $q$ are two densities and $\mu$ is a reference distribution.
Requirements on the generating function
The generating function $f$ is required to
be convex, and $f(1) =0$. For $f(x) = x \log x$ with $x=p/q$, f-divergence is reduced to the KL divergence
$$ \begin{align} &\int f\left(\frac{p}{q}\right) q\mathrm d\mu \\ =& \int \frac{p}{q} \log \left( \frac{p}{q} \right) q \mathrm d\mu \\ =& \int p \log \left( \frac{p}{q} \right) \mathrm d\mu. \end{align} $$
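The reduction can be checked numerically in the discrete case, where the integral over $\mu$ becomes a sum over outcomes. This is a minimal sketch; the function name `f_divergence` and the two distributions are assumptions for illustration.

```python
import numpy as np


def f_divergence(p, q, f):
    """Discrete f-divergence: sum over outcomes of f(p/q) * q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sum(f(p / q) * q)


# Two hypothetical discrete distributions with full support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Generating function f(x) = x log x, which is convex and satisfies f(1) = 0.
kl_via_f = f_divergence(p, q, lambda x: x * np.log(x))
kl_direct = np.sum(p * np.log(p / q))

print(kl_via_f, kl_direct)
```

Swapping the lambda for another convex function with $f(1)=0$ gives other members of the family without changing `f_divergence`.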
For more special cases of f-divergence, please refer to wikipedia1.Jensen-Shannon Divergencehttps://datumorphism.leima.is/cards/information/jensen-shannon-divergence/Sat, 04 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/jensen-shannon-divergence/The Jensen-Shannon divergence is a symmetric divergence of distributions $P$ and $Q$,
$$ \operatorname{D}_{\text{JS}} = \frac{1}{2} \left[ \operatorname{D}_{\text{KL}} \left(P \bigg\Vert \frac{P+Q}{2} \right) + \operatorname{D}_{\text{KL}} \left(Q \bigg\Vert \frac{P+Q}{2}\right) \right], $$
where $\operatorname{D}_{\text{KL}}$ is the [[KL Divergence]] KL Divergence Kullback–Leibler divergence indicates the differences between two distributions .Shannon Entropyhttps://datumorphism.leima.is/cards/information/shannon-entropy/Sat, 04 Sep 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/shannon-entropy/Shannon entropy $S$ is the expectation of information content $I(X)=-\log \left(p\right)$1,
\begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation}
shannon_entropy_wiki Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Entropy_(information_theory)  ↩︎VSCode Setup Tests when Module is in a Different Folderhttps://datumorphism.leima.is/til/misc/vscode/vscode-setup-python-tests-with-module-in-src-folder/Tue, 31 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/til/misc/vscode/vscode-setup-python-tests-with-module-in-src-folder/Use .env fileVSCode Terminal Python Can Not Activate Conda on Machttps://datumorphism.leima.is/til/misc/vscode/vscode-terminal-python-not-conda-mac/Tue, 31 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/til/misc/vscode/vscode-terminal-python-not-conda-mac/Enable your key repeat in vscode on macPostgres Timezone Conversionshttps://datumorphism.leima.is/til/data/postgres.timezone-conversion/Fri, 27 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/til/data/postgres.timezone-conversion/Pitfalls of timezone conversion in PostgresMutual Informationhttps://datumorphism.leima.is/cards/information/mutual-information/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/mutual-information/Mutual information is defined as
$$ I(X;Y) = \mathbb E_{P_{XY}} \left[ \ln \frac{P_{XY}}{P_X P_Y} \right]. $$
In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$. This makes sense as there would be no “mutual” information if the two variables are independent of each other.
Entropy and Cross Entropy Mutual information is closely related to entropy. A simple decomposition shows that
$$ I(X;Y) = H(X) - H(X\mid Y), $$
which is the reduction of uncertainty in $X$ after observing $Y$.
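Both statements can be verified on a small joint probability table. The sketch below is written under the assumption of discrete variables; the table `p_xy` and the helper names are made up. It computes the mutual information from the definition, compares it with $H(X) - H(X\mid Y)$ using $H(X\mid Y) = H(X,Y) - H(Y)$, and checks that an independent joint distribution gives zero.

```python
import numpy as np


def entropy(p):
    """Shannon entropy of a 1-D array of probabilities (natural log)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def mutual_information(p_xy):
    """I(X;Y) for a joint table with rows indexing X and columns indexing Y."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))


# Hypothetical joint distribution of two binary variables.
p_xy = np.array([[0.3, 0.1],
                 [0.1, 0.5]])

mi = mutual_information(p_xy)
h_x = entropy(p_xy.sum(axis=1))
h_x_given_y = entropy(p_xy.ravel()) - entropy(p_xy.sum(axis=0))

# An independent joint distribution is the outer product of its marginals.
independent = np.outer([0.4, 0.6], [0.7, 0.3])

print(mi, h_x - h_x_given_y)
print(mutual_information(independent))
```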
KL Divergence This definition of mutual information is equivalent to the following [[KL Divergence]] KL Divergence Kullback–Leibler divergence indicates the differences between two distributions ,Noise Contrastive Estimation: NCEhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/noise-contrastive-estimation/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/noise-contrastive-estimation/Noise contrastive estimation (NCE) objective function is1
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ C(x, x^{+})}{ C(x,x^{+}) + C(x,x^{-}) } \right], $$
where
$x^{+}$ represents data similar to $x$, $x^{-}$ represents data dissimilar to $x$, $C(\cdot, \cdot)$ is a function to compute the similarities. For example, we can use
$$ C(x, x^{+}) = e^{ f(x)^T f(x^{+}) }, $$
so that the objective function becomes
$$ \mathcal L = \mathbb E_{x, x^{+}, x^{-}} \left[ - \ln \frac{ e^{ f(x)^T f(x^{+}) } }{ e^{ f(x)^T f(x^{+}) } + e^{ f(x)^T f(x^{-}) } } \right]. $$Self-supervised Learning: Generative or Constrastivehttps://datumorphism.leima.is/reading/self-supervised-learning-generative-or-contrastive-2006.08218/Fri, 13 Aug 2021 00:00:00 +0000https://datumorphism.leima.is/reading/self-supervised-learning-generative-or-contrastive-2006.08218/Review of self-supervised learning.The log-sum-exp Trickhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/log-sum-exp-trick/Wed, 28 Jul 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/log-sum-exp-trick/The cross entropy for a binary class is
$$ - p \ln \hat p - (1-p) \ln (1-\hat p), $$
where $p$ is the probability of the label A and $\hat p$ is the predicted probability of label A. Since we have binary classes, $p$ is either 1 or 0. However, the predicted probability $\hat p$ can be any value in $[0,1]$.
Probability
For a very simple case, $\hat p$ might be a sigmoid like expression with exponential in it,
$$ \hat p \sim \frac{1}{1 + \exp(-x)}, $$
where $x$ is some kind of input or intermediate input.
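For large negative $x$, $\exp(-x)$ overflows in floating point, so taking the logarithm of the sigmoid naively gives $-\infty$. One common remedy, sketched here under the assumption that NumPy's `np.logaddexp` is available, is to rewrite $\ln\frac{1}{1+e^{-x}} = -\ln\left(e^{0} + e^{-x}\right)$ and let a log-sum-exp routine evaluate it. The function names are illustrative.

```python
import numpy as np


def log_sigmoid_naive(x):
    """Direct evaluation; breaks down when exp(-x) overflows."""
    return np.log(1.0 / (1.0 + np.exp(-x)))


def log_sigmoid_stable(x):
    """log(1 / (1 + e^{-x})) = -log(e^0 + e^{-x}), computed via logaddexp."""
    return -np.logaddexp(0.0, -x)


x = np.array([-1000.0, 0.0, 1000.0])

# Silence the overflow and log(0) warnings that the naive version triggers.
with np.errstate(over="ignore", divide="ignore"):
    naive = log_sigmoid_naive(x)

stable = log_sigmoid_stable(x)

print(naive)
print(stable)
```

The naive version returns `-inf` for the first entry, while the stable version returns a finite value close to $x$ itself.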
The problem is that the exponential $\exp(-x)$ may blow up when $\hat p\to 0$, i.e., when $x$ is large and negative.Managing path using pathlib in Pythonhttps://datumorphism.leima.is/til/programming/python/python-managing-paths-using-pathlib-is-easier/Thu, 15 Jul 2021 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-managing-paths-using-pathlib-is-easier/Since Python 3.4
pathlib is object oriented. It is more elegant than os.path. For example, if we need the parent folder of the current file using os.path, we need os.path.dirname(),

import os

print(f"file: {__file__}")
# file: main.py

# Using os.path
os__file_absolute_path = os.path.abspath(__file__)
print(f"Using os.path:: file absolute path: {os__file_absolute_path}")
# Using os.path:: file absolute path: /home/runner/pathlib/main.py

os__file_in_folder = os.path.dirname(os__file_absolute_path)
print(f"Using os.path:: file is in folder: {os__file_in_folder}")
# Using os.path:: file is in folder: /home/runner/pathlib

It is much easier to get the folder using pathlib.
from pathlib import Path print(f"file: {__file__}") # file: main.py # Using pathlib path__file = Path(__file__) print(f"Using pathlib:: path__file: {path__file}; using .Box-Cox Transformationhttps://datumorphism.leima.is/cards/statistics/box-cox/Tue, 13 Jul 2021 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/box-cox/Box-Cox transformation is a power transformation that involves logs and powers. It transforms data into normal distributions.
The Box-Cox transformation is defined as
$$ y_i^{(\lambda)} = \begin{cases} \lambda ^{-1} (y_i^\lambda - 1) & \quad \text{if } \lambda \neq 0\\ \log(y_i) & \quad \text{if } \lambda = 0. \end{cases} $$
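The definition translates directly into code. The following is a minimal sketch using NumPy; the function name `box_cox` and the check for strictly positive data are assumptions of this example. SciPy also provides `scipy.stats.boxcox`, which can additionally estimate $\lambda$ from the data.

```python
import numpy as np


def box_cox(y, lam):
    """Box-Cox transform of strictly positive data y with parameter lam."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam


y = np.array([1.0, 2.0, 4.0])

print(box_cox(y, 1.0))   # lambda = 1 only shifts the data by -1
print(box_cox(y, 0.0))   # lambda = 0 is the log transform
print(box_cox(y, 1e-8))  # small lambda approaches the log transform
```

The last line illustrates why the $\lambda = 0$ case is defined as the logarithm: it is the limit of the power branch as $\lambda \to 0$.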
By selecting a proper $\lambda$, we get Gaussian distributed data, with a variable mean. The transformation takes $y$ to
$$ \rho(y^{(\lambda)}) =\frac{ \exp{\left( -(y^{(\lambda)} - \beta X)^{T} (y^{(\lambda)} - \beta X)/(2\sigma^2) \right) }}{(\sqrt{2\pi \sigma^2})^n} \prod_{i=1}^n \left\lvert \frac{d y_i^{(\lambda )}}{ dy_i } \right\rvert. $$
The term
$$ \prod_{i=1}^n \left\lvert \frac{d y_i^{(\lambda )}}{ dy_i } \right\rvert = \lvert J \rvert $$The Hubbard-Stratonovich Identityhttps://datumorphism.leima.is/cards/math/hubbard-stratonovich-identity/Thu, 17 Jun 2021 00:00:00 +0000https://datumorphism.leima.is/cards/math/hubbard-stratonovich-identity/The Hubbard version of the Hubbard-Stratonovich identity is1
$$ \begin{align} \exp{\left( a^2 \right)} =& \frac{1}{\sqrt{\pi}} \int_{-\infty}^\infty \mathrm dx\, \exp{ \left( - x^2 - 2 a x \right)}\\ =& \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \mathrm dx'\, \exp{ \left( - x'^2 + 2 a x' \right)}, \end{align} $$
where we changed the sign of $x$, i.e., $x' = -x$.
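The identity is easy to check numerically. The sketch below approximates the integral over the whole real line by a Riemann sum on a wide, fine grid; the grid half-width, the number of points, and the value of $a$ are arbitrary choices for illustration.

```python
import numpy as np


def hs_integral(a, half_width=30.0, n=600001):
    """Riemann-sum approximation of (1/sqrt(pi)) * int exp(-x^2 - 2 a x) dx."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    return np.sum(np.exp(-x**2 - 2.0 * a * x)) * dx / np.sqrt(np.pi)


a = 0.7
print(hs_integral(a), np.exp(a**2))
```

Replacing `a` by `-a` gives the same value, which corresponds to the change of variable $x' = -x$.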
In many partition functions, we have expressions like $\exp{\left( a^2/2\right)}$, using the identity, we have
$$ \begin{align} \exp{\left( \frac{a^2}{2} \right)} =& \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \mathrm dx\, \exp{ \left( - x^2 + \sqrt{2} a x \right)} \\ =& \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathrm dx'\, \exp{ \left( - \frac{x'^2}{2} + a x' \right)}, \end{align} $$Likelihoodhttps://datumorphism.leima.is/cards/statistics/likelihood/Wed, 26 May 2021 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/likelihood/For some data points $\{x_i\}$ and a model $\theta$, the likelihood of our data point $x_i$ is $p(x_i\mid \theta)$. To be more specific, the likelihood of all data points is a function of the model $\theta$,
$$ L(\theta) = \prod_i p(x_i\mid\theta). $$
It should be mentioned that this likelihood is not necessarily a pdf. As an example, we can calculate the likelihood of a Bernoulli distribution for a single event $x$,
$$ L(\theta) = \theta^x (1-\theta)^{(1-x)}. $$
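For several independent events, the likelihood is the product of the single-event factors. As an illustrative sketch, the code below evaluates this product for a made-up sequence of binary outcomes on a grid of $\theta$ values and picks the grid point with the largest likelihood; the data, the grid, and the function name are assumptions of the example.

```python
import numpy as np


def bernoulli_likelihood(theta, xs):
    """Product over events of theta^x * (1 - theta)^(1 - x)."""
    xs = np.asarray(xs)
    return np.prod(theta**xs * (1 - theta) ** (1 - xs))


# Hypothetical outcomes: three events with x = 1 and one with x = 0.
outcomes = [1, 0, 1, 1]

thetas = np.linspace(0.01, 0.99, 99)
likelihoods = [bernoulli_likelihood(t, outcomes) for t in thetas]
best_theta = thetas[int(np.argmax(likelihoods))]

print(best_theta)
```

The likelihood here is $\theta^3(1-\theta)$ as a function of $\theta$; it is largest at the observed fraction of ones, and it does not integrate to 1 over $\theta$, in line with the remark that it is not necessarily a pdf.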
If we are flipping coins, and the head $x=1$ probability is $\theta$, the likelihood for this single event $x=1$ is
$$ L(\theta)=\theta. $$Gaussian Integralshttps://datumorphism.leima.is/cards/math/gaussian-integrals/Tue, 11 May 2021 00:00:00 +0000https://datumorphism.leima.is/cards/math/gaussian-integrals/The diagonalized case
$$ \begin{eqnarray} Z_0 &=& \int d^n z \exp\left(-\frac{1}{2} z^\mathrm{T} D z\right) \\ &=& \prod_i \int d z_i \exp\left(-\frac{1}{2} \lambda_i z_i^2\right) \\ &=& \prod_i \sqrt{\frac{2\pi}{\lambda_i}} \\ &=& \sqrt{\frac{(2\pi)^n}{\det A}}. \end{eqnarray} $$
For a general symmetric positive-definite matrix $A$,
$$ Z_J = \int d^n x \exp\left(-\frac{1}{2} x^\mathrm{T} A x + J^\mathrm{T} x\right). $$
$$ \begin{eqnarray} Z_J &=& \int d^n y \exp\left(-\frac{1}{2} {y}^\mathrm{T} A y + \frac{1}{2} J^\mathrm{T}A^{-1}J\right) \\ &=& \sqrt{\frac{(2\pi)^n}{\det A}} \exp\left(\frac{1}{2} J^\mathrm{T}A^{-1}J\right). \end{eqnarray} $$Cross Validationhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/cross-validation/Thu, 06 May 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/cross-validation/Cross validation is a method to estimate the [[risk]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tunning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$ A learning problem is the minimization of this risk. Vapnik2000 … .
To perform cross validation, we split the train dataset $\mathcal D$ into $k$ folds, with each fold denoted as $\mathcal D_k$.The Learning Problemhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/learning-problem/Thu, 06 May 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/learning-problem/The learning problem posed by Vapnik:1
Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tuning “the handles” $\alpha$, $R(\alpha)$. The risk functional is
$$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$
A learning problem is the minimization of this risk.
Vapnik2000 Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1  ↩︎Explained Variationhttps://datumorphism.leima.is/cards/statistics/explained-variation/Wed, 05 May 2021 18:05:47 +0200https://datumorphism.leima.is/cards/statistics/explained-variation/Using [[Fraser information]] Fraser Information The Fraser information is $$ I_F(\theta) = \int g(X) \ln f(X;\theta) , \mathrm d X. $$ When comparing two models, $\theta_0$ and $\theta_1$, the information gain is $$ \propto (F(\theta_1) - F(\theta_0)). $$ The Fraser information is closed related to [[Fisher information]] Fisher Information Fisher information measures the second moment of the model sensitivity with respect to the parameters. , Shannon information, and [[Kullback information]] KL Divergence Kullback–Leibler divergence … , we can define a relative information gain by a model
$$ \rho_C ^2 = 1 - \frac{ \exp( - 2 F(\theta_1) ) }{ \exp( - 2 F(\theta_0) ) }, $$Fraser Informationhttps://datumorphism.leima.is/cards/information/fraser-information/Wed, 05 May 2021 17:49:12 +0200https://datumorphism.leima.is/cards/information/fraser-information/The Fraser information is
$$ I_F(\theta) = \int g(X) \ln f(X;\theta) \, \mathrm d X. $$
When comparing two models, $\theta_0$ and $\theta_1$, the information gain is
$$ \propto (F(\theta_1) - F(\theta_0)). $$
The Fraser information is closely related to [[Fisher information]] Fisher Information Fisher information measures the second moment of the model sensitivity with respect to the parameters. , Shannon information, and [[Kullback information]] KL Divergence Kullback–Leibler divergence indicates the differences between two distributions 1.
Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061 ↩︎Fisher Informationhttps://datumorphism.leima.is/cards/information/fisher-information/Wed, 05 May 2021 17:49:03 +0200https://datumorphism.leima.is/cards/information/fisher-information/Given a probability density model $f(X; \theta)$ for an observable $X$, the amount of information that $X$ carries regarding the model is called Fisher information.
Given ${\theta}$, the probability of observing the value $X$, i.e., the likelihood is
$$ f(X\mid\theta). $$
To describe the suitability of a model and the observables, we can use the likelihood $f(X\mid \theta)$. One particularly interesting property is the sensitivity of the likelihood to changes in the parameter $\theta$. For example, the case on the left is less compatible as we have a large variance in the parameters. The model is not very sensitive to the parameter change.Evidence Lower Bound: ELBOhttps://datumorphism.leima.is/wiki/machine-learning/bayesian/elbo/Mon, 12 Apr 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/bayesian/elbo/This article reuses a lot of material from the references. Please see the references for more details on ELBO. Given a probability distribution density $p(X)$ and a latent variable $Z$, we have the marginalization of the joint probability
$$ \int dZ p(X, Z) = p(X). $$
Using Jensen’s Inequality In many models, we are interested in the log probability density $\log p(X)$ which can be decomposed using an auxiliary density of the latent variable $q(Z)$,
$$ \begin{align} \log p(X) =& \log \int dZ p(X, Z) \\ =& \log \int dZ p(X, Z) \frac{q(Z)}{q(Z)} \\ =& \log \int dZ q(Z) \frac{p(X, Z)}{q(Z)} \\ =& \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right].Jensen's Inequalityhttps://datumorphism.leima.is/cards/math/jensens-inequality/Mon, 12 Apr 2021 00:00:00 +0000https://datumorphism.leima.is/cards/math/jensens-inequality/Jensen’s inequality shows that
$$ f(\mathbb E(X)) \geq \mathbb E(f(X)) $$
for a concave function $f(\cdot)$.Valid Confidence Sets in Multiclass and Multilabel Predictionhttps://datumorphism.leima.is/wiki/machine-learning/classification/valid-confidence-sets-in-multiclass-multilabel-prediction/Thu, 08 Apr 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/classification/valid-confidence-sets-in-multiclass-multilabel-prediction/Ask for valid confidence:
“Valid”: validate for test data, train data, or the generating process? “Confidence”: $P(Y \notin C(X)) \le \alpha$ To avoid too much attention on data based validation, a framework called conformal inference was proposed by Vovk et al. in 2005,
$n$ observations, desired confidence level $1-\alpha$, construct confidence sets $C(x)$ using conformal methods so that the sets capture the underlying distribution, a new pair $(X_{n+1}, Y_{n+1})$ from the same distribution, $P(Y_{n+1}\in C(X_{n+1})) \ge 1-\alpha$KL Divergencehttps://datumorphism.leima.is/wiki/machine-learning/basics/kl-divergence/Mon, 05 Apr 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/basics/kl-divergence/Given two distributions $p(x)$ and $q(x)$, the Kullback-Leibler divergence is defined as
$$ D_\text{KL}(p(x) \parallel q(x) ) = \int_{-\infty}^\infty p(x) \log\left(\frac{p(x)}{q(x)}\right)\, dx = \mathbb E_{p(x)} \left[\log\left(\frac{p(x)}{q(x)}\right) \right]. $$
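As a rough illustration, the definition can be evaluated for discrete distributions by replacing the integral with a sum. This is a minimal sketch, not part of the original article; the function name `kl_divergence`, the use of natural logarithms, and the example probabilities are assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """Discrete analogue of D_KL(p || q), in nats.

    p and q are sequences of probabilities over the same outcomes.
    Terms with p_i == 0 contribute zero by the usual convention.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            # p puts mass where q has none: the divergence is infinite
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical example distributions
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # divergence of q from p
print(kl_divergence(q, p))  # a different value: KL is not symmetric
print(kl_divergence(p, p))  # zero for identical distributions
```

The two directions give different values, which is why the KL divergence is not a distance in the metric sense.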
Connection to Entropy
Notice that this expression is quite similar to entropy,
$$ H(p(x)) = - \int_{-\infty}^{\infty} p(x) \log p(x) \, dx. $$
The entropy gives the lower bound on the number of bits (if we use $\log_2$) needed to compress the information. By looking at the expression of the KL divergence, we intuitively interpret it as the information loss if we use distribution $q(x)$ to approximate distribution $p(x)$, Kurt2017Hierarchical Classificationhttps://datumorphism.leima.is/wiki/machine-learning/classification/hierarchical-classification/Tue, 30 Mar 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/classification/hierarchical-classification/Hierarchical Classification Problem Hierarchical classification involves hierarchical class labels. The hierarchical class labels may be predefined or inferred. 1
Class Taxonomy A hierarchical classification problem comes with a class taxonomy.
“IS-A” operator: $\prec$, “IS-NOT-A” operator: $\nprec$ An IS-A relationship on the labels $c_a$ of the class set $C$ has the following properties:
one root $R$ in the tree, asymmetric, i.e., $c_i \prec c_j$ and $c_j\prec c_i$ can not be both true, anti-reflexive, i.e., $c_i \nprec c_i$, transitive, i.e., $c_i \prec c_j$ and $c_j\prec c_k$ $\Rightarrow$ $c_i \prec c_k$. There are different representations of the hierarchical taxonomies.
Figure 2 in Silla2011, showing the difference between tree taxonomy and DAG taxonomy.Classifier Chains for Multilabel Classificationhttps://datumorphism.leima.is/wiki/machine-learning/classification/classifier-chains/Wed, 24 Mar 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/classification/classifier-chains/Multi-label problem In some classification problems, we have multiple labels to be predicted. Many different approaches have been proposed to solve such problems.
Algorithm Level Develop algorithms for multilabel problems, such as
Decision trees, AdaBoost. Problem Transformation On problem or data level, we can transform the multi-label problem to one or more single label problems.
Binary Relevance Method Binary relevance method, aka BM, transforms the problem into multiple single-label problems by training one binary classifier for each label.
By doing so, the correlations between the target labels are lost.
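The problem transformation behind binary relevance can be sketched as follows. This is only an illustration of the data-level step, not a full classifier; the helper name `binary_relevance_targets` and the toy label sets are hypothetical.

```python
def binary_relevance_targets(label_sets, label_space):
    """Turn multi-label targets into one binary target vector per label.

    label_sets: list of sets, the labels attached to each record.
    label_space: list of all possible labels.
    Returns a dict mapping each label to a list of 0/1 targets.
    """
    return {
        label: [1 if label in labels else 0 for labels in label_sets]
        for label in label_space
    }


# Hypothetical records with their label sets
label_sets = [{"cat", "indoor"}, {"dog"}, {"cat"}, {"dog", "indoor"}]
targets = binary_relevance_targets(label_sets, ["cat", "dog", "indoor"])

for label, y in targets.items():
    # Each y would be the target of its own independent binary classifier
    print(label, y)
```

Each binary classifier only sees its own target vector, so information about which labels occur together never reaches any of the models.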
Label Combination Method Label combination method (label power-set method), aka CM, combines each set of labels into a single label.Binning Data Values using Pandashttps://datumorphism.leima.is/til/programming/pandas/pandas-binning-values/Wed, 10 Mar 2021 00:00:00 +0000https://datumorphism.leima.is/til/programming/pandas/pandas-binning-values/Use the pd.cut function. The bins argument defines the segments, which are half-open intervals of the form (a, b] by default. The official documentation comes with detailed examples.
If pandas is not an option, one could use numpy.digitize to find which bin the elements belong to.Deal with Rare Categories Using Pandashttps://datumorphism.leima.is/til/data/deal-with-rare-categories-using-pandas/Wed, 10 Mar 2021 00:00:00 +0000https://datumorphism.leima.is/til/data/deal-with-rare-categories-using-pandas/We will illustrate how to deal with rare categories using pandas mask.
import pandas as pd

# Create fake names
frequent_names = list('ABC')
rare_names = list('DEF')
dataset = sum(
    [[i]*10 for i in frequent_names] + [[i]*2 for i in rare_names],
    []
)

# Create a series based on the names
series = pd.Series(dataset)
print(series)

# Find the counts of the names in the series
series_counts = series.value_counts()
print(series_counts)

# Find names that have fewer than 10 counts
# and create a mask
mask = series.isin(series_counts.loc[series_counts<10].index)
print(mask)

# Set these rare names to X
series[mask] = 'X'

# Check the new series
print(series.ANOVAhttps://datumorphism.leima.is/wiki/statistics/anova/Sun, 07 Mar 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/statistics/anova/In many problems, we have to test if several distributions associated with several groups of experiments are the same. The null hypothesis to be used is
The distributions of several groups are the same.
ANOVA tests the null hypothesis by comparing the variability between groups and within groups. If the variability between groups is significantly larger than the variability within groups, we are more confident that the distributions of different groups are different.
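The comparison of between-group and within-group variability can be sketched with the standard one-way ANOVA F statistic. The formula used here (mean square between groups divided by mean square within groups) is the textbook definition and is assumed rather than taken from this page; the function name `f_statistic` and the toy data are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group variability: spread of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: spread of the values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical two-group data
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
print(f_statistic([group_a, group_b]))
```

A larger value means the group means differ more than the noise inside each group would suggest; turning it into a p-value requires the F distribution, which is left out of this sketch.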
We will use two-group experiments as an example. We use a fake dataset:
Group A $x^A_1$ $x^A_2$ … $x^A_{N_A}$ Group B $x^B_1$ $x^B_2$ … $x^B_{N_B}$ Within Group Variability The within group variability is proportional toMcCulloch-Pitts Modelhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/mcculloch-pitts-model/Thu, 25 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/mcculloch-pitts-model/The McCulloch-Pitts model maps the input $\{x_1, x_2,\cdots, x_i \cdots, x_N \}$ into a scalar $y\in\{1,-1\}$,
$$ y = \operatorname{sign}( w\cdot x - b). $$
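A minimal sketch of this mapping in plain Python follows. The function name `mcculloch_pitts` and the example weights are hypothetical, and mapping the boundary case $w\cdot x - b = 0$ to $+1$ is an assumption, since the formula above does not fix the sign at zero.

```python
def mcculloch_pitts(w, x, b):
    """Return +1 or -1 according to the sign of w . x - b.

    The value exactly on the hyperplane (w . x - b == 0) is mapped to +1,
    which is a convention chosen for this sketch.
    """
    activation = sum(w_i * x_i for w_i, x_i in zip(w, x)) - b
    return 1 if activation >= 0 else -1


# Hypothetical example: with w = (1, 1) and b = 1.5 the neuron acts like
# a logical AND on inputs taken from {0, 1}
w, b = [1.0, 1.0], 1.5
for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, mcculloch_pitts(w, x, b))
```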
Since $w\cdot x - b = 0$ is a hyperplane, the McCulloch-Pitts model separates the state space using this hyperplane. The shift $b$ determines the intercept, and $w$ decides the slope.Rosenblatt's Perceptronhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/rosenblatt-perceptron/Thu, 25 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/rosenblatt-perceptron/Rosenblatt’s perceptron connects McCulloch-Pitts neurons in levels.
Rosenblatt proposed that we fix all the weights except those of the last neuron, which are left free.
All layers except the last are used as a transformation of the input data $\{x_1, \cdots, x_i, \cdots, x_N\}$ into a new space $\{z_1, \cdots, z_i, \cdots, z_{N'}\}$. The classification is done on the $\{z_1, \cdots, z_i, \cdots, z_{N'}\}$ space by tuning the last neuron.
Initially, we set $w=0$. At step $k$,
if the sign prediction by the perceptron $( w_k \cdot z_{k+1} )$ is the same as the data $y_{k+1}$, i.ERM: Empirical Risk Minimizationhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/empirical-risk-minimization/Thu, 18 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/empirical-risk-minimization/In a [[learning problem]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tuning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z). $$ A learning problem is the minimization of this risk. Vapnik2000 … , empirical risk $R$ is a measurement of the goodness of fit based on empirical information.SRM: Structural Risk Minimizationhttps://datumorphism.leima.is/cards/machine-learning/learning-theories/structural-risk-minimization/Thu, 18 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/structural-risk-minimization/[[ERM]] ERM: Empirical Risk Minimization In a [[learning problem]] The Learning Problem The learning problem posed by Vapnik:1 Given a sample: $\{z_i\}$ in the probability space $Z$; Assuming a probability measure on the probability space $Z$; Assuming a set of functions $Q(z, \alpha)$ (e.g. loss functions), where $\alpha$ is a set of parameters; A risk functional to be minimized by tuning “the handles” $\alpha$, $R(\alpha)$. The risk functional is $$ R(\alpha) = \int Q(z, \alpha) \,\mathrm d F(z).
$$ A learning problem … may lead to overfitting since ERM only selects the model to fit the training data well.Coding Theory Conceptshttps://datumorphism.leima.is/cards/information/coding-theory-concepts/Wed, 17 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/information/coding-theory-concepts/The code function produces code words. The expected length of the code word is bounded from below by the entropy of the source probability $p$.
The Shannon information content, aka self-information, is described by
$$ - \log_2 p(x=a), $$
for the case that $x=a$.
The Shannon entropy is the expected information content for the whole sequence with probability distribution $p(x)$,
$$ \mathcal H = - \sum_{x\in X} p(x) \log_2 p(x). $$
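As a small worked example, the entropy in bits can be computed directly from a list of probabilities. The function name `shannon_entropy` and the example distributions are chosen for illustration only; outcomes with zero probability are skipped, following the usual convention that $0 \log 0 = 0$.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]))    # a fair coin carries one bit
print(shannon_entropy([0.25] * 4))    # four equally likely outcomes carry two bits
print(shannon_entropy([0.9, 0.1]))    # a biased coin carries less than one bit
```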
The Shannon source coding theorem says that $N$ samples from the source can be compressed into roughly $N\mathcal H$ bits.Empirical Losshttps://datumorphism.leima.is/cards/machine-learning/measurement/empirical-loss/Sat, 06 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/empirical-loss/Given a dataset with records $\{x_i, y_i\}$ and a model $\hat y_i = f(x_i)$ the empirical loss is calculated on all the records
$$ \begin{align} \mathcal L_{E} = \frac{1}{n} \sum_i^n d(y_i, f(x_i)), \end{align} $$
where $d(y_i, f(x_i))$ is the distance defined between $y_i$ and $f(x_i)$.Population Losshttps://datumorphism.leima.is/cards/machine-learning/measurement/population-loss/Sat, 06 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/population-loss/Given a dataset with records $\{x_i, y_i\}$ and a model $\hat y_i = f(x_i)$. Suppose we know the actual generating process of the dataset and the joint probability density distribution of all the data points is $p(x, y)$, the population loss is defined on the whole assumed population,
$$ \begin{align} \mathcal L_{P} = \mathop{\mathbb{E}}_{p(x,y)}[ d(y, f(x))], \end{align} $$
where $d(y, f(x))$ is the distance defined between $y$ and $f(x)$.Data File Formatshttps://datumorphism.leima.is/cards/machine-learning/datatypes/data-file-formats/Tue, 02 Feb 2021 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/datatypes/data-file-formats/Data storage is diverse. For data on smaller scales, we are mostly dealing with some data files.
work_with_data_files
Efficiencies and Compressions Parquet Parquet is fast. But
Don’t use json or list of json as columns. Convert them to strings or binary objects if it is really needed.Machine as a Hologramhttps://datumorphism.leima.is/projects/hologram/Sun, 31 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/projects/hologram/Tutorials on machine learning and data science productivity articlesLatent Variable Modelshttps://datumorphism.leima.is/wiki/machine-learning/bayesian/latent-variable-models/Wed, 27 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/bayesian/latent-variable-models/In the view of statistics, we know everything about a physical system if we know the probability $p(\mathbf s)$ of all possible states of the physical system $\mathbf s$. Time can also be part of the state specification.
As an example, we will classify fruits into oranges and non oranges. We will have the state vector $\mathbf s = (\text{is orange}, \text{texture } x)$. Our goal is to find the joint probability $p(\text{is orange}, x)$.
In reality, we only have sample data. This sample data usually can not cover all the possible states of the system. Thus a direct calculation to find the joint probability $p(\mathbf s)$ is not feasible.Reparametrization in Expectation Samplinghttps://datumorphism.leima.is/cards/statistics/reparametrization-expectation-sampling/Wed, 20 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/reparametrization-expectation-sampling/The expectation value of a function $f(z)$ over a Gaussian distribution $\mathscr N(z;\mu, \sigma)$ is equivalent to the expectation value of $f(\sigma z + \mu)$ over a standard Gaussian distribution $\mathscr N(z;\mu=0, \sigma=1)$, i.e.,
$$ {\mathbb E}_{\mathscr N(z; \mu, \sigma)} \left[ f(z) \right] = {\mathbb E}_{\mathscr N(z; 0, 1)} \left[ f(\sigma z + \mu) \right] $$
where
$$ \mathscr N(z; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left( -\frac{(z-\mu)^2}{2\sigma^2}\right). $$
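The identity can be checked numerically with a simple Monte Carlo estimate: sample $z$ from $\mathscr N(\mu, \sigma)$ directly, or sample $z'$ from the standard Gaussian and evaluate $f(\sigma z' + \mu)$. This is a sketch under assumed values; the function names, the test function $f(z) = z^2$, the values $\mu = 1$, $\sigma = 2$ and the sample size are all chosen for illustration. For this $f$ the exact expectation is $\mu^2 + \sigma^2 = 5$.

```python
import random


def expectation_direct(f, mu, sigma, n, rng):
    """Estimate E[f(z)] with z drawn from N(mu, sigma)."""
    return sum(f(rng.gauss(mu, sigma)) for _ in range(n)) / n


def expectation_reparam(f, mu, sigma, n, rng):
    """Estimate the same expectation with z' drawn from N(0, 1)."""
    return sum(f(sigma * rng.gauss(0.0, 1.0) + mu) for _ in range(n)) / n


def square(z):
    return z ** 2


rng = random.Random(0)  # fixed seed so the run is reproducible
mu, sigma, n = 1.0, 2.0, 100_000

# Both estimates should be close to mu**2 + sigma**2 = 5
print(expectation_direct(square, mu, sigma, n, rng))
print(expectation_reparam(square, mu, sigma, n, rng))
```

The reparametrized form is useful because the sampling step no longer depends on $\mu$ and $\sigma$, so they only appear inside the function being averaged.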
$$ \begin{align} {\mathbb E}_{\mathscr N(z; \mu, \sigma)} \left[ f(z) \right] &= \int \mathrm d z \frac{1}{\sqrt{2\pi\sigma^2}}\exp \left( -\frac{(z-\mu)^2}{2\sigma^2}\right) f(z) \\ &= \int \mathrm dz \frac{1}{\sigma} \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} \left(\frac{z-\mu}{\sigma}\right)^2 \right) f(z) \\ &= \int \mathrm d \left( \sigma z' + \mu \right) \frac{1}{\sigma} \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{1}{2} z'^2 \right) f(\sigma z' + \mu) \\ &= \int \mathrm d z' \frac{1}{\sqrt{2\pi}}\exp \left( -\frac{1}{2} z'^2 \right) f(\sigma z' + \mu) \\ &= \int \mathrm d z' \mathscr N(z'; \mu=0, \sigma=1) f(\sigma z' + \mu) \\ &= {\mathbb E}_{\mathscr N(z'; \mu=0, \sigma=1)} \left[ f(\sigma z' + \mu) \right] \end{align} $$Normalizing Flows: An Introduction and Review of Current Methodshttps://datumorphism.leima.is/reading/normalizing-flow-introduction-1908.09257/Sun, 17 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/reading/normalizing-flow-introduction-1908.09257/To generate complicated distributions step by step from a simple and interpretable distribution.A Simple Machine Learning Project Frameworkhttps://datumorphism.leima.is/blog/data-science/a-simple-machine-learning-framework/Tue, 12 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/blog/data-science/a-simple-machine-learning-framework/The HaferML package is developed following this naive framework.
A simple almost stateless machine learning frameworkBasics of Redishttps://datumorphism.leima.is/wiki/computation/basics-of-redis/Fri, 08 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/wiki/computation/basics-of-redis/Basics Redis is:
NoSQL KeyValue In memory Data Structure Server binary safe strings lists, sets, sorted sets, hashes bitmaps, hyperloglogs Open source Redis is:
Fast Low CPU Requirement Scalable Redis can be used as:
Cache Analytics Leaderboard Queues Cookie storage Expiring data Messaging High I/O workloads API throttlings How to persist your data
Snapshot AOF: Append Only File Pros:
Redis has both data store and job queue built in Redis is a data structure server so it has flexible data structures Redis is fast Cons:
Redis uses a lot of RAM Redis can be queued by value ComparisonsAudiolization of Covid 19 Data in Europehttps://datumorphism.leima.is/blog/ruthless/audiolization-of-covid19-in-eu/Sun, 03 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/blog/ruthless/audiolization-of-covid19-in-eu/Here is an audiolization sound track using a sample of covid19 data in Europe. The audio is the result of the audiorepr Python package I wrote.PREPhttps://datumorphism.leima.is/cards/communication/prep/Sun, 03 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/cards/communication/prep/PREP PREP is a framework for making your point.
PREP: Point + Reason + Example + Point Point: Make a point; PREP is a good method. Reason: Give the reason; Because it has a clear logic. Example: Show examples; The famous XYZ did ABC then everyone was convinced. Point: State the point for a conclusion.SCQ-Ahttps://datumorphism.leima.is/cards/communication/scq-a/Sun, 03 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/cards/communication/scq-a/SCQ-A SCQ-A: Situation + Conflict + Question + Answer SCQ-A is a framework for problem-solving.
Situation: background knowledge, set the stage Complications: what is happening Question: propose your hypothesis Answer: accept or reject the hypothesisWWHhttps://datumorphism.leima.is/cards/communication/wwh/Sun, 03 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/cards/communication/wwh/WWH WWH: What (happened) + Why (this happened) + How (to improve)PyTorch: Initialize Parametershttps://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-initial-params/Fri, 01 Jan 2021 00:00:00 +0000https://datumorphism.leima.is/til/machine-learning/pytorch/pytorch-initial-params/We can set the parameters in a for loop. We take some of the initialization methods from Lippe1.
To set the parameters based on the input and output dimensions of the layer ( [[Initialize Artificial Neural Networks]] Initialize Artificial Neural Networks Initializing a neural network is important for the training and performance. Some initializations simply don't work, some will degrade the performance of the model. We should choose wisely. ) (normalized initialization),
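import math

# `model` is assumed to be an existing torch.nn.Module with 2D weight matrices
for name, param in model.named_parameters():
    if name.endswith(".bias"):
        # Biases start at zero
        param.data.fill_(0)
    else:
        # Normalized uniform bound: sqrt(6) / sqrt(fan_out + fan_in)
        bound = math.sqrt(6)/math.sqrt(param.shape[0]+param.shape[1])
        param.data.uniform_(-bound, bound)

or set the parameters based on the input size of each layer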
for name, param in model.Graph Creationhttps://datumorphism.leima.is/reading/grammar-of-graphics/graph-creation/Tue, 29 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/reading/grammar-of-graphics/graph-creation/Stages Three stages of making a graph:
Specification Assembly Display Specification Statistical graphic specifications are expressed in six statements
DATA: a set of data operations that create variables from datasets TRANS: variable transformations (e.g., rank) SCALE: scale transformations (e.g., log) COORD: a coordinate system (e.g., polar) ELEMENT: graphs (e.g., points) and their aesthetic attributes (e.g., color) GUIDE: one or more guides (axes, legends, etc.) Assembly Assembling a scene from a specification requires a variety of structures in order to index and link components with each other. One of the structures we can use is a network or a tree.Multiset, mset or baghttps://datumorphism.leima.is/cards/math/multiset-mset-bag/Sun, 27 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/cards/math/multiset-mset-bag/A bag is a set in which duplicate elements are allowed.
An ordered bag is a list that we use in programming.Python Class Sequential Inheritancehttps://datumorphism.leima.is/til/programming/python/python-class-inheritance-sequential/Thu, 03 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-class-inheritance-sequential/# An experiment on python super
class Base:
    def __init__(self):
        print("Start A")
        print("End A")


class IA(Base):
    def __init__(self):
        print("Start IA")
        super(IA, self).__init__()
        print("End IA")


class IB(IA):
    def __init__(self):
        print("Start IB")
        super(IB, self).__init__()
        print("End IB")


print("Experiment 1:")
ib = IB()Three dots in Pythonhttps://datumorphism.leima.is/til/programming/python/python-three-dots/Thu, 03 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-three-dots/Using three dots in Python:
from abc import abstractmethod


class A:
    def __init__(self):
        self.name = "A"
        print("Init")

    def three_dots(self):
        ...

    @abstractmethod
    def abs_three_dots(self):
        ...

    def raise_it(self):
        raise Exception("Not yet done")


a = A()
print("\nthree_dots")
print(a.three_dots())
print("\nabs_three_dots")
print(a.abs_three_dots())
print("\nraise_it")
a.raise_it()

Returns
three_dots None abs_three_dots None raise_it Traceback (most recent call last): File "main.py", line 27, in <module> a.raise_it() File "main.py", line 14, in raise_it raise Exception("Not yet done") Exception: Not yet doneOrdered Member Functions of a Class in Pythonhttps://datumorphism.leima.is/til/programming/python/python-class-methods-ordered/Wed, 02 Dec 2020 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-class-methods-ordered/# References: # 1. https://stackoverflow.com/questions/48145317/can-i-add-attributes-to-class-methods-in-python from functools import wraps # Define a decorator def attributes(**attrs): """ Set attributes of member functions in a class. ``` class AGoodClass: def __init__(self): self.size = 0 @attributes(order=1) def first_good_member(self, new): return "first good member" @attributes(order=2) def second_good_member(self, new): return "second good member" ``` References: 1. https://stackoverflow.com/a/48146924/1477359 """ def decorator(f): @wraps(f) def wrapper(*args, **kwargs): return f(*args, **kwargs) for attr_name, attr_value in attrs.items(): setattr(wrapper, attr_name, attr_value) return wrapper return decorator class AGoodClass: def __init__(self): self.size = 0 @attributes(order=1) def first_good_member(self, new): return "first good member" @attributes(order=2) def second_good_member(self, new): return "second good member" # Test agc = AGoodClass() print(agc.Postgres Optimization in JOINhttps://datumorphism.leima.is/til/data/postgres.join-begin-with-smallest-cardinality/Sat, 28 Nov 2020 11:39:21 +0100https://datumorphism.leima.is/til/data/postgres.join-begin-with-smallest-cardinality/Join tables together starting with the smallest table (table with less cardinality) speeds things up.Deal with NULL in Postgreshttps://datumorphism.leima.is/til/data/postgres.deal-with-null/Thu, 26 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/til/data/postgres.deal-with-null/Please deal with null carefully.Akaike 
Information Criterionhttps://datumorphism.leima.is/cards/statistics/aic/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/aic/Suppose we have a model that describes the data generation process behind a dataset. The distribution by the model is denoted as $\hat f$. The actual data generation process is described by a distribution $f$.
We ask the question:
How good is the approximation using $\hat f$?
To be more precise, how much information is lost if we use our model distribution $\hat f$ to substitute the actual data generation distribution $f$?
AIC defines this information loss as
$$ \mathrm{AIC} = - 2 \ln p(y|\hat\theta) + 2k $$
$y$: data set $\hat\theta$: parameter of the model that is estimated by maximum-likelihood $\ln p(y|\hat\theta)$: log maximum likelihood (the goodness-of-fit) $k$: number of adjustable model params; $+2k$ is then a penalty.Bayes Factorshttps://datumorphism.leima.is/cards/statistics/bayes-factors/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/bayes-factors/$$ \frac{p(\mathscr M_1|y)}{ p(\mathscr M_2|y) } = \frac{p(\mathscr M_1)}{ p(\mathscr M_2) }\frac{p(y|\mathscr M_1)}{ p(y|\mathscr M_2) } $$
Bayes factor
$$ \mathrm{BF_{12}} = \frac{m(y|\mathscr M_1)}{m(y|\mathscr M_2)} $$
$\mathrm{BF_{12}}$: how many times more likely is model $\mathscr M_1$ than $\mathscr M_2$.Bayesian Information Criterionhttps://datumorphism.leima.is/cards/statistics/bic/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/bic/BIC is the Bayesian information criterion; it replaces the $+2k$ term in [[AIC]] Akaike Information Criterion Suppose we have a model that describes the data generation process behind a dataset. The distribution by the model is denoted as $\hat f$. The actual data generation process is described by a distribution $f$. We ask the question: How good is the approximation using $\hat f$? To be more precise, how much information is lost if we use our model distribution $\hat f$ to substitute the actual data generation distribution $f$? AIC defines this information loss as $$ \mathrm{AIC} = - 2 \ln p(y|\hat\theta) + … with $k\ln n$ to penalize the number of parameters of the model based on the number of data records,Fisher Information Approximationhttps://datumorphism.leima.is/cards/statistics/fia/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/fia/FIA is a method to describe the minimum description length ( [[MDL]] Minimum Description Length MDL is a measure of how well a model compresses data by minimizing the combined cost of the description of the model and the misfit. ) of models,
$$ \mathrm{FIA} = -\ln p(y | \hat\theta) + \frac{k}{2} \ln \frac{n}{2\pi} + \ln \int_\Theta \sqrt{ \operatorname{det}[I(\theta)] } \, d\theta $$
$I(\theta)$: Fisher information matrix of sample size 1. $$I_{i,j}(\theta) = E\left( \frac{\partial \ln p(y| \theta)}{\partial \theta_i}\frac{ \partial \ln p (y | \theta) }{ \partial \theta_j } \right)$$.Kolmogorov Complexityhttps://datumorphism.leima.is/cards/statistics/kolmogorov-complexity/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/kolmogorov-complexity/Description of Data
The measurement of complexity is based on the observation that the compressibility of data doesn’t depend on the “language” used to describe the compression process that much. This makes it possible for us to find a universal language, such as a universal computer language, to quantify the compressibility of the data.
One intuitive idea is to use a programming language to describe the data. If we have a sequence of data,
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,…,9999
It takes a lot of space if we show the complete sequence. However, our math intuition tells us that this is nothing but a list of consecutive numbers from 0 to 9999.Minimum Description Lengthhttps://datumorphism.leima.is/cards/statistics/mdl/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/mdl/The minimum description length, aka, MDL, is based on the relations between regularity and data compression. (See [[Kolmogorov complexity]] Kolmogorov Complexity Description of Data The measurement of complexity is based on the observation that the compressibility of data doesn’t depend on the “language” used to describe the compression process that much. This makes it possible for us to find a universal language, such as a universal computer language, to quantify the compressibility of the data. One intuitive idea is to use a programming language to describe the data. If we have a sequence of data, … for more about data descriptions.Normalized Maximum Likelihoodhttps://datumorphism.leima.is/cards/statistics/nml/Sun, 08 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/nml/$$ \mathrm{NML} = \frac{ p(y| \hat \theta(y)) }{ \int_X p( x| \hat \theta (x) ) dx } $$Experiments in Biologyhttps://datumorphism.leima.is/blog/ruthless/experiments-in-biology/Sun, 01 Nov 2020 00:00:00 +0000https://datumorphism.leima.is/blog/ruthless/experiments-in-biology/ Inspired by @hanlu.ioThe Science Part in Data Sciencehttps://datumorphism.leima.is/blog/ruthless/science-part-in-data-science/Sat, 31 Oct 2020 00:00:00 +0000https://datumorphism.leima.is/blog/ruthless/science-part-in-data-science/graph TD; s1(An Idea)--d1{Is this idea in the current literature?}; d1{Is this idea in the current literature?}--|Yes|b1(Fail); d1{Is this idea in the current literature?}--|No|b2[Weeks of work]; b2[Weeks of work]--b1(Fail);Conditional Probability Tablehttps://datumorphism.leima.is/cards/statistics/conditional-probability-table/Tue, 27 Oct 2020 00:00:00 
+0000https://datumorphism.leima.is/cards/statistics/conditional-probability-table/The conditional probability table, aka CPT, is used to calculate conditional probabilities from a dataset.
Given a dataset with features $\mathbf X$ and their corresponding classes $\mathbf Y$, the conditional probabilities of each class given a certain feature value can be calculated using a CPT which in turn can be calculated using a [[contingency table]] Correlation Coefficient and Covariance for Numeric Data Detecting correlations using correlations for numeric data .Pandas Groupby Does Not Guarantee Unique Content in Groupby Columnshttps://datumorphism.leima.is/til/machine-learning/pandas-groupby-caveats/Mon, 20 Apr 2020 00:00:00 +0000https://datumorphism.leima.is/til/machine-learning/pandas-groupby-caveats/Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns; it also considers the datatypes. Dealing with mixed types requires additional attention.== and is in Pythonhttps://datumorphism.leima.is/til/programming/python/python-none/Wed, 01 Apr 2020 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-none/== and is are differentBonferroni Correctionhttps://datumorphism.leima.is/cards/statistics/bonferroni-correction/Wed, 01 Apr 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/bonferroni-correction/In a single hypothesis testing problem, we consider the [[type I error]] Types of Errors in Statistical Hypothesis Testing We all make mistakes. The question is, what kind of mistakes. : Rejecting the one null hypothesis when it is actually true. Given a threshold $\alpha$, we can find out the interval $\Gamma$ that leads to a probability of rejecting the hypothesis $p\leq\alpha$ (single-sided).
In a [[multiple comparisons problem]] Multiple Comparison Problem In a multiple comparisons problem, we deal with multiple statistical tests simultaneously. Examples We see such problems a lot in IT companies. Suppose we have a website and would like to test if a new design of a button can lead to some changes in five different KPIs (e.Multiple Comparison Problemhttps://datumorphism.leima.is/cards/statistics/multiple-comparison-problem/Wed, 01 Apr 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/multiple-comparison-problem/In a multiple comparisons problem, we deal with multiple statistical tests simultaneously.
Examples We see such problems a lot in IT companies. Suppose we have a website and would like to test if a new design of a button can lead to some changes in five different KPIs (e.g., view-to-click rate, click-to-book rate, …).
In multi-horizon time series forecasting, we sometimes choose to forecast multiple future data points in one shot. To properly find the confidence intervals of our predictions, one approach is the so called conformal prediction method. This becomes a multiple comparisons problem because we have to tell if we can reject at least one true null hypothesis.Arcsine Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/arcsine/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/arcsine/Arcsine Distribution The PDF is
$$ \frac{1}{\pi\sqrt{x(1-x)}} $$
for $x\in [0,1]$.
It can also be generalized to
$$ \frac{1}{\pi\sqrt{(x-a)(b-x)}} $$
for $x\in [a,b]$.
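As a quick numerical companion to the formulas above, here is a minimal sketch of the arcsine PDF and CDF. It assumes the generalized density is $1/(\pi\sqrt{(x-a)(b-x)})$ on $(a,b)$; the function names `arcsine_pdf` and `arcsine_cdf` are made up for illustration.

```python
import math


def arcsine_pdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Density of the arcsine distribution on the open interval (a, b)."""
    if not a < x < b:
        # The density diverges at the endpoints; for simplicity this sketch
        # returns 0 at the endpoints and everywhere outside (a, b).
        return 0.0
    return 1.0 / (math.pi * math.sqrt((x - a) * (b - x)))


def arcsine_cdf(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Cumulative distribution, (2 / pi) * arcsin(sqrt((x - a) / (b - a)))."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return 2.0 / math.pi * math.asin(math.sqrt((x - a) / (b - a)))


# The density is symmetric around the midpoint, so half of the mass lies below it.
print(arcsine_pdf(0.5), arcsine_cdf(0.5))
```

The CDF is where the name of the distribution comes from: it is an arcsine of the rescaled variable.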
VisualizeBernoulli Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/bernoulli/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/bernoulli/Two categories with probability $p$ and $1-p$ respectively.
For each experiment, the sample space is $\{A, B\}$. The probability for state $A$ is given by $p$ and the probability for state $B$ is given by $1-p$. After $N$ independent experiments, the probability of $K$ results with state $s=A$ and $N-K$ results with state $s=B$ is given by the binomial distribution,
$$ P\left(\sum_i^N s_i = K \right) = C _ N^K p^K (1 - p)^{N-K}. $$Beta Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/beta/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/beta/Beta Distribution Interact Alpha Beta mode ((beta_mode)) median ((beta_median)) mean ((beta_mean)) ((makeGraph))Binomial Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/binomial/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/binomial/The number of successes in $n$ independent events where each trial has a success rate of $p$.
PMF:
$$ C_n^k p^k (1-p)^{n-k} $$Categorical Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/categorical/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/categorical/By generalizing the Bernoulli distribution to $k$ states, we get a categorical distribution. The sample space is $\{s_1, s_2, \cdots, s_k\}$. The corresponding probabilities for each state are $\{p_1, p_2, \cdots, p_k\}$ with the constraint $\sum_{i=1}^k p_i = 1$.Cauchy-Lorentz Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/cauchy/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/cauchy/Cauchy-Lorentz Distribution .. ratio of two independent normally distributed random variables with mean zero.
Source: https://en.wikipedia.org/wiki/Cauchy_distribution
Lorentz distribution is frequently used in physics.
PDF:
$$ \frac{1}{\pi\gamma} \left( \frac{\gamma^2}{ (x-x_0)^2 + \gamma^2} \right) $$
The median and mode of the Cauchy-Lorentz distribution are always $x_0$. $\gamma$ is the half width at half maximum (HWHM), so the FWHM is $2\gamma$.
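To get a feeling for the role of $x_0$ and $\gamma$, the following sketch evaluates the PDF above and checks how far from $x_0$ the density drops to half of its peak value. The function name `cauchy_pdf` and the parameter values are only illustrative.

```python
import math


def cauchy_pdf(x: float, x0: float = 0.0, gamma: float = 1.0) -> float:
    """Cauchy-Lorentz density with location x0 and scale gamma (gamma > 0)."""
    return (1.0 / (math.pi * gamma)) * gamma**2 / ((x - x0) ** 2 + gamma**2)


x0, gamma = 2.0, 0.5
peak = cauchy_pdf(x0, x0, gamma)            # the maximum sits at x0
at_gamma = cauchy_pdf(x0 + gamma, x0, gamma)

# At a distance gamma from x0 the density is half of the peak value.
print(at_gamma / peak)
```

The peak height is $1/(\pi\gamma)$, so a smaller $\gamma$ gives a taller and narrower curve.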
VisualizeGamma Distributionhttps://datumorphism.leima.is/cards/statistics/distributions/gamma/Sat, 14 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/distributions/gamma/Gamma Distribution PDF:
$$ \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} $$
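A minimal sketch of this PDF in plain Python, assuming the shape-rate parametrization $(\alpha, \beta)$ used in the formula above. The function name `gamma_pdf` is hypothetical, and the sketch simply returns 0 for $x \le 0$.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and rate beta, evaluated for x > 0."""
    if x <= 0:
        # Simplification of this sketch: the boundary x = 0 is treated as outside.
        return 0.0
    return beta**alpha * x ** (alpha - 1) * math.exp(-beta * x) / math.gamma(alpha)


# With alpha = 1 the gamma distribution reduces to the exponential
# distribution with rate beta, i.e. beta * exp(-beta * x).
print(gamma_pdf(2.0, alpha=1.0, beta=0.5))
print(0.5 * math.exp(-0.5 * 2.0))
```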
VisualizeDiagnolize Matriceshttps://datumorphism.leima.is/cards/math/diagonalize-matrix/Wed, 11 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/math/diagonalize-matrix/Given a matrix $\mathbf A$, it is diagonalized using its eigenvectors.
Why are the eigenvectors needed?
Eigenvectors of a matrix $\mathbf A$ are the preferred directions. From the definition of eigenvectors,
$$ \mathbf A \mathbf x = \lambda \mathbf x, $$
we know that the matrix $\mathbf A$ only scales its eigenvectors and does not rotate them. These directions are special to the matrix $\mathbf A$.
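The statement that $\mathbf A$ only scales its eigenvectors can be checked by hand on a small example. The sketch below uses a hypothetical $2\times 2$ matrix and a helper `eigenvalue_if_eigenvector` (a made-up name) that returns $\lambda$ if $\mathbf A \mathbf x$ is parallel to $\mathbf x$ and `None` otherwise.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) with a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def eigenvalue_if_eigenvector(A, x, tol=1e-12):
    """Return lambda if A x = lambda x holds for the nonzero vector x, else None."""
    y = matvec(A, x)
    # Use the first nonzero component of x to read off the candidate scaling.
    k = next(i for i, xi in enumerate(x) if abs(xi) > tol)
    lam = y[k] / x[k]
    if all(abs(yi - lam * xi) <= tol for xi, yi in zip(x, y)):
        return lam
    return None


A = [[2.0, 1.0], [1.0, 2.0]]  # illustrative symmetric matrix
print(eigenvalue_if_eigenvector(A, [1.0, 1.0]))   # scaled only
print(eigenvalue_if_eigenvector(A, [1.0, -1.0]))  # scaled only
print(eigenvalue_if_eigenvector(A, [1.0, 0.0]))   # rotated, not an eigenvector
```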
Find the eigenvectors $\mathbf x_i$ of the matrix $\mathbf A$; If degenerate eigenvalues leave us with fewer than $n$ linearly independent eigenvectors, the matrix is not diagonalizable. Construct a matrix $\mathbf S = \begin{pmatrix} \mathbf x_1 & \mathbf x_2 & \cdots & \mathbf x_n \end{pmatrix}$; The matrix $\mathbf A$ is diagonalized using $\mathbf S^{-1} \mathbf A \mathbf S = \mathbf {A_D}$Mahalanobis Distancehttps://datumorphism.leima.is/cards/math/mahalanobis-distance/Wed, 11 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/math/mahalanobis-distance/Mahalanobis distance is a distance calculated using the inverse of the covariance matrix as the metric. For two vectors $\mathbf x$ and $\mathbf y$, the Mahalanobis distance is
$$ d^2 = (x_i - y_i) g_{ij} (x_j - y_j), $$
where $g_{ij} = (S^{-1})_{ij}$ and $\mathbf S$ is the covariance matrix.
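A minimal two-dimensional sketch, assuming the standard form $d^2 = (\mathbf x - \mathbf y)^{\mathrm T} \mathbf S^{-1} (\mathbf x - \mathbf y)$ with a given covariance matrix $\mathbf S$. The helper names `inverse_2x2` and `mahalanobis_2d` and the example matrices are hypothetical; for higher dimensions one would use a linear algebra library instead of the explicit $2\times 2$ inverse.

```python
import math


def inverse_2x2(S):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_2d(x, y, S):
    """Mahalanobis distance between 2D vectors x and y for covariance S."""
    g = inverse_2x2(S)  # the metric g_ij = (S^-1)_ij
    diff = [x[0] - y[0], x[1] - y[1]]
    d2 = sum(diff[i] * g[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.sqrt(d2)


# With the identity as covariance we recover the Euclidean distance.
print(mahalanobis_2d([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))
# A large variance along the first axis shrinks distances along that axis.
print(mahalanobis_2d([0.0, 0.0], [2.0, 0.0], [[4.0, 0.0], [0.0, 1.0]]))
```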
The inverse of the covariance matrix acts as a normalization that mitigates the effect of the covariances between features.Covariance Matrixhttps://datumorphism.leima.is/cards/statistics/covariance-matrix/Tue, 10 Mar 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/covariance-matrix/We use Einstein’s summation convention. Covariance of two discrete series $A$ and $B$ is defined as
$$ \text{Cov} ({A,B}) = \sigma_{A,B}^2 = \frac{ (a_i - \bar A) (b_i - \bar B) }{ n- 1 }, $$
where $n$ is the length of the series. The normalization factor is set to $1/(n-1)$ to mitigate the bias for small $n$.
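The definition translates almost literally into code. Below is a minimal sketch with the $1/(n-1)$ normalization; the function name `covariance` and the sample series are only for illustration.

```python
def covariance(a, b):
    """Sample covariance of two equally long series with 1 / (n - 1) normalization."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two series of the same length n >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum over the repeated index i, as in the summation convention above.
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 4.0, 6.0, 8.0]
print(covariance(A, B))  # B = 2 A, so this is twice the variance of A
print(covariance(A, A))  # covariance of a series with itself is its variance
```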
One could show that
$$ \mathrm{Cov}({A,B}) = E( A B ) - \bar A \bar B. $$
At first glance, the square in the definition seems to be only for notation purpose at this point.
Meanwhile, using this idea of the mean of geometric mean, we could easily generalize it to the covariance of three series,Jackknife Resamplinghttps://datumorphism.leima.is/cards/statistics/jacknife-resampling/Sun, 26 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/jacknife-resampling/Jackknife resampling is a method for estimation of the mean and higher order moments.
Given a sample $\{x_i\}$ of size $n$ drawn from the distribution $X$, jackknife resampling estimates the mean by systematically leaving out one data point at a time. This yields $n$ estimates of the mean, where the $i$-th estimate $\bar x_i$ leaves out $x_i$,
$$ \bar x_i = \frac{1}{n-1} \sum_{j\neq i} x_j. $$
The mean of the sample is
$$ \bar x = \frac{1}{n}\sum_i \bar x_i = \frac{1}{n} \sum_i \left(\frac{1}{n-1} \sum_{j\neq i} x_j\right) = \frac{1}{n}\sum_i x_i. $$
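The identity above can be checked numerically. The sketch below computes the $n$ leave-one-out means and averages them; `jackknife_means` is a made-up helper name and the sample is arbitrary.

```python
def jackknife_means(xs):
    """Leave-one-out means: the i-th entry is the mean of xs without xs[i]."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    total = sum(xs)
    # Removing x_i from the total avoids recomputing the sum n times.
    return [(total - x) / (n - 1) for x in xs]


sample = [1.0, 2.0, 3.0, 6.0]
loo = jackknife_means(sample)
print(loo)
print(sum(loo) / len(loo), sum(sample) / len(sample))  # both are the sample mean
```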
The result is consistent with other sample mean methods. Jackknife estimates the variance of the sampleCBOW: Continuous Bag of Wordshttps://datumorphism.leima.is/cards/machine-learning/embedding/continuous-bag-of-words/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/embedding/continuous-bag-of-words/Here we encode all words presented in the corpus to demostrate the idea of CBOW. In the real world, we might want to remove some certain words such as the. We use the following quote by Ford in Westworld as an example.
I read a theory once that the human intellect is like peacock feathers. Just an extravagant display intended to attract a mate, just an elaborate mating ritual. But, of course, the peacock can barely fly. It lives in the dirt, pecking insects out of the muck, consoling itself with its great beauty.
The word intended is surrounded by extravagant display in front and to attract after it.Data Typeshttps://datumorphism.leima.is/cards/machine-learning/datatypes/data-types/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/datatypes/data-types/Gini Impurityhttps://datumorphism.leima.is/cards/machine-learning/measurement/gini-impurity/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/gini-impurity/The code used in this article can be found in this repo. Suppose we have a dataset $\{0,1\}^{10}$, which has 10 records and 2 possible classes of objects $\{0,1\}$ in each record.
The first example we investigate is a pure 0 dataset.
object 0 0 0 0 0 0 0 0 0 0 0 0 For such an all-0 dataset, we would like to define its impurity as 0.Information Gainhttps://datumorphism.leima.is/cards/machine-learning/measurement/information-gain/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/measurement/information-gain/Information gain is a frequently used metric in calculating the gain during a split in tree-based methods.
First of all, the entropy of a dataset is defined as
$$ S = - \sum_i p_i \log p_i - \sum_i (1-p_i)\log (1-p_i), $$
where $p_i$ is the probability of a class.
The information gain of a split is the difference between the entropy before the split and the entropy after it.
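A minimal sketch for the two-class case, assuming labels are 0 or 1 and using the binary entropy $-p\log p-(1-p)\log(1-p)$. Weighting the entropy of each group by its share of the records is the usual convention and is an assumption here; the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy of a two-class distribution with class probability p (natural log)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure node has zero entropy
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def entropy_of_labels(labels) -> float:
    """Binary entropy of a list of 0/1 labels."""
    return binary_entropy(sum(labels) / len(labels))


def information_gain(parent, left, right) -> float:
    """Entropy of the parent minus the size-weighted entropy of the two groups."""
    n = len(parent)
    after = (len(left) / n) * entropy_of_labels(left) + (len(right) / n) * entropy_of_labels(right)
    return entropy_of_labels(parent) - after


parent = [0, 0, 1, 1]
print(information_gain(parent, [0, 0], [1, 1]))  # perfect split, gain is log 2
print(information_gain(parent, [0, 1], [0, 1]))  # useless split, gain is 0
```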
For example, in a decision tree algorithm, we would split a node. Before splitting, we assign a label $m$ to the node,
$$ S_m = - p_m \log p_m - (1-p_m)\log (1-p_m). $$
After the splitting, we have two groups that contributes to the entropy, group $L$ and group $R$,Negative Samplinghttps://datumorphism.leima.is/cards/machine-learning/embedding/negative-sampling/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/embedding/negative-sampling/Knowledge of [[CBOW]] CBOW: Continuous Bag of Words Use the context to predict the center word or [[skipgram]] skipgram: Continuous skip-gram Use the center word to predict the context is required.
A naive approach to train a model of words is to
encode input words and output words using vectors, use the input word vector to predict the output word vector, calculate the errors between predicted output word vector and real output word vector, minimize the errors. However, it is very expensive to project out the output words and calculate the error every time.PAC: Probably Approximately Correcthttps://datumorphism.leima.is/cards/machine-learning/learning-theories/pac/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/learning-theories/pac/skipgram: Continuous skip-gramhttps://datumorphism.leima.is/cards/machine-learning/embedding/continuous-skip-gram/Thu, 16 Jan 2020 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/embedding/continuous-skip-gram/We use the following quote by Ford in Westworld as an example.
I read a theory once that the human intellect is like peacock feathers. Just an extravagant display intended to attract a mate, just an elaborate mating ritual. But, of course, the peacock can barely fly. It lives in the dirt, pecking insects out of the muck, consoling itself with its great beauty.
The word intended is surrunded by extravagant display in the front and to attract after it. The task is to predict the probability of words around the middle word intended, which are the ‘history words’ extravagant, display and ‘future words’ to, attract in our case.Improving Document Ranking with Dual Word Embeddingshttps://datumorphism.leima.is/reading/word2vec-in-out-embedding/Sat, 05 Oct 2019 00:00:00 +0000https://datumorphism.leima.is/reading/word2vec-in-out-embedding/Word2vec produces two embedding spaces, the in-embedding and out-embedding.Switch statement in Pythonhttps://datumorphism.leima.is/til/programming/python/python-switch-statement/Tue, 20 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-switch-statement/Love switch statement? We can design a switch statement it in python.Python Tilde Operatorhttps://datumorphism.leima.is/til/programming/python/python-tilde-operator/Thu, 15 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-tilde-operator/tilde operator may not work as you expectedArrays and Dicts in MongoDBhttps://datumorphism.leima.is/til/programming/database/mongodb-array-and-dict/Wed, 14 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/database/mongodb-array-and-dict/Array of dictionaries becomes hard to update in MongoDB.eval in Python is Dangeroushttps://datumorphism.leima.is/til/programming/python/python-eval/Tue, 13 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-eval/eval is powerful but really dangerousDealing with Missing Data in Machine Learninghttps://datumorphism.leima.is/wiki/machine-learning/feature-engineering/missing-data/Mon, 05 Aug 2019 00:00:00 +0000https://datumorphism.leima.is/wiki/machine-learning/feature-engineering/missing-data/How to Deal with Missing Data Remove Listwise deletion: Remove the whole record; Works if the missing values are random. Removing values causes problem in many aspects. 
For example, we can not just delete data when applying our models. Replace with most frequent value central tendency: median, mean, etc fixed value: a string etc New Category: define a new category for missing data Convert the column to a binary valued column indicating if the feature is missing or not. Tools pandas sklearn: Imputer @ResidentMario/missingno : visualize missing dataCramér's Vhttps://datumorphism.leima.is/cards/statistics/cramers-v/Sat, 20 Jul 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/cramers-v/Kendall Tau Correlationhttps://datumorphism.leima.is/cards/statistics/kendall-correlation-coefficient/Sat, 20 Jul 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/kendall-correlation-coefficient/Definition two series of data: $X$ and $Y$ cooccurance of them: $(x_i, x_j)$, and we assume that $i<j$ concordant: $x_i < x_j$ and $y_i < y_j$; $x_i > x_j$ and $y_i > y_j$; denoted as $C$ discordant: $x_i < x_j$ and $y_i > y_j$; $x_i > x_j$ and $y_i < y_j$; denoted as $D$ neither concordant nor discordant: whenever equal sign happens Kendall’s tau is defined as
$$ \begin{equation} \tau = \frac{C- D}{\text{all possible pairs of comparison}} = \frac{C- D}{n^2/2 - n/2} \end{equation} $$Bayes' Theoremhttps://datumorphism.leima.is/cards/statistics/bayes-theorem/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/bayes-theorem/Bayes' Theorem is stated as
$$ P(A\mid B) = \frac{P(B \mid A) P(A)}{P(B)} $$
$P(A\mid B)$: conditional (posterior) probability of A given B; $P(B\mid A)$: likelihood of B given A; $P(A)$: marginal probability of A. There is a nice tree diagram for Bayes' theorem on Wikipedia.
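A small worked example may help. The sketch below expands $P(B)$ with the law of total probability and then applies the theorem; the numbers (a 1% prior, a 90% hit rate, a 5% false alarm rate) are invented for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A | B) from the prior P(A) and the two conditional probabilities of B."""
    # Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# A rare event A with an imperfect indicator B.
print(posterior(prior_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05))
```

Even with a 90% hit rate, the posterior stays small because the prior is small and false alarms dominate $P(B)$.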
Tree diagram of Bayes' theoremCanonical Decompositionhttps://datumorphism.leima.is/cards/math/canonical-decomposition/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/canonical-decomposition/I find this slide from Christoph Freudenthaler very useful.
Canonical decomposition visualized by Christoph FreudenthalerCholesky Decompositionhttps://datumorphism.leima.is/cards/math/cholesky-decomposition/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/cholesky-decomposition/$$ A = L L^T $$Khatri-Rao Producthttps://datumorphism.leima.is/cards/math/khatri-rao/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/khatri-rao/$$ \mathbf{A} \ast \mathbf{B} = \left(\mathbf{A}_{ij} \otimes \mathbf{B}_{ij}\right)_{ij} $$Modes and Slices of Tensorshttps://datumorphism.leima.is/cards/math/modes-and-slices-of-tensor/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/modes-and-slices-of-tensor/ Modes of a tensor Slices of a tensorPoisson Processhttps://datumorphism.leima.is/cards/statistics/poisson-process/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/statistics/poisson-process/SVD: Singular Value Decompositionhttps://datumorphism.leima.is/cards/math/svd/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/svd/Given a matrix $\mathbf X \to X_{m}^{\phantom{m}n}$, we can decompose it into three matrices
$$ X_{m}^{\phantom{m}n} = U_{m}^{\phantom{m}k} D_{k}^{\phantom{k}l} (V_{n}^{\phantom{n}l} )^{\mathrm T}, $$
where $D_{k}^{\phantom{k}l}$ is diagonal.
Here we have $\mathbf U$ being constructed by the eigenvectors of $\mathbf X \mathbf X^{\mathrm T}$, while $\mathbf V$ is being constructed by the eigenvectors of $\mathbf X^{\mathrm T} \mathbf X$ (which is also the reason we keep the transpose).
I find this slide from Christoph Freudenthaler very useful. The original slide has been added as a reference to this article.
SVD visualized by Christoph FreudenthalerTucker Decompositionhttps://datumorphism.leima.is/cards/math/tucker-decomposition/Tue, 18 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/tucker-decomposition/I find this slide from Christoph Freudenthaler very useful. For the definition of mode 1/2/3 unfold, please refer to Modes and Slices of Tensors.
Tucker decomposition visualized by Christoph FreudenthalerFrobenius distancehttps://datumorphism.leima.is/cards/math/frobenius-distance/Mon, 17 Jun 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/frobenius-distance/Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$,
$$ \lVert X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$Levenshtein Distancehttps://datumorphism.leima.is/cards/math/levenshtein-distance/Sun, 19 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/levenshtein-distance/Levenshtein distance calculates the number of operations needed to change one word to another by applying single-character edits (insertions, deletions or substitutions).
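For the Levenshtein distance, a minimal sketch of the usual dynamic programming approach is shown below. It keeps only two rows of the Levenshtein matrix; the function name `levenshtein` is made up for illustration.

```python
def levenshtein(s: str, t: str) -> int:
    """Number of single-character insertions, deletions or substitutions from s to t."""
    # prev[j] is the distance between the first i-1 characters of s and the first j of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning the first i characters of s into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete cs
                    curr[j - 1] + 1,     # insert ct
                    prev[j - 1] + cost,  # substitute cs by ct (free if equal)
                )
            )
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
print(levenshtein("ME", ""))  # two deletions
```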
The reference explains this concept very well. For consistency, I extracted a paragraph from it which explains the operations in Levenshtein algorithm. The source of the following paragraph is the first reference of this article.
Levenshtein Matrix
Cell (0:1) contains red number 1. It means that we need 1 operation to transform M to an empty string. And it is by deleting M. This is why this number is red. Cell (0:2) contains red number 2. It means that we need 2 operations to transform ME to an empty string.n-gramhttps://datumorphism.leima.is/cards/math/n-gram/Sun, 19 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/n-gram/n-gram is a method to split words into set of substring elements so that those can be used to match words.
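For n-grams, a minimal character-level sketch could look like the following. It uses a plain sliding window without padding; other implementations may pad or clean the word first, so treat `char_ngrams` as a hypothetical helper rather than the function used on this page.

```python
def char_ngrams(word: str, n: int) -> list:
    """All substrings of length n, obtained by sliding a window over the word."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [word[i : i + n] for i in range(len(word) - n + 1)]


print(char_ngrams("data", 2))
print(char_ngrams("data", 3))

# Two words can then be matched by comparing their sets of n-grams.
shared = set(char_ngrams("night", 2)) & set(char_ngrams("nacht", 2))
print(shared)
```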
Examples Use the following examples to get your first idea about it. I created two columns so that we could compare the n-grams of two different words side-by-side.
n in n-gram is Word One Clean Word: (( sentenceOneWords )) n-grams: (( sentenceOneWordsnGram )) Word Two Clean Word: (( sentenceTwoWords )) n-grams: (( sentenceTwoWordsnGram )) /*************************/ /** The function nGram is a copy of https://github.com/words/n-gram , under MIT License **/ nGram.Add New Kernels to Jupyter Notebook in Conda Environmenthttps://datumorphism.leima.is/til/programming/jupyter-notebook-add-new-kernels-in-conda-env/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/jupyter-notebook-add-new-kernels-in-conda-env/Python package or python module autoreloading in jupyter notebookAuto-reload Python Packages or Python Modules in Jupyter Notebookhttps://datumorphism.leima.is/til/programming/jupyter-notebook-autoreload-python-modules-or-packages/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/jupyter-notebook-autoreload-python-modules-or-packages/Python package or python module autoreloading in jupyter notebookBigQuery Meta Tableshttps://datumorphism.leima.is/til/data/bigquery-meta-tables/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/data/bigquery-meta-tables/Meta tables are very useful when it comes to get bigquery table information programmatically.Calculate Moving Average Using SQL/BigQqueryhttps://datumorphism.leima.is/til/data/bigquery-moving-average/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/data/bigquery-moving-average/Snippet for calculating moving avg using sql/biguqeryGenerate a Column of Continuous Dates in BigQueryhttps://datumorphism.leima.is/til/data/bigquery-generate-continuous-dates-as-a-column/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/data/bigquery-generate-continuous-dates-as-a-column/Generate a table with a column of continuous datesGet Current User in BigQueryhttps://datumorphism.leima.is/til/data/bigquery-get-current-user/Sun, 12 May 2019 00:00:00 
+0000https://datumorphism.leima.is/til/data/bigquery-get-current-user/BigQuery Current UserMaterialize the Query Result for Performancehttps://datumorphism.leima.is/til/data/bigquery-materialize-query-results-for-performance/Sun, 12 May 2019 00:00:00 +0000https://datumorphism.leima.is/til/data/bigquery-materialize-query-results-for-performance/Materialize the query result for multistage queries to make your query faster and lower the costs.Cosine Similarityhttps://datumorphism.leima.is/cards/math/cosine-similarity/Mon, 06 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/cosine-similarity/As simple as the inner product of two vectors
$$ d_{cos} = \frac{\vec A}{\vert \vec A \vert} \cdot \frac{\vec B }{ \vert \vec B \vert} $$
Examples To use cosine similarity, we have to vectorize the words first. There are many different methods to achieve this. For the purpose of illustrating cosine similarity, we use term frequency.
Term frequency is the number of times each word occurs. We do not remove duplicates, so repeated words have some effect on the similarity.
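A minimal sketch of cosine similarity on term-frequency vectors, assuming a very simple tokenization (lowercase and split on whitespace). The function name and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    """Cosine similarity of the term-frequency vectors of two sentences."""
    tf_a = Counter(sentence_a.lower().split())
    tf_b = Counter(sentence_b.lower().split())
    # Only words that appear in both sentences contribute to the inner product.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention of this sketch for empty sentences
    return dot / (norm_a * norm_b)


print(cosine_similarity("the cat sat", "the cat sat"))
print(cosine_similarity("a a b", "a b"))  # the repeated word lowers the similarity
print(cosine_similarity("the cat", "a dog"))
```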
In principle, we could also use word set for a sentence to remove the effect of duplicate words. In most cases, if a word is repeating, it would indeed make the sentences different.Eigenvalues and Eigenvectorshttps://datumorphism.leima.is/cards/math/eigendecomposition/Mon, 06 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/eigendecomposition/To find the eigenvectors $\mathbf x$ of a matrix $\mathbf A$, we construct the eigen equation
$$ \mathbf A \mathbf x = \lambda \mathbf x, $$
where $\lambda$ is the eigenvalue.
We rewrite it in the components form,
$$ \begin{equation} A_{ij} x_j = \lambda x_i. \label{eqn-eigen-decomp-def} \end{equation} $$
Mathematically speaking, it is straightforward to find the eigenvectors and eigenvalues.
Eigenvectors are Special Directions Judging from the definition in Eq.($\ref{eqn-eigen-decomp-def}$), the eigenvectors do not change direction under the operation of the matrix $\mathbf A$.
Reconstruct $\mathbf A$ We can reconstruct $\mathbf A$ using the eigenvalues and eigenvectors.
First of all, we will construct a matrix of eigenvectors,Jaccard Similarityhttps://datumorphism.leima.is/cards/math/jaccard-similarity/Mon, 06 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/jaccard-similarity/Jaccard index is the ratio of the size of the intersect of the set and the size of the union of the set.
$$ J(A, B) = \frac{ \vert A \cap B \vert }{ \vert A \cup B \vert } $$
Jaccard distance $d_J(A,B)$ is defined as
$$ d_J(A,B) = 1 - J(A,B). $$
Properties If the two sets are the same, $A=B$, we have $J(A,B)=1$ or $d_J(A,B)=0$. We have maximum similarity.
If the two sets have nothing in common, we have $J(A,B)=0$ or $d_J(A,B)=1$. We have minimum similarity.
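Both properties are easy to verify with Python sets. In this sketch the word sets are built by lowercasing and splitting on whitespace, and two empty sets are treated as identical; both choices are assumptions, and the function names are hypothetical.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention of this sketch: two empty sets are identical
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    return 1.0 - jaccard_index(a, b)


one = set("the peacock can barely fly".split())
two = set("the peacock lives in the dirt".split())
print(jaccard_index(one, one), jaccard_distance(one, one))  # maximum similarity
print(jaccard_index(one, {"muck"}))                         # nothing in common
print(jaccard_index(one, two))
```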
Examples Sentence One Word Set: (( sentenceOneWords )) Sentence Two Word Set: (( sentenceTwoWords )) Intersect: (( intersectWords )) Union: (( unionWords )) Jaccard Index: (( jaccardIndex )) Jaccard Distance: (( jaccardDistance )) Vue.Term Frequency - Inverse Document Frequencyhttps://datumorphism.leima.is/cards/math/tf-idf/Mon, 06 May 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/tf-idf/The Art of Data Sciencehttps://datumorphism.leima.is/reading/art-of-data-science/Fri, 19 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/reading/art-of-data-science/A nice and elegant book on data scienceAwesome Stuffhttps://datumorphism.leima.is/projects/awesome/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/awesome/Summarizations, workflows, experiences, fails, etcBlog Postshttps://datumorphism.leima.is/projects/blog/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/blog/My blog posts for fun.Combinationshttps://datumorphism.leima.is/cards/math/combinations/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/cards/math/combinations/Choose X from N is
$$ C_N^X = \frac{N!}{ X! (N-X)! } $$My Data Wikihttps://datumorphism.leima.is/projects/wiki/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/wiki/A collection of my wiki articles related to data.My Knowledge Cardshttps://datumorphism.leima.is/projects/cards/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/cards/A collection of my snippets of knowledgeMy Reading Noteshttps://datumorphism.leima.is/projects/reading/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/reading/A collection of my reading notesTILhttps://datumorphism.leima.is/projects/til/Sun, 07 Apr 2019 00:00:00 +0000https://datumorphism.leima.is/projects/til/Today I LearnedHuman Graphical Perception of Quantitative Information in Data Visualizationhttps://datumorphism.leima.is/reading/graphical-perception/Sun, 17 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/reading/graphical-perception/Data visualization caveatsAdd Data Files to Python Packagehttps://datumorphism.leima.is/til/programming/python/python-package-including-data-file/Wed, 13 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-package-including-data-file/Add Data Files to Python Package using manifest.in and setup.pyInstalling requirements.txt in Conda Environmentshttps://datumorphism.leima.is/til/programming/python/python-anaconda-install-requirements/Wed, 13 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-anaconda-install-requirements/Why is pip install -r requirements.txt not working in conda?Information Theory and Statistical Mechanicshttps://datumorphism.leima.is/reading/statistical-physics-and-information-theory/Fri, 01 Mar 2019 00:00:00 +0000https://datumorphism.leima.is/reading/statistical-physics-and-information-theory/Max entropy principle as a method to infer distributions of statistical systemsFlatten 2D List in Pythonhttps://datumorphism.leima.is/til/programming/python/python-flatten-2d-list/Wed, 
23 Jan 2019 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-flatten-2d-list/Flatten 2D list using sumPython Datetime on Different OShttps://datumorphism.leima.is/til/programming/python/python-datetime-on-different-os/Mon, 31 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-datetime-on-different-os/Python datetime on different os behaves inconsistentlyPython If on Numbershttps://datumorphism.leima.is/til/programming/python/python-if-condition-on-numbers/Mon, 31 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-if-condition-on-numbers/If on int is dangerousPython Long Stringhttps://datumorphism.leima.is/til/programming/python/python-long-string/Mon, 31 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-long-string/Python long string formattingPython Reliable Path to Filehttps://datumorphism.leima.is/til/programming/python/python-reliable-path/Mon, 31 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-reliable-path/Find the actual path to fileVSCode on Mac Long Press Keys Not Repeatinghttps://datumorphism.leima.is/til/misc/vscode/vscode-on-mac-do-not-repeat/Mon, 31 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/misc/vscode/vscode-on-mac-do-not-repeat/Enable your key repeat in vscode on mac in the terminalControlled Experimentshttps://datumorphism.leima.is/til/statistics/controlled-experiments/Tue, 04 Dec 2018 00:00:00 +0000https://datumorphism.leima.is/til/statistics/controlled-experiments/The three levels of controlled experimentsBiPolar Sigmoidhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-bi-polar-sigmoid/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-bi-polar-sigmoid/A BiPolar sigmoid function is
$$ \sigma(x) = \frac{1-e^{-x}}{1+e^{-x}}. $$
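A minimal sketch of the function in plain Python. It also checks two identities that follow from the definition: the bipolar sigmoid equals $\tanh(x/2)$ and equals $2\sigma_u(x) - 1$, where $\sigma_u$ denotes the uni-polar sigmoid $1/(1+e^{-x})$. The function names are made up, and the direct formula is only meant for moderate $|x|$ since `math.exp` overflows for very negative inputs.

```python
import math


def bipolar_sigmoid(x: float) -> float:
    """(1 - exp(-x)) / (1 + exp(-x)), with values in (-1, 1)."""
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))


def uni_polar_sigmoid(x: float) -> float:
    """1 / (1 + exp(-x)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 1.0):
    # The three columns agree up to floating point error.
    print(bipolar_sigmoid(x), math.tanh(x / 2.0), 2.0 * uni_polar_sigmoid(x) - 1.0)
```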
Visualization Bipolar SigmoidConic Section Functionhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-conic-section-function/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-conic-section-function/ TODO
Expand this article. See references 1.
Dorffner1994 Dorffner G. UNIFIED FRAMEWORK FOR MLPs AND RBFNs: INTRODUCING CONIC SECTION FUNCTION NETWORKS. Cybern Syst. 1994;25: 511–554. doi:10.1080/01969729408902340  ↩︎ELUhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-elu/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-elu/Both ReLu and Leaky ReLu have discontinuous derivatives. ELU is smooth for first order derivative, i.e., ELU is class $C^1$.
$$ \begin{cases} x, & \text{if }x>0 \\ \exp(x) - 1, & \text{else.} \end{cases} $$
Visualizations ELU
Derivative of ELU
Code def elu(x, alpha): return torch.where(x > 0, x, torch.exp(x) - 1) Full code to generate the data used in this article
from torch import nn import matplotlib.pyplot as plt import torch from typing import Union, Optional from pathlib import Path import json def visualize_activation( x: torch.Tensor, acti: torch.Hyperbolic Tanhhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-hyperbolic-tangent/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-hyperbolic-tangent/$$ \tanh(x) = \frac{\sinh(x)}{\cosh(x)} = \frac{e^{x} - e^{-x}}{e^x + e^{-x}} $$
Hyperbolic tangentLeaky ReLuhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-leaky-relu/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-leaky-relu/ReLu sets all negative regions to 0. Leaky ReLu sets the negative regions to a linear relation with slope $\alpha$,
$$ \begin{cases} x, & \text{if }x>0 \\ \alpha x, & \text{else.} \end{cases} $$
Visualizations Leaky ReLu with $\alpha=0.2$
Derivative of Leaky ReLu with $\alpha=0.2$. Notice that the derivative is $0.2$ for $x<0$.
Code def leaky_relu(x, alpha): return torch.where(x > 0, x, alpha * x) Full code to generate the data used in this article
from torch import nn import matplotlib.pyplot as plt import torch from typing import Union, Optional from pathlib import Path import json def visualize_activation( x: torch.Radial Basis Functionhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-radial-basis-function/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-radial-basis-function/ Hyperbolic tangentTwo unnormalized Gaussian radial basis functions in one input dimension. The basis function centers are located at x1=0.75 and x2=3.25. Source Unnormalized Radial Basis FunctionsReLuhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-relu/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-relu/Rectified Linear Unit (ReLu) is a very popular activation function in deep learning. ReLu is defined as
$$ \begin{cases} x, & \text{if }x>0 \\ 0, & \text{else.} \end{cases} $$
Visualizations ReLu
Derivative of ReLu
Characteristics In trained models, ReLu doesn’t preserve the qualitative distributions of values after the activation.
Lippe P. Tutorial 3: Activation Functions — UvA DL Notebooks v1.1 documentation. In: UvA Deep Learning Tutorials [Internet].
Because of the zero values in ReLu, many neurons actually don’t participate in any of the tasks as they are just nullified to zeros and provide no gradient.Swishhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-swish/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-swish/Swish is infinitely differentiable, i.e., class $C^\infty$.
$$ x \sigma(x), $$
where $\sigma$ is the [[uni-polar sigmoid]] Uni-Polar Sigmoid Uni-polar sigmoid function and its properties .
Visualizations Swish
Derivative of Swish
Code def swish(x, alpha): return x * torch.sigmoid(x) Full code to generate the data used in this article
from torch import nn import matplotlib.pyplot as plt import torch from typing import Union, Optional from pathlib import Path import json def visualize_activation( x: torch.Tensor, acti: torch.nn.Module, save_path: Optional[Union[str, Path]] = None ) -> dict: """Visualize activation function on the domain of x""" y = acti(x) # Calculate the grad of the activation function x = x.Uni-Polar Sigmoidhttps://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-uni-polar-sigmoid/Mon, 19 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/cards/machine-learning/neural-networks/activation-uni-polar-sigmoid/A uni-Polar sigmoid function is
$$ \sigma(x) = \frac{1}{1+e^{-x}}. $$
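A minimal sketch of how this definition might be evaluated in plain Python follows. The function name `sigmoid` and the branch on the sign of `x` are choices made for this illustration, not something taken from the original card. The branch keeps the exponent non-positive, so `math.exp` cannot overflow for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Evaluate 1 / (1 + exp(-x)) without overflowing for large |x|."""
    if x >= 0:
        # exp(-x) <= 1 here, so the direct formula is safe.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
        print(value, sigmoid(value))
```

Both branches compute the same function; multiplying the numerator and denominator of the definition by $e^{x}$ gives the second form.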
Visualization Uni-polar Sigmoid function Tricks A very useful trick: $$ 1 - \sigma(x) = \sigma(-x). $$Schaum's Outline of Theories and Problems of Elements of Statistics I and IIhttps://datumorphism.leima.is/reading/elements-of-statistics/Thu, 01 Nov 2018 00:00:00 +0000https://datumorphism.leima.is/reading/elements-of-statistics/The basics and all of modern statisticsPandas with MultiProcessinghttps://datumorphism.leima.is/til/programming/pandas/pandas-parallel-multiprocessing/Sun, 09 Sep 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/pandas/pandas-parallel-multiprocessing/Define number of processes, prs; Split dataframe into prs dataframes; Process each dataframe with one process; Merge processed dataframes into one. A piece of demo code is shown below.
from multiprocessing import Pool from multiprocessing.dummy import Pool as ThreadPool import pandas as pd # Create a dataframe to be processed df = pd.read_csv('somedata.csv').reset_index(drop=True) # Define a function to be applied to the dataframe def nice_func(name, country): return (name, country) # Apply to dataframe def apply_to_df(df_chunks): df_chunks['tupled'] = df_chunks.apply( lambda x: nice_func( x['host_name'], x['host_country']), axis=1 ) print('finished chunk') return df_chunks # Divide dataframe to chunks prs = 100 # define the number of processes chunk_size = int(df.Beer and Life Expectancyhttps://datumorphism.leima.is/blog/ruthless/beer-and-life-expectancy/Wed, 08 Aug 2018 00:00:00 +0000https://datumorphism.leima.is/blog/ruthless/beer-and-life-expectancy/This is a joke. Everything in this post is meant for fun. I moved to Germany a few weeks ago. One of the most interesting things here is beer. People drink so much, yet the life expectancy in Germany is pretty high. So I came up with this broken joke “analysis” for fun.
Life expectancy vs beer consumption (L) per capita per year. Data obtained from Wikipedia: List of countries by life expectancy and List of countries by beer consumption per capita.
As you might be curious about the life expectancy vs the life expectancy, I also made one plot about it.Data Mining: Concepts and Techniqueshttps://datumorphism.leima.is/reading/data-mining/Wed, 01 Aug 2018 00:00:00 +0000https://datumorphism.leima.is/reading/data-mining/How data mining was done in the pastFitt's Lawhttps://datumorphism.leima.is/til/misc/fitts-law/Sun, 22 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/til/misc/fitts-law/How fast can you move your mouse to targetCopy Scalars and Lists in Pythonhttps://datumorphism.leima.is/til/programming/python/python-copy-value-or-address/Tue, 03 Jul 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-copy-value-or-address/Python copy values of scalars but addresses of listsCertificate Errors in urllibhttps://datumorphism.leima.is/til/data/python-urllib-ssl/Mon, 25 Jun 2018 00:00:00 +0000https://datumorphism.leima.is/til/data/python-urllib-ssl/Dealing with errors when scraping dataCalculated Columns in Pandashttps://datumorphism.leima.is/til/programming/pandas/pandas-new-column-from-other/Sun, 20 May 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/pandas/pandas-new-column-from-other/Create new columns in pandastree in Linuxhttps://datumorphism.leima.is/til/programming/trees/Tue, 20 Mar 2018 00:00:00 +0000https://datumorphism.leima.is/til/programming/trees/Trees in computer scienceHeap on Mac and Linuxhttps://datumorphism.leima.is/til/programming/cpp/cpp-heap-mac-linux-diff/Tue, 26 Sep 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/cpp/cpp-heap-mac-linux-diff/Some caveats about heap on mac and linuxC++ int Multiplicationhttps://datumorphism.leima.is/til/programming/cpp/cpp-int-multiply/Thu, 21 Sep 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/cpp/cpp-int-multiply/int multiplication in C++ should be processed with caution.CMake Usagehttps://datumorphism.leima.is/til/programming/cmake-usage/Thu, 21 Sep 2017 00:00:00 
+0000https://datumorphism.leima.is/til/programming/cmake-usage/How to use CMake to generate makefilesAllocating Memory for Multidimensional Array in C++https://datumorphism.leima.is/til/programming/cpp/cpp-allocating-memory-multidimensional-array/Thu, 14 Sep 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/cpp/cpp-allocating-memory-multidimensional-array/Some caveatsC++ range-for-statementhttps://datumorphism.leima.is/til/programming/cpp/cpp-range-for-statement/Tue, 12 Sep 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/cpp/cpp-range-for-statement/In C++ we can use range-for-statementList All Folders in Linux or Machttps://datumorphism.leima.is/til/programming/linux-mac-list-all-folders/Tue, 01 Aug 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/linux-mac-list-all-folders/Using ls and tree commands to list folders onlyPython Default Parameters Tripped Me Uphttps://datumorphism.leima.is/til/programming/python/python-default-parameters-mutable/Sat, 03 Jun 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-default-parameters-mutable/Python default parameters might be changed with each runSome Tests on Matplotlib Backendshttps://datumorphism.leima.is/til/programming/matplotlib-backend/Tue, 23 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/matplotlib-backend/Matplotlib provides many different backendsMathematica Provides Great PlotTheme Optionshttps://datumorphism.leima.is/til/programming/mathematica/mathematica-plottheme/Fri, 19 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-plottheme/Amazingly, Mathematica provides an option for plot that automatically generates beautiful plots.Turn a Series Expansion into Function in Mathematicahttps://datumorphism.leima.is/til/programming/mathematica/mathematica-turn-series-into-function/Mon, 15 May 2017 00:00:00 
+0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-turn-series-into-function/Turn a series expansion in Mathematica into a functionOvercoming catastrophic forgetting in neural networkshttps://datumorphism.leima.is/reading/overcoming-catastrophic-forgetting-in-neural-networks/Sun, 14 May 2017 00:00:00 +0000https://datumorphism.leima.is/reading/overcoming-catastrophic-forgetting-in-neural-networks/Using a newly defined loss function the authors could implement an idea that achieves the multi-task within one network.Git Asks for Password Whenever I Pull or Pushhttps://datumorphism.leima.is/til/programming/git/git-ssh-asking-pwd-everytime/Thu, 11 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/git/git-ssh-asking-pwd-everytime/My git asks for password every time I pull or push even with ssh configured.Command Line Russian Roulettehttps://datumorphism.leima.is/til/programming/command-line-russian-roulette/Tue, 09 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/command-line-russian-roulette/Play russian roulette in your command lineGNU Screen Key Conflict with Bashhttps://datumorphism.leima.is/til/programming/gnu-screen-key-conflict-with-bash/Mon, 08 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/gnu-screen-key-conflict-with-bash/GNU screen key conflict with bash can be solvedHow to Run Mathematica Script in Terminalhttps://datumorphism.leima.is/til/programming/run-mathematica-script-in-terminal/Mon, 08 May 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/run-mathematica-script-in-terminal/Using math -run or wolfram -run we could execute a Mathematica script through ssh in terminal.GNUPLOT Inline Output in iterm2https://datumorphism.leima.is/til/programming/gnuplot-iterm2-imgcat/Fri, 07 Apr 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/gnuplot-iterm2-imgcat/Using gnuplot in iterm2 we can output result inside terminal combined with imgcatMathematica 
Exclude Singularities in Plothttps://datumorphism.leima.is/til/programming/mathematica/mathematica-plot-exclude-singularities/Wed, 22 Mar 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-plot-exclude-singularities/Mathematica Plot might sometimes include non-existent lines; Exclusions is the option for it.Passing Function Arguments Through Lists in Mathematicahttps://datumorphism.leima.is/til/programming/mathematica/mathematica-passing-arguments-through-lists/Mon, 20 Feb 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-passing-arguments-through-lists/We can pass a list of arguments using SequenceGit Pull with Submodulehttps://datumorphism.leima.is/til/programming/git/git-pull-with-submodule/Fri, 03 Feb 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/git/git-pull-with-submodule/Pull git repo with submodulePositioning textblock in LaTeX Beamerhttps://datumorphism.leima.is/til/programming/latex-beamer-textblock-position/Tue, 17 Jan 2017 00:00:00 +0000https://datumorphism.leima.is/til/programming/latex-beamer-textblock-position/Positioning textblock in LaTeX Beamer using textpos package and eso pic packageMathematica Different Output Formshttps://datumorphism.leima.is/til/programming/mathematica/mathematica-different-output-forms/Mon, 28 Nov 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-different-output-forms/Mathematica has many different output forms. 
Understanding them is extremely helpful when making plots.Git Branch Optionshttps://datumorphism.leima.is/til/programming/git/git-branch-details/Sun, 27 Nov 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/git/git-branch-details/Some useful options about git branchgit pull multi remotehttps://datumorphism.leima.is/til/programming/git/git-pull-multi-remote/Tue, 22 Nov 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/git/git-pull-multi-remote/working with multi remoteWorking Memory and Brain Waveshttps://datumorphism.leima.is/reading/working-memory-and-brain-waves/Sun, 20 Nov 2016 00:00:00 +0000https://datumorphism.leima.is/reading/working-memory-and-brain-waves/Working memory might be related to the background brain waves from theoretical point of viewPopularity versus similarity in growing networkshttps://datumorphism.leima.is/reading/popularity-vs-similarity/Sun, 06 Nov 2016 00:00:00 +0000https://datumorphism.leima.is/reading/popularity-vs-similarity/Introduce geometry into the manifold of complex networksFormatting Numbers in Pythonhttps://datumorphism.leima.is/til/programming/formating-numbers-python/Tue, 11 Oct 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/formating-numbers-python/Formatting numbers in python using formatSolving Equations Using Differential Transformation Methodhttps://datumorphism.leima.is/til/math/differential-transformation-method-solving-equations/Tue, 11 Oct 2016 00:00:00 +0000https://datumorphism.leima.is/til/math/differential-transformation-method-solving-equations/Differential transformation method can be used to solve differential equation even integro-differential equations.The Great Chrome Dev Toolhttps://datumorphism.leima.is/til/programming/chrome-dev-tool-usage/Wed, 28 Sep 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/chrome-dev-tool-usage/How to use the chrome dev tool wiselyStart a Simple 
Serverhttps://datumorphism.leima.is/til/programming/start-simple-server/Sat, 17 Sep 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/start-simple-server/With one line of python commandmatplotlib x y limit and aspect ratiohttps://datumorphism.leima.is/til/programming/matplotlib-x-y-limit-and-aspect-ratio/Thu, 21 Jul 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/matplotlib-x-y-limit-and-aspect-ratio/matplotlib x y limit and aspect ratioTOP Commandhttps://datumorphism.leima.is/til/programming/top/Thu, 21 Jul 2016 00:00:00 +0000https://datumorphism.leima.is/til/programming/top/Some tips about top commandAssigning Values to Multiple Variableshttps://datumorphism.leima.is/til/programming/python/python-assigning-values-to-multiple-variables/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-assigning-values-to-multiple-variables/Assigning Values to Multiple Variablesgitignore by file sizehttps://datumorphism.leima.is/til/programming/git/gitignore-by-file-size/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/git/gitignore-by-file-size/gitignore by file sizeHTML Animations Using CSS: AnimateCSShttps://datumorphism.leima.is/til/programming/html-animate-css/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/html-animate-css/HTML Animations Using CSS AnimateCSSImport in Pythonhttps://datumorphism.leima.is/til/programming/import-in-python/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/import-in-python/Import in PythonIPython or Jupyter Notebook Magicshttps://datumorphism.leima.is/til/programming/ipython-or-jupyter-notebook-magics/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/ipython-or-jupyter-notebook-magics/IPython or Jupyter Notebook MagicsLaTeX Automatically Adjust Figurehttps://datumorphism.leima.is/til/programming/latex-automatically-adjust-figure/Fri, 04 Dec 2015 00:00:00 
+0000https://datumorphism.leima.is/til/programming/latex-automatically-adjust-figure/LaTeX Automatically Adjust FigureMathematica Plot Default Font Style and Ticks Style: BaseStylehttps://datumorphism.leima.is/til/programming/mathematica/mathematica-plot-basestyle-default-font-style-and-ticks-style/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-plot-basestyle-default-font-style-and-ticks-style/Mathematica Plot Default Font Style and Ticks Style BaseStyleMathematica Smooth Plothttps://datumorphism.leima.is/til/programming/mathematica/mathematica-smooth-plot/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/mathematica/mathematica-smooth-plot/Mathematica Smooth PlotMigrating Wordpress to Statichttps://datumorphism.leima.is/til/programming/migrating-wordpress-to-static-site/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/migrating-wordpress-to-static-site/Migrating Wordpress to StaticOpen URL using python using webbrowser modulehttps://datumorphism.leima.is/til/programming/open-url-using-python-webbrowser-module/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/open-url-using-python-webbrowser-module/Open URL using python using webbrowser modulePython Code Stylehttps://datumorphism.leima.is/til/programming/python/python-code-style/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-code-style/Code Style of Python Guide.
PEP 20 – The Zen of Python
1. Beautiful is better than ugly. 2. Explicit is better than implicit. 3. Simple is better than complex. 4. Complex is better than complicated. 5. Flat is better than nested. 6. Sparse is better than dense. 7. Readability counts. 8. Special cases aren't special enough to break the rules. 9. Although practicality beats purity. 10. Errors should never pass silently. 11. Unless explicitly silenced. 12. In the face of ambiguity, refuse the temptation to guess. 13. There should be one-- and preferably only one --obvious way to do it.Python Creating Listshttps://datumorphism.leima.is/til/programming/python/python-creating-lists/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-creating-lists/Code Style of Python GuidePython enumeratehttps://datumorphism.leima.is/til/programming/python/python-enumerate/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-enumerate/Python enumerate functionPython List Comprehensionshttps://datumorphism.leima.is/til/programming/python/python-list-comprehensions/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-list-comprehensions/Python List ComprehensionsPython Making a Listhttps://datumorphism.leima.is/til/programming/python/python-making-a-list/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-making-a-list/Python Making a ListPython Map vs For in Pythonhttps://datumorphism.leima.is/til/programming/python/python-map-vs-for/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-map-vs-for/Python Map vs For in PythonPython One-liner: Filter Prime Numbershttps://datumorphism.leima.is/til/programming/filter-prime-numbers/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/filter-prime-numbers/Python One-liner Filter Prime NumbersPython Stupid 
numpy.piecewisehttps://datumorphism.leima.is/til/programming/python/python-stupid-numpy-piecewise/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-stupid-numpy-piecewise/Python Stupid numpy.piecewisePython Various Ways of Writing Loopshttps://datumorphism.leima.is/til/programming/python/python-writing-loops/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-writing-loops/Python Various Ways of Writing LoopsRun a program in the background on ubuntuhttps://datumorphism.leima.is/til/programming/run-program-in-background-ubuntu/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/run-program-in-background-ubuntu/Run a program in the background on ubuntusnakevizhttps://datumorphism.leima.is/til/programming/python/python-profile-snakeviz/Fri, 04 Dec 2015 00:00:00 +0000https://datumorphism.leima.is/til/programming/python/python-profile-snakeviz/Python snakevizArea Enclosed by a Linehttps://datumorphism.leima.is/til/math/area-enclosed-in-a-line/Sun, 15 Feb 2015 00:00:00 +0000https://datumorphism.leima.is/til/math/area-enclosed-in-a-line/Calculate the area enclosed by a lineEigensystem of A Special Matrixhttps://datumorphism.leima.is/til/math/eigensystem-of-a-special-matrix/Sun, 15 Feb 2015 00:00:00 +0000https://datumorphism.leima.is/til/math/eigensystem-of-a-special-matrix/Eigenstates of a very special matrixFeynman Trickhttps://datumorphism.leima.is/til/math/feynman-tricks/Sun, 15 Feb 2015 00:00:00 +0000https://datumorphism.leima.is/til/math/feynman-tricks/An identity about integralSymmetry of second derivativeshttps://datumorphism.leima.is/til/math/symmetry-of-second-derivatives/Sun, 15 Feb 2015 00:00:00 +0000https://datumorphism.leima.is/til/math/symmetry-of-second-derivatives/Symmetry of second derivatives<link>https://datumorphism.leima.is/wiki/dynamical-system/integration-of-ode/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://datumorphism.leima.is/wiki/dynamical-system/integration-of-ode/</guid><description/></item><item><title/><link>https://datumorphism.leima.is/wiki/survival-analysis/survival-probability/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://datumorphism.leima.is/wiki/survival-analysis/survival-probability/</guid><description/></item><item><title>Abouthttps://datumorphism.leima.is/about/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/about/Datumorphism is my notebook about machine learning, programming, data, statistics, and data visualization. Most of the topics are here because I need them in my work or personal projects.
To organize the notes more elegantly, I created a hugo theme called connectome to host them. The essence of the theme is to enable bi-directional connections between them.
This website is mostly designed to be a reference book for myself.Cheatsheetshttps://datumorphism.leima.is/awesome/cheatsheets/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/awesome/cheatsheets/ Supervised Learning k-Nearest Neighbors [Supervised Learning Classification ] : Linear Regression [Supervised Learning Regression ] : Lasso [Supervised Learning Regression Regularization ] : Ridge [Supervised Learning Regression Regularization ] : ElasticNet [Supervised Learning Regression Regularization ] : Unsupervised Learning k-Means [Unsupervised Learning ] : t-SNE [Unsupervised Learning ] : PCA [Unsupervised Learning Dimension Reduction Feature Selection ] : NMF [Unsupervised Learning ] : Non-negative Matrix FactoringGridlines in Matplotlibhttps://datumorphism.leima.is/til/programming/matplotlib-gridlines/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/til/programming/matplotlib-gridlines/Adding gridlines in matplotlibResearchershttps://datumorphism.leima.is/awesome/researchers/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/awesome/researchers/ Machine Learning Geoffrey Hinton [machine learning psychology artificial intelligence cognitive science computer science ] : Emeritus Prof. 
Comp Sci, U.Toronto & Engineering Fellow, Google Max Welling [machine learning graph ] : Professor Machine Learning, University of Amsterdam William L Hamilton [machine learning graph ] : Assistant Professor of Computer Science, McGill University and Mila Yann LeCun [machine learning artificial intelligence ] : Chief AI Scientist at Facebook & Silver Professor at the Courant Institute,Toolshttps://datumorphism.leima.is/awesome/tools/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/awesome/tools/List of Tools Dashboard streamlit [Python ] : Build dashboards, fast, in python plotly dash [Python ] : Build complex dashboards in python ReDash [Python ] : Superset [Python ] : Metabase [Java ] : Google Data Studio [Free Google BigQuery Cloud ] : Google Data Studio is a convenient tool to produce simple yet massive dashboards for the team. Design and Build a Data Warehouse for Business [Courses Warehouse Business ] : Explained Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) [LSTM RNN ] : JavaScript replacements for Python data science tools [JavaScript Tools Data Science ] : https://github.Typography of this Websitehttps://datumorphism.leima.is/typography/Mon, 01 Jan 0001 00:00:00 +0000https://datumorphism.leima.is/typography/Basic Syntax This website uses kramdown as the basic syntax. However, a lot of html/css/js has been applied to generate certain content or styles.
Math also follows the kramdown syntax.
Notes div {% highlight html %}
Figure with Caption {% highlight html %}
![]({{ site.url }}/assets/programming/chrome-dev-tools-inspect.png) where {{ site.url }} is the configured url of the site.
Alternatively, we can use the set attributes syntax in kramdown.
{% highlight md %} This is a paragraph with some class. The class is specified at the end of the paragraph. {: .notes--warning} {% endhighlight %}
The result is rendered as a paragraph with the corresponding class.