Machine learning and other gibberish on Telegram; https://t.me/amneumarkt
Amazon has been updating their Machine Learning University website. It is getting more and more interesting. They have added an article about linear regression recently. There is a section in this article about interpreting linear models and it is just fun.
( Time machine: https://t.me/amneumarkt/293 )
I’ve never thought about dark mode in LaTeX. It sounds weird at first, but now thinking about this, it’s actually a great style.
This is a dark style from Dracula. https://draculatheme.com/latex
This is interesting.
Toy Models of Superposition. [cited 15 Sep 2022]. Available: https://transformer-circuits.pub/2022/toy_model/index.html#learning
Germany is so small. My GitHub profile ranks 102 in Germany by public contributions.
Tracking of an Eagle over a 20 year period. Source: https://twitter.com/Loca1ion/status/1566346534651924480?s=20&t=AKXn9U-L3fyhrJzeAXySlA
Some results from the Stable Diffusion model. See comments for some examples.
Hmm not so many contributions from wild animals.
Data from this paper: https://www.pnas.org/doi/10.1073/pnas.1711842115#T1
I find this work counterintuitive. They took some descriptions of the optimization in machine learning and trained a transformer to “guesstimate” the hyperparameters of a model. I understand that human beings develop some “feeling” for the hyperparameters after working with the data and model for a while. But it is usually hard to extrapolate such knowledge when we have completely new data and models. I guess our brain is doing some statistics based on our historical experiments. And we call this intuition. My “intuition” is that there is little generalizable knowledge in this problem. 🙈 It would have been so great if they had investigated the saliency maps.
I became a beta tester of DALLE. Played with it for a while and it is quite fun. See the comments for some examples. Comment if you would like to test some prompts.
participants who spent more than six hours working on a tedious and mentally taxing assignment had higher levels of glutamate — an important signalling molecule in the brain. Too much glutamate can disrupt brain function, and a rest period could allow the brain to restore proper regulation of the molecule
the Library of Statistical Techniques (LOST)!
Hmmmm, RStudio is doing some weird stuff recently.
Fotios Petropoulos initiated the forecasting encyclopaedia project. They published this paper recently.
Petropoulos, Fotios, Daniele Apiletti, Vassilios Assimakopoulos, Mohamed Zied Babai, Devon K. Barrow, Souhaib Ben Taieb, Christoph Bergmeir, et al. 2022. “Forecasting: Theory and Practice.” International Journal of Forecasting 38 (3): 705–871.
Also available here: https://forecasting-encyclopedia.com/
The paper covers many recent advances in forecasting, including deep learning models. There are some important topics missing but I’m sure they will cover them in future releases.
so the job of data scientist will only continue to grow in its importance in the business landscape.
However, it will also continue to change. We expect to see continued differentiation of responsibilities and roles that all once fell under the data scientist category.
Guidelines for research coding. It is not the highest standard but is easy to follow.
Kreuzberger D, Kühl N, Hirschl S. Machine Learning Operations (MLOps): Overview, definition, and architecture. arXiv [csLG]. 2022 [cited 17 Jul 2022]. doi:10.48550/ARXIV.2205.02302
The recommended readings serve as a good curriculum for transformers.
I was playing with dalle-mini ( https://github.com/borisdayma/dalle-mini ).
So… in the eyes of Dalle-mini,
- science == chemistry (? I guess),
- scientists are men.
Tried several times, same conclusions.
It is so hard to fight against the bias in ML models.
Update: OpenAI is fixing this.
[P] No, we don’t have to choose batch sizes as powers of 2: MachineLearning https://www.reddit.com/r/MachineLearning/comments/vs1wox/p_no_we_dont_have_to_choose_batch_sizes_as_powers/
Mitchell M, Wu S, Zaldivar A, Barnes P, Vasserman L, Hutchinson B, et al. Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency. New York, NY, USA: ACM; 2019. doi:10.1145/3287560.3287596
This is also like one thousand years later…
PyMC 4.0 Release Announcement — PyMC project website https://www.pymc.io/blog/v4_announcement.html
If you are building a simple dashboard using python, streamlit is a great tool to get started. One of the problems in the past was to create multipage apps.
To solve this problem, I created a template for multipage apps a year ago. https://github.com/emptymalei/streamlit-multipage-template
But today, streamlit officially introduced multipage support. And it looks great. I haven’t built any dashboards for a while, but to me, this is still the go-to solution for a dashboard. https://blog.streamlit.io/introducing-multipage-apps/
Higharc is a start-up helping people design houses using generative designs.
The demo looks amazing.
This is hilarious.
I have heard about DeepETA before but never thought it was a transformer.
According to this blog post by Uber, they are using an encoder-decoder architecture with linear attention.
This blog post also explains how they made a transformer fast.
DeepETA: How Uber Predicts Arrival Times Using Deep Learning https://eng.uber.com/deepeta-how-uber-predicts-arrival-times/
Parsimony with cognitive resource limitations 🤔
I have been following an issue on math support for github markdown (github/markup/issues/274).
One thousand years later …
Math support in Markdown | The GitHub Blog https://github.blog/2022-05-19-math-support-in-markdown/
Quote from this article:
“It doesn’t transmit from person to person as readily, and because it is related to the smallpox virus, there are already treatments and vaccines on hand for curbing its spread. So while scientists are concerned, because any new viral behaviour is worrying — they are not panicked.”
Finally… We can now utilize the real power of M1 chips.
Introducing Accelerated PyTorch Training on Mac | PyTorch https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/
I have been following this issue: https://github.com/pytorch/pytorch/issues/47702#issuecomment-1130162835 There were even some fights. 😂
This post is a retro on how I learned Python.
Disclaimer: I can not claim that I am a master of Python. This post is a retrospective of how I learned Python in different stages.
I started using Python back in 2012. Before this, I was mostly a Matlab/C user.
Python is easy to get started with, yet it is hard to master. People coming from other languages can easily make it work but will write some “disgusting” Python code. And this is because Python people talk about “pythonic” all the time. Instead of being an actual style guide, it is rather a philosophy of styles.
When we get started, we are most likely not interested in PEP8 and PEP257. Instead, we focus on making things work. After some lectures from the university (or whatever sources), we start to get some sense of styles. Following these lectures, we will probably write code and use Python in some projects. Then we begin to realize that Python is strange and sometimes doesn’t even make sense. Then we start learning about the philosophy behind it. At some point, we will get some peer reviews and probably fight against each other on some philosophies we accumulated throughout the years.
The attached drawing (in comments) somehow captures this path that I went through. It is not a monotonic path of any sort. This path is most likely to be permutation invariant and cyclic. But the bottom line is that mastering Python requires a lot of struggle, fights, and relearning. And one of the most effective methods is peer review, just as in any other learning task in our life.
Peer review makes us think, and it is very important to find some good reviewers. Don’t just stay in a silo and admire our own code. To me, the whole journey helped me build one of the most important philosophies of my life: embrace open source and collaborate.
Could use this
How to Lie with Statistics - Wikipedia https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics
Stop squandering data: make units of measurement machine-readable https://www.nature.com/articles/d41586-022-01233-w
Highly recommended! If you are working on deep learning for forecasting, gluonts is a great package. It simplifies all the tedious data preprocessing, slicing, and backtesting stuff. We can then spend time on implementing the models themselves (there are a lot of ready-to-use models). What’s even better, we can use pytorch lightning!
See this repository for a list of transformer based forecasting models. https://github.com/kashif/pytorch-transformer-ts
Came across this post this morning. I realized the reason I am not writing a lot in Julia is simply because I don’t know how to write quality code in Julia.
When we build a model in Python, we know all these details about making it quality code. For a new language, I’m just terrified by the amount of details I need to be aware of.
Ah I’m getting older.
JAX vs Julia (vs PyTorch) · Patrick Kidger https://kidger.site/thoughts/jax-vs-julia/
Anaconda open sourced this…
I have no idea what this is for…
I heard about information bottleneck so many times but didn’t really go back and read the original papers.
I spent some time on it and I found it quite interesting. It is philosophically based on what was described in Vapnik’s The Nature of Statistical Learning, where he discussed how generalizations work by enforcing parsimony. Here in this information bottleneck paper, the most interesting thing is the quantified generalization gap and complexity gap. With these, we know where to go on the information plane.
It’s a good read.
Tishby N, Zaslavsky N. Deep Learning and the Information Bottleneck Principle. arXiv [cs.LG]. 2015. Available: http://arxiv.org/abs/1503.02406
Hmm… Interesting pattern
I realized something interesting about time management.
If I open my calendar now, I see these “tiles” of meetings filling up most of my working hours. It looks bad, but it was even worse in the past. The thing is, if I do meetings during my working hours, I will have to work extra hours to do some thinking and analysis. It is rather cruel.
So what changed? I think I realized the power of Google Docs. Instead of many people talking and nobody listening, someone should write up a draft first and send it out to the colleagues. Then, once people get the link to the docs, everyone can add comments.
This doesn’t seem to be very different from meetings. Oh, it is very different. The workflow can be async. We are not forced to use our precious focus time to attend meetings. We can read and comment on the document whenever we like: when we are commuting, when we are taking a dump, when we are on a phone/tablet, just, any, time.
Apart from the async workflow, I also like the “think, comment and forget” idea. I feel people deliver better ideas when we think first, comment next, and forget about it unless there are replies to our comments. No pressure, no useless debates.
I read about conformal prediction a while ago and realized that I need to understand more about the hypothesis testing theories. As someone from natural science, I mostly work within the Neyman-Pearson ideas. So I explored it a bit and found two nice papers. See the list below. If you have other papers on similar topics, I would appreciate some comments.
- Perezgonzalez JD. Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Front Psychol. 2015;6: 223. doi:10.3389/fpsyg.2015.00223 https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00223/full
- Lehmann EL. The Fisher, Neyman-Pearson Theories of Testing Hypotheses: One Theory or Two? J Am Stat Assoc. 1993;88: 1242–1249. doi:10.2307/2291263
I vaguely feel there’s a talent shortage in Germany. “Hiring is hard”. I heard this several times. Our team also needs more hires.
So the company came up with this: Land a job at Zalando within 3 days after the final interviews!
Every chemistry graduate will be in charge of a molecule. Someone got to take care of “Titin” (189,819 characters), and she/he will have to recite the name first in every meeting: https://en.wiktionary.org/wiki/Appendix:Protologisms/Long_words/Titin#Noun
I have, somehow, 5 different brands of smart home products in our little apartment. I have no idea what is going on in the smart home industry. Every brand has its own app, hub, or even protocol. So I had to install five different apps to initialize the devices. I could, in principle, ditch these apps and use Google/Alexa only after the devices are set up. However, this is still extremely inconvenient as Google/Alexa doesn’t support all the fancy functions of the devices.
Any solutions to this problem?
Not bad. 😂
The Big Data Game | Firebolt https://www.firebolt.io/big-data-game
The Dunning-Kruger effect is quite real 😂
Infographic: 50 Cognitive Biases in the Modern World https://www.visualcapitalist.com/50-cognitive-biases-in-the-modern-world/
Plot Overview for Matplotlib Users / Observable / Observable https://observablehq.com/@observablehq/plot-overview-for-matplotlib-users
Interesting… There’re some discussions on the lottery ticket hypothesis.
Beautiful and systematic derivation showing how and why negative sampling works
Negative sampling is a great technique to estimate the softmax especially when the calculation of the partition function is intractable. It’s used in word2vec, and many other models such as node2vec.
Goldberg Y, Levy O. word2vec Explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv [cs.CL]. 2014. Available: http://arxiv.org/abs/1402.3722
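To make the idea concrete, here is a minimal sketch of the negative-sampling loss for a single (word, context) pair, following the objective discussed by Goldberg and Levy: one log-sigmoid term for the observed pair plus one per sampled negative. The function names and the toy vectors are my own assumptions for illustration, not code from word2vec.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(word_vec, context_vec, negative_vecs):
    """Loss for one observed pair and k sampled negative contexts.

    The observed pair is pushed towards sigmoid(w.c) = 1 and every
    sampled negative towards sigmoid(w.c_neg) = 0, so the partition
    function of the full softmax is never computed.
    """
    positive = log_sigmoid(dot(word_vec, context_vec))
    negative = sum(log_sigmoid(-dot(word_vec, n)) for n in negative_vecs)
    return -(positive + negative)


# Hypothetical toy embeddings: the context aligns with the word,
# the negatives point the other way.
word = [1.0, 0.5]
loss_good = negative_sampling_loss(word, [1.0, 0.5], [[-1.0, -0.5], [-0.5, -1.0]])
loss_bad = negative_sampling_loss(word, [-1.0, -0.5], [[1.0, 0.5], [0.5, 1.0]])
```

The cost per training pair scales with the number of negatives, not with the vocabulary size.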
I drafted a new release of the Hugo Connectome theme.
I like the command palette in VSCode. It is fast and accurate. So I added a command palette to the Hugo Connectome theme to help us navigate the notes and links.
Now we can use the command palette to navigate to backlinks, out links, references, and more.
See it in action: https://datumorphism.leima.is/wiki/time-series/state-space-models/ Use Command+K or Windows+K to activate the command palette.
- Type in search to search for notes.
- Type in Note ID to copy the current note id to the clipboard.
- Type in graph to see the graph view of all the notes.
- Type in references to go to references.
- Type in backlinks to select from backlinks to navigate to.
- Type in links to select from all outgoing links to navigate to.
(WARNING: Promoting of my notes. This is a test.)
I learned something very interesting today: CRPS.
Suppose we would like to approximate the quantile function of some data points. If we assume a parametric model of the quantile function, e.g., Q(x|theta), how do we find the parameters using the given dataset? Naturally, we need a loss function to compare our quantile function to the datapoints. CRPS is a robust choice. I have seen it being used in several papers in time series forecasting.
You can find more details here: https://datumorphism.leima.is/cards/time-series/crps/
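As a rough illustration, CRPS can be estimated from samples of a forecast distribution with the well-known identity CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the forecast and y is the observation. This is a sketch under that assumption; the function name is made up here, and the linked note may use a different (e.g., quantile-based) formulation.

```python
def crps_from_samples(samples, observation):
    """Sample-based CRPS estimate for a single observation.

    First term: average distance between forecast samples and the
    observation. Second term: half the average distance between pairs
    of forecast samples, which rewards sharp forecasts.
    """
    n = len(samples)
    term_obs = sum(abs(s - observation) for s in samples) / n
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return term_obs - 0.5 * term_spread


# A point forecast reduces to the absolute error.
point_forecast_score = crps_from_samples([2.0], 5.0)

# A two-sample forecast around the observation.
two_sample_score = crps_from_samples([0.0, 2.0], 1.0)
```

Lower is better, and the double loop makes this O(n^2) in the number of samples, which is fine for a sketch but not for large ensembles.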
It’s a lengthy article but also a well written one.
A few comments:
- The author wrote a paper on “The Next Decade in AI”: https://arxiv.org/abs/2002.06177
- Make things work in their own domain. If we are gonna come up with a “theory of everything” for computing or intelligence, we will hit the “mesoscopic” wall, where the bottom up theories and the top down approaches meet but we can’t really make a connection. In the case of intelligence, the wall is determined by the complexities (maybe MDL?). You can make symbols work for high complexities but not always. Similar thing happens to neural networks.
- The neural symbolic approach sounds good but it’s almost like patching a bike as wheels of a train.
Please click on the link and watch the animation. It’s 3D.
“The clever people at @NASA have created this deceptively simple yet highly effective data visualisation showing monthly global temperatures between 1880-2021”.: nextfuckinglevel https://www.reddit.com/r/nextfuckinglevel/comments/tejc0l/the_clever_people_at_nasa_have_created_this/?utm_source=share&utm_medium=ios_app&utm_name=iossmf
I share similar thoughts with the top comment by theXYZT.
If I may add to her comment, I would say: Embrace the new approach even if it shatters our philosophy. But it’s not only about what happened in the history of physics. It’s about what we believe in science. In some sense, the purpose of interpretability and parsimony is to help humans come up with better ideas and to make us happy. If a universal model is working well enough and can be improved gradually already, interpretability is not as important as predictability. This is more or less the first principle of science, if I may say so.
I find poetry a great tool to manage Python requirements.
I used to manage Python requirements using requirements.txt(environment.yaml) and install them using pip(conda). The thing is, in this stack, we have to pin the version ranges manually. It is quite tedious, and we easily run into version problems for a large project.
Poetry is the savior here. When developing a package, we add some initial dependencies to pyproject.toml, a file defined by a PEP standard. Whenever a new package is needed, we run poetry add package-name. Poetry tries to figure out the compatible versions. A lock file with the resolved, pinned versions of the dependencies will be created or updated. To recreate an identical Python environment, we only need to run poetry install.
There’s one drawback, and it may be quite painful at some point: recreating the lock file is extremely slow when the requirements grow in complexity. This is not a problem of poetry itself but rather of constraints from PyPI. One solution to this problem is to use a cache.
I have been using Hugo for my public notes. I built a theme called connectome a while ago. This theme has been serving as my note-taking theme.
When building my notes website on data science, I have noticed many problems with the connectome theme. And today, I fixed most of the problems. The connectome theme deserves some visibility now.
If you are using Hugo and would like to build a website for connected notes, like this one I have https://datumorphism.leima.is/ , the Hugo connectome theme can help a bit.
The Connectome Theme: https://github.com/kausalflow/connectome A template one could use to bootstrap a new website: https://github.com/kausalflow/hugo-connectome-theme-demo Tutorials: https://hugo-connectome.kausalflow.com/projects/tutorials/ Real-world example: https://datumorphism.leima.is/
— If you would like to know more about how it was done, the idea is quite simple. Before we move on, one FAQ I got is, why Hugo. The answer is simple, speed.
The key components of the connectome theme are:
- automated backlinks, and
- a graph visualization of the whole notebook.
Behind the scenes, the heart of the theme is a metadata file that describes the connections between the notes.
For each note, we use the metadata to get all the notes that link to the current note, and build backlinks based on the metadata.
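The backlink step boils down to inverting a mapping of outgoing links. The theme itself is built with Hugo templates, so the Python below is only a sketch of the logic; the note paths and the shape of the metadata are assumptions made up for illustration.

```python
from collections import defaultdict


def build_backlinks(outgoing_links):
    """Invert a {note: [linked notes]} mapping into {note: [notes linking to it]}."""
    backlinks = defaultdict(list)
    for source, targets in outgoing_links.items():
        for target in targets:
            backlinks[target].append(source)
    # Sort for stable output so the rendered backlink list does not jump around.
    return {note: sorted(sources) for note, sources in backlinks.items()}


# Hypothetical metadata: each note lists the notes it links to.
metadata = {
    "wiki/state-space-models": ["cards/kalman-filter", "cards/crps"],
    "wiki/forecasting": ["cards/crps"],
    "cards/crps": [],
}

backlinks = build_backlinks(metadata)
```

The same mapping can feed the graph view, since every (source, target) pair is an edge.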
#ML #RL #DeepMind
Magnetic control of tokamak plasmas through deep reinforcement learning | Nature https://www.nature.com/articles/s41586-021-04301-9
I made some slides to bootstrap a community in my company to share papers on graph related methods (spectral, graph neural networks, etc). These slides are mostly based on the first two chapters of the book by William Hamilton. I added some intuitive interpretations of some key ideas. Some of these are frequently used in graph neural networks and even transformers. Building intuition helps us unbox these neural networks. But the slides are only skeleton notes so I probably have to expand them at some point.
I am thinking about drawing more about the book and on this topic. Maybe even making some short videos using these slides. Let’s see how far I can go. I am way too busy now. (<-no excuse)
Lol, DeepMind and OpenAI:
There are indeed many. For example, I replaced grep with ack, and it is a lot faster.
Seaborn is getting a new interface.
Would be great if the author defined a dunder method __add__() instead of using the .add() method. Using dunder add, we can simply use + on layers.
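Here is a minimal sketch of what I mean. The Plot and Layer classes are hypothetical stand-ins written for this example; they are not seaborn's actual classes or API.

```python
class Layer:
    """Hypothetical stand-in for a plot layer (e.g., dots or a line)."""

    def __init__(self, name):
        self.name = name


class Plot:
    """Hypothetical stand-in for a layered plot object."""

    def __init__(self, layers=()):
        self.layers = tuple(layers)

    def add(self, layer):
        # Return a new Plot instead of mutating, so chained calls are safe.
        return Plot(self.layers + (layer,))

    def __add__(self, other):
        # Delegate `plot + layer` to .add(); let Python raise TypeError
        # for unsupported operands by returning NotImplemented.
        if not isinstance(other, Layer):
            return NotImplemented
        return self.add(other)


plot = Plot() + Layer("dots") + Layer("line")
layer_names = [layer.name for layer in plot.layers]
```

With this, `Plot() + Layer("dots")` and `Plot().add(Layer("dots"))` build the same object, so both styles can coexist.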
Nevertheless, we can all move away from plotnine when the migration is done.
Deepnote supports Great Expectations (GE) now.
I ran their template notebook:
Beautiful, elegant, and informative. It reminds me of the Netflix movie chromatic storytelling visualization.
Full image: https://zenodo.org/record/5828349
I thought it was a trivial talk in the beginning. But I quickly realized that, although I may know each piece of the code mentioned in the video, the philosophy is what makes it exciting.
He talked about some fundamental ideas of Python, e.g., protocols.
After watching this video, an idea came to me. Pytorch lightning has implanted a lot of hooks in a very pythonic way. This is what makes pytorch lightning easy to use. (So if you do a lot of machine learning experiments, pytorch lightning is worth a try.)
Disclaimer: I’m no expert in state diagrams or statecharts.
It might be something trivial but I find this useful: Combined with some techniques from statecharts (something frontend people like a lot), a state diagram is a great way to document what our data is going through in data (pre)processing.
For complicated data transformations, we can make the corresponding state diagram and follow our code to make sure it is working as expected. The only difference is that we are focusing on the state of the data, not of any other system.
We can use some techniques from statecharts, such as hierarchies and parallels.
A state diagram is better than a flowchart in this scenario because we are more interested in the different states of the data. State diagrams automatically highlight the states, so we can easily spot the relevant part in the diagram without having to start from the beginning.
I documented some data transformations using state diagrams already. I haven’t tried but it might also help us document our ML models.
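One way to keep such a diagram honest is to write it down as a transition table and check the pipeline against it. This is only a sketch; the state and step names below are hypothetical and should be replaced with whatever the real preprocessing does.

```python
# Each entry reads: (current data state, processing step) -> next data state.
TRANSITIONS = {
    ("raw", "deduplicate"): "deduplicated",
    ("deduplicated", "impute_missing"): "imputed",
    ("imputed", "scale_features"): "model_ready",
}


def run_pipeline(state, steps):
    """Walk the state diagram and fail loudly on an undocumented transition."""
    for step in steps:
        key = (state, step)
        if key not in TRANSITIONS:
            raise ValueError(f"step {step!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


final_state = run_pipeline("raw", ["deduplicate", "impute_missing", "scale_features"])
```

Because the table is keyed by the state of the data, we can jump straight to the state we care about instead of reading the whole pipeline from the top.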
Pu X, Kay M. A probabilistic grammar of graphics. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM; 2020. doi:10.1145/3313831.3376466 Available at: https://dl.acm.org/doi/10.1145/3313831.3376466
A very good read if you are visualizing probability densities a lot. The paper began with a common mistake people make when visualizing densities. Then they proposed a systematic grammar of graphics for probabilities. They also provide a package (quite preliminary, see here https://github.com/MUCollective/pgog ).
I remember that several years ago, when I was still doing my PhD, there was this contest about predicting protein structures, and none of the methods was working well. At that time, I would never have thought we could have anything like AlphaFold in a few years.
Alammar J. The Illustrated Transformer. [cited 14 Dec 2021]. Available: http://jalammar.github.io/illustrated-transformer/
A new lightweight language for data analysis and visualization. It looks promising.
I hate jupyter notebooks and I don’t use them on most of my projects. One of the reasons is low reproducibility due to their non-reactive nature. If you change some old cells and forget to rerun a cell below, you may read wrong results. This new language is reactive. If old cells are changed, related results are also updated.
How to Train your Decision-Making AIs https://thegradient.pub/how-to-train-your-decision-making-ais/
The author reviewed “five types of human guidance to train AIs: evaluation, preference, goals, attention, and demonstrations without action labels”.
The last one reminds me of the movie Finch. In the movie, Finch was teaching the robot to walk by demonstrating walking but without “labels”.
Hmmm my plate is way off the planetary health diet recommendation.
Just in case you are also struggling with Python packages on Apple M1 Macs
I am using the third option: anaconda + miniforge.
An interactive Visual Vocabulary:
SHAP (SHapley Additive exPlanations) is a system of methods to interpret machine learning models. The author of SHAP built an easy-to-use package to help us understand how the features are contributing to the machine learning model predictions. The package comes with a comprehensive tutorial for different machine learning frameworks.
- Python Package: slundberg/shap
- A tutorial on how to use it: https://www.aidancooper.co.uk/a-non-technical-guide-to-interpreting-shap-analyses/
The package is so popular and you might be using it already. So what is SHAP exactly? It is a series of methods based on Shapley values.
SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explain the output of any machine learning model.
Regarding Shapley value: There are two key ideas in calculating a Shapley value.
- A method to measure the contribution to the final prediction of some certain combination of features.
- A method to combine these “contributions” into a score.
SHAP provides several methods to estimate Shapley values, tailored to different kinds of models.
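The two ideas above can be sketched with an exact, brute-force Shapley computation on a toy problem: a value function scores each combination of features, and the Shapley weights combine the marginal contributions into one score per feature. This is not how the shap package computes its estimates; the value function and feature names are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of `player` when joining the coalition.
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result


def toy_value(coalition):
    """Hypothetical 'prediction' from a subset of features a, b, c."""
    score = 0.0
    if "a" in coalition:
        score += 10.0
    if "b" in coalition:
        score += 20.0
    if "a" in coalition and "b" in coalition:
        score += 6.0  # interaction, shared equally between a and b
    return score


phi = shapley_values(["a", "b", "c"], toy_value)
```

The enumeration grows exponentially with the number of features, which is why approximate estimators are needed in practice.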
The following two pages explain Shapley value and SHAP thoroughly.
- Lundberg SM, Lee SI. A unified approach to interpreting model predictions. of the 31st international conference on neural …. 2017. Available: http://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering. 2018;2: 749–760. doi:10.1038/s41551-018-0304-0
I posted a similar article years ago in our Chinese data weekly newsletter but for a different story.
Nicolas P. Rougier released his book on scientific visualization. He made some aesthetically pleasing figures. And the book is free.
Okay, I’ll tell you the reason I wrote this post. It is because xkcd made this.
Choosing proper colormaps for our visualizations is important. It’s almost like shooting a photo using your phone. Some phones capture details in every corner, while some phones give us overexposed photos and we get no details in the bright regions.
A proper colormap should make sure we see the details we need to see. To address the importance of colormaps, we use the two examples shown on the website of colorcet [1]. The two colormaps, hot, and fire, can be found in matplotlib and colorcet, respectively.
I can not post multiple images in one message, please see the full post for the comparisons of the two colormaps. Really, it is amazing. Find the link below: https://github.com/kausalflow/community/discussions/20
It is clear that “hot” brings in some overexposure. The other colormap, “fire”, is a so-called perceptually uniform colormap. More experiments are performed in colorcet. Glasbey et al showed some examples of inspecting different properties using different colormaps [2].
References and links mentioned in this post:
Glasbey C, van der Heijden G, Toh VFK, Gray A. Colour displays for categorical images. Color Research & Application. 2007. pp. 304–309. doi:10.1002/col.20327 ↩︎
animegan v2! (I stole this animation from reddit. https://www.reddit.com/r/MachineLearning/comments/qo4kp8/r_p_animeganv2_face_portrait_v2/ )
Try it out:
- Telegram bot (works pretty well): https://t.me/face2stickerbot
- Dashboard (sometimes it doesn’t work): https://huggingface.co/spaces/akhaliq/AnimeGANv2
Redditors made some funny photos too. https://www.reddit.com/r/MachineLearning/comments/qo4kp8/r_p_animeganv2_face_portrait_v2/
— This post is also available here: https://community.kausalflow.com/c/ml-applications/animeganv2
- Wang X, Kondratyuk D, Christiansen E, Kitani KM, Alon Y, Eban E. Wisdom of Committees: An Overlooked Approach To Faster and More Accurate Models. arXiv [cs.CV]. 2020. Available: http://arxiv.org/abs/2012.01988
Most companies probably have several models to solve the same problem. There are model A, model B, even model C. The final result is some kind of aggregation of the three models. Or the models are cascaded like what’s shown in the figure. But it takes a lot of computing resources to run the features through the three models.
Wang et al show that ensembles are not more resource demanding than big models with similar performance in CV tasks.
Lol, thank you Mr Lossfunction. But, which sanitizer are you using?
This is a post about Zillow’s Zestimate Model.
Zillow (https://zillow.com/ ) is an online real-estate marketplace and it is a big player. But last week, Zillow withdrew from the house flipping market and planned to lay off a handful of employees.
There are rumors indicating that this action is related to their machine learning based price estimation tool, Zestimate ( https://www.zillow.com/z/zestimate/ ).
At first glance, Zestimate seems fine. Though the metrics shown on the website may not be that convincing, I am sure they’ve benchmarked more metrics than those shown on the website. There are some discussions on reddit.
Anyways, this is not the best story for data scientists.
(See also https://bit.ly/3F1Kv2F )
Centered Kernel Alignment (CKA) is a metric designed to measure the similarity between representations of features in neural networks [1].
CKA is based on the Hilbert-Schmidt Independence Criterion (HSIC). HSIC is defined using the centered kernels of the features to compare [2]. But HSIC is not invariant to isotropic scaling, which is required for a similarity metric of representations [1]. CKA is a normalization of HSIC.
The attached figure shows why CKA makes sense.
CKA has problems too. Seita et al argue that CKA is a metric based on intuitive tests, i.e., we calculate cases that we believe should be similar and check whether the CKA values are consistent with this intuition. Seita et al built a quantitative benchmark [3].
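For the linear kernel, the normalization of HSIC is short enough to write out. This is a minimal pure-Python sketch: center the features, build the Gram matrices, and divide the cross term by the geometric mean of the two self terms (the constant factors of HSIC cancel). The function names and toy data are assumptions for illustration.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every entry of a row-major matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def gram(rows):
    """Linear-kernel Gram matrix K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def frobenius_inner(a, b):
    """Sum of elementwise products of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def linear_cka(x, y):
    """Linear CKA between two representations of the same n examples."""
    k = gram(center_columns(x))
    l = gram(center_columns(y))
    return frobenius_inner(k, l) / math.sqrt(
        frobenius_inner(k, k) * frobenius_inner(l, l)
    )


# Toy representations: 4 examples with 2 features each.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [0.0, 1.0]]
x_scaled = [[3.0 * v for v in row] for row in x]  # isotropic scaling
x_rotated = [[-b, a] for a, b in x]               # 90 degree rotation

cka_scaled = linear_cka(x, x_scaled)
cka_rotated = linear_cka(x, x_rotated)
```

Both values come out as 1 up to floating point error, which is the invariance to isotropic scaling and orthogonal transformations that the normalization is meant to provide.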
Microsoft created two repositories for Machine Learning and Data Science beginners. They created many sketches. I love this style.
( I am experimenting with a new platform. This post is also available at: https://community.kausalflow.com/c/ml-journal-club/probably-approximately-correct-pac-learning-and-bayesian-view )
The first time I read about PAC was in the book The Nature of Statistical Learning Theory by Vapnik [1].
PAC is a systematic theory on why learning from data is even feasible [2]. The idea is to quantify the errors when learning from data, and we find that it is possible to have infinitesimal error under certain conditions, e.g., large datasets. Quote from Guedj [3]:
A PAC inequality states that with an arbitrarily high probability (hence “probably”), the performance (as provided by a loss function) of a learning algorithm is upper-bounded by a term decaying to an optimal value as more data is collected (hence “approximately correct”).
Bayesian learning is a very important topic in machine learning. We apply Bayes’ rule in the components of learning, e.g., the posterior in the loss function. There also exists a PAC theory for Bayesian learning that explains why Bayesian algorithms work. Guedj wrote a primer on this topic [3].
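To get a feeling for what a PAC inequality looks like in numbers, here is a sketch of the classic bound for a finite hypothesis class with a loss in [0, 1], obtained from Hoeffding's inequality and a union bound: with probability at least 1 - delta, the gap between true and empirical risk is at most sqrt(ln(2|H|/delta) / (2m)) for every hypothesis. Note that this is the textbook non-Bayesian bound, not the PAC-Bayes bounds in Guedj's primer, and the function name is made up here.

```python
import math


def generalization_gap_bound(num_hypotheses, num_samples, delta):
    """Uniform bound on |true risk - empirical risk| for a finite class.

    Holds with probability at least 1 - delta, assuming i.i.d. samples
    and a loss bounded in [0, 1].
    """
    return math.sqrt(math.log(2 * num_hypotheses / delta) / (2 * num_samples))


# "Probably" is controlled by delta; "approximately" shrinks as data grows.
gap_small_data = generalization_gap_bound(1000, 100, 0.05)
gap_large_data = generalization_gap_bound(1000, 10_000, 0.05)
```

The bound decays like 1/sqrt(m), and the size of the hypothesis class only enters through a logarithm.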
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. 2000. doi:10.1007/978-1-4757-3264-1 ↩︎
Valiant LG. A theory of the learnable. Commun ACM. 1984;27: 1134–1142. doi:10.1145/1968.1972 ↩︎
(I am experimenting with a new platform. This post is also available at: https://community.kausalflow.com/c/ml-journal-club/how-do-neural-network-generalize )
There are some things that are quite hard to understand in deep neural networks. One of them is how the network generalizes.
[Zhang2016] shows some experiments about the amazing ability of neural networks to learn even completely random datasets. But they can not generalize as the data is random. How to understand generalization? The authors mentioned some theories like VC dimension, Rademacher complexity, and uniform stability. But none of them is good enough.
Recently, I found the work of Simon et al [Simon2021]. The authors also wrote a blog about this paper [Simon2021Blog].
The idea is to simplify the problem of generalization by looking at how a neural network approximates a function f. This amounts to approximating vectors in a Hilbert space. Thus we are looking at the similarity between the vector f and its neural network approximation f'. The similarity of these two vectors is related to the eigenvalues of the so-called “neural tangent kernel” (NTK). Using NTK, they derived an amazingly simple quantity, learnability, which can measure how Hilbert space vectors align with each other, that is, how good the approximation using the neural network is.
[Zhang2016]: Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. arXiv [cs.LG]. 2016. Available: http://arxiv.org/abs/1611.03530
[Simon2021Blog]: Simon J. A First-Principles Theory of Neural Network Generalization. In: The Berkeley Artificial Intelligence Research Blog [Internet]. [cited 26 Oct 2021]. Available: https://bair.berkeley.edu/blog/2021/10/25/eigenlearning/
[Simon2021]: Simon JB, Dickens M, DeWeese MR. Neural Tangent Kernel Eigenvalues Accurately Predict Generalization. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2110.03922
“Fail” When visualizing data, the units being used have to be specified for any values shown.
But the style of the charts is attractive. :)
By chungischef Available at: https://www.reddit.com/r/dataisbeautiful/comments/q958if/recreation_of_a_classic_population_density_map/
Duan T, Avati A, Ding DY, Thai KK, Basu S, Ng AY, et al. NGBoost: Natural Gradient Boosting for probabilistic prediction. arXiv [cs.LG]. 2019. Available: http://arxiv.org/abs/1910.03225
(I had it on my reading list for a long time. However, I didn’t read it until today because the title and abstract are not attractive at all.) But this is a good paper. It goes deep to dig out the fundamental reasons why some methods work and others don’t.
When inferring probability distributions, it is straightforward to come up with methods based on parametrized distributions (statistical manifolds). Then, by tuning the parameters, we adjust the distribution to best fit our dataset. The problem is the choice of the objective function and the optimization method. This paper describes a generic objective function and a framework to optimize the model along the natural gradient instead of just the gradient w.r.t. the parameters. Different parametrizations of the objective are like coordinate transformations, and the chain rule only works if the transformations are in a “flat” space, but such a “flat” space is not necessarily a good choice for a high-dimensional problem. For a space that is approximately flat in a small region, we can define distance like what we do in differential geometry 1. Meanwhile, just like “covariant derivatives” in differential geometry, some kind of covariant derivative can be found on statistical manifolds and they are called “natural derivatives”. Descending in the direction of natural derivatives navigates the landscape more efficiently.
This is a Riemannian space ↩︎
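A minimal sketch of the difference between the ordinary gradient and the natural gradient in the NGBoost note above. It assumes a single Gaussian parametrized by (mu, log sigma) and the negative log-likelihood as the objective. For this parametrization the Fisher information matrix is diag(1/sigma^2, 2), so the natural gradient is simply the ordinary gradient multiplied by its inverse. The function name `gaussian_gradients` and the numbers are made up for illustration, and this is not the NGBoost implementation.

```python
import math


def gaussian_gradients(y: float, mu: float, log_sigma: float):
    """Ordinary and natural gradient of the Gaussian negative log-likelihood
    with respect to the parameters (mu, log_sigma)."""
    sigma2 = math.exp(2.0 * log_sigma)
    resid = y - mu
    # Ordinary gradient of log(sigma) + (y - mu)^2 / (2 sigma^2)
    grad = (-resid / sigma2, 1.0 - resid**2 / sigma2)
    # Fisher information in (mu, log_sigma) is diag(1 / sigma^2, 2);
    # the natural gradient is F^{-1} times the ordinary gradient.
    natural = (grad[0] * sigma2, grad[1] / 2.0)
    return grad, natural


grad, natural = gaussian_gradients(y=3.0, mu=1.0, log_sigma=math.log(10.0))
print("ordinary:", grad)
print("natural: ", natural)
```

With a wide distribution (sigma = 10) the ordinary gradient for mu is tiny, while the natural gradient for mu is just the residual, independent of how sigma is scaled. This is one way to see why it navigates the landscape more efficiently.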
#visualization #art #fun
More like a blog post… But the visualisation is cool. I posted it as a comment.
[2109.15079] Asimov’s Foundation – turning a data story into an NFT artwork https://arxiv.org/abs/2109.15079
Announcing Streamlit 1.0! 🎈
This is not only Julia for biologists. It is for everyone who is not using Julia.
Roesch, Elisabeth, Joe G. Greener, Adam L. MacLean, Huda Nassar, Christopher Rackauckas, Timothy E. Holy, and Michael P. H. Stumpf. 2021. “Julia for Biologists.” ArXiv [q-Bio.QM]. arXiv. http://arxiv.org/abs/2109.09973.
I like this. I was testing visualization using antv’s G6. It is not for data analysis as it is quite tedious to generate visualizations.
Observable’s plot is a much easier fluent package for data analysis.
Neural Networks visualized in 3D
Comment: Same for many competitive careers
Beware survivorship bias in advice on science careers https://www.nature.com/articles/d41586-021-02634-z
scikit-learn reached 1.0. Nothing exciting about the new stuff, but the major release probably means something.
Release Highlights for scikit-learn 1.0 — scikit-learn 1.0 documentation http://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_0_0.html
I read about the story of using tensorflow in google translate 1.
… Google Translate. Originally, the code that handled translation was a weighty 500,000 lines of code. The new, TensorFlow-based system has approximately 500, and it performs better than the old method.
This is crazy. Think about the maintenance of the code. A single person easily maintains 500 lines of code. 500,000 lines? No way.
Pointer I. Programming PyTorch for Deep Learning: Creating and Deploying Deep Learning Applications. O’Reilly Media; 2019. ↩︎
Phys. Rev. X 11, 031059 (2021) - Statistical Mechanics of Deep Linear Neural Networks: The Backpropagating Kernel Renormalization https://journals.aps.org/prx/abstract/10.1103/PhysRevX.11.031059
The Doomsday Datavisualizations - Bulletin of the Atomic Scientists
A Gentle Introduction to Graph Neural Networks https://distill.pub/2021/gnn-intro
The authors investigate the geometry formed by the responses of neurons to certain stimulations (tuning curves). Using stimulation as the hidden variable, we can construct a geometry of neuron responses. The authors clarified the relations between this geometry and other measurements such as mutual information.
The story itself in this paper may not be interesting to machine learning practitioners. But the method of using the geometry of neuron responses to probe the brain is intriguing. We may borrow this method to help us with the internal mechanism of neural networks.
Kriegeskorte, Nikolaus, and Xue-Xin Wei. 2021. “Neural Tuning and Representational Geometry.” Nature Reviews. Neuroscience, September. https://doi.org/10.1038/s41583-021-00502-3.
#ML #self-supervised #representation
Contrastive loss is widely used in representation learning. However, the mechanism behind it is not as straightforward as it seems.
Wang & Isola proposed a method to rewrite the contrastive loss into alignment and uniformity. Samples in the feature space are normalized to unit vectors. These vectors lie on a hypersphere. The two components of the contrastive loss are
- alignment, which forces the positive samples to be aligned on the hypersphere, and
- uniformity, which distributes the samples uniformly on the hypersphere.
By optimizing such objectives, the samples are distributed on a hypersphere, with similar samples clustered, i.e., pointing in similar directions. Uniformity makes sure the samples are using the whole hypersphere so we don’t waste “space”.
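The two components can be sketched in plain Python. This follows the form used by Wang & Isola (alignment as the mean distance between positive pairs raised to a power alpha, uniformity as the log of the mean Gaussian potential over all pairs, with alpha = 2 and t = 2), but the function names and the toy 2D points are assumptions for illustration, not their code.

```python
import math
from itertools import combinations


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_loss(positive_pairs, alpha=2):
    """Mean distance^alpha between the two views of each positive pair."""
    return sum(sq_dist(a, b) ** (alpha / 2) for a, b in positive_pairs) / len(
        positive_pairs
    )


def uniform_loss(points, t=2):
    """Log of the mean Gaussian potential over all distinct pairs of points."""
    vals = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(points, 2)]
    return math.log(sum(vals) / len(vals))


# Toy data on the unit circle: one spread-out set, one clustered set.
spread = [normalize(p) for p in [(1, 0), (0, 1), (-1, 0), (0, -1)]]
clustered = [normalize(p) for p in [(1, 0), (0.99, 0.1), (0.98, 0.2), (1, -0.1)]]

print("uniformity, spread:   ", uniform_loss(spread))
print("uniformity, clustered:", uniform_loss(clustered))
print("alignment, close pair:", align_loss([(clustered[0], clustered[1])]))
```

Lower is better for both terms: the spread-out set gets a lower uniformity loss, and positive pairs that point in nearly the same direction get an alignment loss close to zero.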
Wang T, Isola P. Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2005.10242
Cute comics on interactive data visualization
JetBrains released a new IDE for data scientists.
😂 Jürgen Schmidhuber invented transformers in the 90s.
This is cool.
Hullman J, Gelman A. Designing for interactive exploratory data analysis requires theories of graphical inference. Harvard Data Science Review. 2021. doi:10.1162/99608f92.3ab8a587 https://hdsr.mitpress.mit.edu/pub/w075glo6/release/2
Creating visualizations seems to be a creative task. At least for entry-level visualization tasks, we follow our hearts and build whatever is needed. However, visualizations are made for different purposes. Some visualizations are simply explorations and for us to get some feelings on the data. Some others are built for the validation of hypotheses. These are very different things.
Confirmation of an idea using charts is usually hard. In most cases, we need statistical tests to (dis)prove a hypothesis instead of just looking at the charts. Thus, visualizations become a tool to help us formulate a good question.
However, not everyone is using charts as hints only. Instead, many use charts to conclude. As a result, even experienced analysts draw spurious conclusions. These so-called insights are not going to be too solid.
The visual analysis seems to be an adversarial game between humans and the visualizations. There are many different models for this process. A crude and probably stupid model can be illustrated through an example of analysis by the histogram of a variable. The histogram looks like a bell. It is symmetric. It is centered at 10 with an FWHM of 2.6. I guess this is a Gaussian distribution with a mean of 10 and a sigma of about 1.1 (since FWHM ≈ 2.355 sigma). This is the posterior p(model | chart). Imagine a curve like what was just guessed on top of the original curve. Would my guess and the actual curve overlap with each other? If not, what do we have to adjust? Do we need to introduce another parameter? Guess the parameter of the new distribution model and compare it with the actual curve again. The above process is very similar to a repetitive Bayesian inference. Though, the actual analysis may be much more complicated as the analysts would carry a lot of …
Though not the core of the model, I noticed that this model (MEB) uses the user search behavior on Bing to build the language model. If a search result on Bing is clicked by the user, it is considered to be a positive sample for the query, otherwise a negative sample.
In self-supervised learning, it has been shown that negative sampling is extremely important. This Bing search dataset is naturally labeling the positive and negative samples. Kuhl idea.
Nielsen M. Reinventing discovery: The New Era of networked science. Princeton, NJ: Princeton University Press; 2011.
I found this book this morning and skimmed through it. It looks concise yet unique. The author discusses how the internet is changing the way human beings think as one collective intelligence. I like the chapters about how the data web is enabling more scientific discoveries.
challenges in data collection, verification, and serving tasks
Training GAN can be baffling. For example, the generator and the discriminator just don’t “learn” at the same scale sometimes. Would you try to balance the generator loss and discriminator loss by hand? Soumith Chintala ( @ FAIR ) put together this list of tips for training GAN. “Don’t balance loss via statistics” is one of the 17 tips by Chintala. The list is quite inspiring.
I have downloaded the file so you don’t need to.
This is an interesting report by Anaconda. We can kind of confirm from this that Python is still the king of languages for data science. SQL follows right behind Python.
Quote from the report:
Between March 2020 to February 2021, the pandemic economic period, we saw 4.6 billion package downloads, a 48% increase from the previous year. We have no data for other languages so no predictions can be made but it is interesting to see Python growing so fast.
The roadblocks that different data professionals face are quite different. Cloud engineers and MLOps people do not mention the skills gap in the organization that often. But data scientists/analysts mention skills gaps (e.g., data engineering, docker, k8s) a lot. This might be related to the cases when the organization doesn’t even have cloud engineers/ops or MLOps.
See the next message for the PDF file.
Julia Computing got a lot of investment recently. I need to dive deeper into the Julia Language.
PyData goes virtual this year.
I found a nice place to practice programming thinking. It is not as comprehensive as hackerrank/leetcode but these problems are quite fun.
Implicit Regularization in Tensor Factorization: Can Tensor Rank Shed Light on Generalization in Deep Learning? – Off the convex path http://www.offconvex.org/2021/07/08/imp-reg-tf/
In PyTorch, conversion from Torch tensors to numpy arrays is very fast on CPUs, though torch tensors and numpy arrays are very different things. This is because of the Python buffer protocol. The protocol makes it possible to use binary data directly from C without copying the object.
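The zero-copy idea behind the buffer protocol can be seen with standard-library objects alone. The sketch below uses `array.array` and `memoryview` as stand-ins; it does not use PyTorch or numpy, so treat it as an analogy for the tensor-to-array conversion rather than the actual mechanism inside either library.

```python
import array

# A contiguous block of C doubles that exposes the buffer protocol.
data = array.array("d", [1.0, 2.0, 3.0])

# memoryview wraps the same memory; nothing is copied here.
view = memoryview(data)

# Writing through the view changes the original object.
view[0] = 42.0
print(data[0], view.nbytes, view.format)

# Writing to the original is visible through the view as well.
data[1] = -1.0
print(view[1])
```

Because both names refer to the same memory, the "conversion" costs almost nothing regardless of the size of the data. The flip side is that in-place changes on one side show up on the other.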
Reference: Eli Stevens Luca Antiga. Deep Learning with PyTorch: Build, Train, and Tune Neural Networks Using Python Tools. Simon and Schuster, 2020;
The distill team’s thought on interactive publishing and self-publishing in academia.
Great. Tensorflow implemented built-in decision forest models.
GitHub Copilot · Your AI pair programmer https://copilot.github.com/
This is crazy.
What is GitHub Copilot? GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. GitHub Copilot draws context from comments and code, and suggests individual lines and whole functions instantly. GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. The GitHub Copilot technical preview is available as a Visual Studio Code extension.
How good is GitHub Copilot? We recently benchmarked against a set of Python functions that have good test coverage in open source repos. We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43% of the time on the first try, and 57% of the time when allowed 10 attempts. And it’s getting smarter all the time.
A Turing lecture article by the three famous DL guys. It’s an overview of the history, development, and future of AI. There are two very interesting points in the outlook section:
- “From homogeneous layers to groups of neurons that represent entities.” In biological brains, there are memory engrams and motifs that almost do this.
- “Multiple time scales of adaption.” This is another key idea that has been discussed numerous times. One of the craziest things about our brain is the diversity of time scales of plasticity, i.e., different mechanisms change the brain on different time scales.
Reference: Bengio Y, Lecun Y, Hinton G. Deep learning for AI. Commun ACM. 2021;64: 58–65. doi:10.1145/3448250 https://dl.acm.org/doi/10.1145/3448250
Geometric Deep Learning is an attempt to unify deep learning using geometry. Instead of building deep neural networks that ignore the symmetries in the data and leave them to be discovered by the network, we apply the symmetries in the problem to the network. For example, instead of flattening the matrix of a cat image and having some predetermined order of the pixels, we apply a translational transformation on the 2D image and the cat should still be a cat without any doubt. This symmetry can be enforced in the network.
BTW, If you come from a physics background, it is most likely that you have heard about the symmetries in physical theories like Noether’s theorem. In the history of physics, there was an era of many theories yet most of them are connected or even unified under the umbrella of geometry. Geometric deep learning is another “benevolent propaganda” based on a similar idea.
- Bronstein, Michael. “ICLR 2021 Keynote - ‘Geometric Deep Learning: The Erlangen Programme of ML’ - M Bronstein.” Video. YouTube, June 8, 2021. https://www.youtube.com/watch?v=w6Pw4MOzMuo.
- Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P. Geometric deep learning: going beyond Euclidean data. arXiv [cs.CV]. 2016. Available: http://arxiv.org/abs/1611.08097
- Bronstein MM, Bruna J, Cohen T, Veličković P. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2104.13478
A library for interactive visualization directly from pandas.
The Bayesian hierarchical model provides a process to use Bayesian inference hierarchically to update the posteriors. What is a Bayesian model? In a Bayesian linear regression problem, we can take the posterior from the previous data points and use it as our new prior for inferring based on new data. In other words, as more data comes in, our belief is being updated. However, this is a problem if some clusters in the dataset have small sample sizes, aka small support. As we take these samples and fit them onto the model, we may get a huge credible interval. One simple idea to mitigate this problem is to introduce some constraints on how the priors can change. For example, we can introduce a hyperprior that is parametrized by new parameters. Then the model becomes hierarchical since we will also have to model the new parameters.
The referenced post, “Bayesian Hierarchical Modeling at Scale”, provides some examples of coding such models using numpyro with performance in mind.
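The "posterior becomes the new prior" step and the small-sample problem can be sketched with a conjugate Beta-Binomial model, which needs no sampler. This is a toy stand-in under assumed coin-flip style data, not the numpyro code from the referenced post and not a hierarchical model itself. The function names are made up for illustration.

```python
import math


def update(prior, successes, failures):
    """Beta(a, b) prior + binomial data -> Beta posterior (conjugate update)."""
    a, b = prior
    return (a + successes, b + failures)


def beta_std(params):
    """Standard deviation of a Beta(a, b) distribution."""
    a, b = params
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


flat_prior = (1, 1)

# Sequential updating: yesterday's posterior is today's prior.
step1 = update(flat_prior, successes=2, failures=1)
step2 = update(step1, successes=5, failures=4)
batch = update(flat_prior, successes=7, failures=5)
print(step2 == batch)

# A cluster with little data keeps a wide posterior; a large one does not.
small_cluster = update(flat_prior, successes=2, failures=1)
large_cluster = update(flat_prior, successes=200, failures=100)
print(beta_std(small_cluster), beta_std(large_cluster))
```

The wide posterior of the small cluster is the problem that a shared hyperprior is meant to mitigate: clusters with little data borrow strength from the others instead of being fitted in isolation.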
Germany, birthplace of the automobile, just gave the green light to robotaxis
This paper serves as a good introduction to the declarative data analytics tools.
Declarative analytics performs data analysis using a declarative syntax instead of functions for specific algorithms. Using declarative syntax, one can “describe what you want the program to achieve rather than how to achieve it”. To be declarative, the declarative language has to be specific on the tasks. With this, we can only turn the knobs of some predefined model. To me, this is a deal-breaker.
Anyways, this paper is still a good read.
Makrynioti N, Vassalos V. Declarative Data Analytics: A Survey. IEEE Trans Knowl Data Eng. 2021;33: 2392–2411. doi:10.1109/TKDE.2019.2958084 http://dx.doi.org/10.1109/TKDE.2019.2958084
Hmmm, so they gave it a name. I’ve built so many projects using this approach. I started building such data repos using CI/CD services way before github actions was born. Of course github actions made it much easier. One of them is the EU covid data tracking project ( https://github.com/covid19-eu-zh/covid19-eu-data ). It’s been running for more than a year with very little maintenance. Some covid projects even copied our EU covid data tracking setup.
I actually built a system (https://dataherb.github.io) to pull such github actions based data scraping repos together.
An interesting talk:
We are pleased to have Anna Golubeva speak on “Are wider nets better given the same number of parameters?” on Wednesday May 19th at 12:00 ET.
You can find further details here and listen to the talk here.
We hope you can join!
“Don’t pull down the data. Do it with SQL.”
I believe this article is relevant. Most data scientists have very good academic records. These experiences of excellence compete with another required quality in the industry: The ability to survive in a less ideal yet competitive environment. We could be stubborn and find the environment that we fit well in or adapt based on the business playbook. Either way is good for us as long as we find the path that we love.
(I have a joke about this article: To reasoning productively, we do not need references for our claims at all.)
#DS #EDA #Visualization
If you are keen on data visualization, the new Observable Plot is something exciting for you. Observable Plot is based on d3 but it is easier to use in Observable Notebook. It also follows the guidelines of the layered grammar of graphics (e.g., marks, scales, transforms, facets.).
(This is an automated post by IFTTT.)
It is always good for a data scientist to understand more about data engineering. With some basic data engineering knowledge in mind, we can navigate through the blueprint of a fully productionized data project at any time. In this blog post, I listed some of the key concepts and tools that I learned in the past.
This is my blog post on Datumorphism https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/
The “AI Expert Roadmap”. This can be used as a checklist of prelims for data people.
This is the original paper of Fraser information.
Fisher information measures the second moment of the model sensitivity; Shannon information measures compressed information or variation of the information; Kullback (aka KL divergence) distinguishes two distributions. Instead of defining a measure of information for different conditions, Fraser tweaked the Shannon information slightly and made it more generic. The Fraser information can be reduced to Fisher information, Shannon information, and Kullback information under certain conditions.
It is such a simple yet powerful idea.
Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061 https://projecteuclid.org/journals/annals-of-mathematical-statistics/volume-36/issue-3/On-Information-in-Statistics/10.1214/aoms/1177700061.full
Voss, et al., “Branch Specialization”, Distill, 2021. https://distill.pub/2020/circuits/branch-specialization/
- Branch: neuron clusters that are roughly segregated locally, e.g., AlexNet branches by design.
- Branch specialization: branches specialize in specific tasks, e.g., the two AlexNet branches specialize in different detectors (color detector or black-white filter).
- Is it a coincidence? No. Branch specialization repeatedly occurs in different trainings and different models.
- Do we find the same branch specializations in different models and tasks? Yes.
- Why? The authors' proposal is that a positive feedback loop will be established between layers, and this loop enhances what the branch will do.
- Our brains have specialized regions too. Are there any connections?
Silla CN, Freitas AA. A survey of hierarchical classification across different application domains. Data Min Knowl Discov. 2011;22: 31–72. doi:10.1007/s10618-010-0175-9
A survey paper on hierarchical classification problems. It is a bit old as it didn’t consider the classifier chains, but this paper summarizes most of the ideas in hierarchical classification.
The authors also proposed a framework for the categorization of such problems using two different dimensions (ranks).
How the pandemic changed the way people collaborate.
- Siloing: From April 2019 to April 2020, modularity, a measure of workgroup siloing, rose around the world.
(Please refer to this post https://t.me/amneumarkt/199 for more background.)
I read the book “everyday data science”. I think it is not as good as I expected.
The book doesn’t explain things clearly at all. Besides, I was expecting something that starts from everyday life and is extrapolated to something more scientific.
I also mentioned previously that I would like to write a similar book. Attached is something I created recently that is quite close to the idea of my ideal book for everyday data science.
Cross Referencing Post: https://t.me/amneumarkt/199
How do we interpret the capacities of neural nets? Naively, we would represent the capacity using the number of parameters. Even for the Hopfield network, Hopfield introduced the concept of capacity using entropy, which in turn is related to the number of parameters.
But adding layers to neural nets also introduces regularizations. It might be related to capacities of the neural nets but we do not have a clear clue.
This paper introduced a new perspective using sparse approximation theory. Sparse approximation theory represents the data by encouraging parsimony. The more parameters there are, the more accurately the model represents the training data. But this causes generalization issues as similar data points in the test data may have been pushed apart [Murdock2021].
By mapping the neural nets to shallow “overcomplete frames”, the capacity of the neural nets is easier to interpret.
[Murdock2021]: Murdock C, Lucey S. Reframing Neural Networks: Deep Structure in Overcomplete Representations. arXiv [cs.LG]. 2021. Available: http://arxiv.org/abs/2103.05804
India is growing so fast
Global AI Vibrancy Tool Who’s leading the global AI race? https://aiindex.stanford.edu/vibrancy/
#ML Simple algorithm, powerful results
I just found an elegant decision tree visualization package for sklearn.
I have been trying to explain decision tree results to many business people. It is very hard. This package makes it much easier to explain the results to a non-technical person.
Growth in data science interviews plateaued in 2020. Data science interviews only grew by 10% after previously growing by 80% year over year.
Data engineering specific interviews increased by 40% in the past year.
The easiest method to apply constraints to a dynamical system is through Lagrange multipliers, aka penalties in statistical learning. Penalties don’t guarantee any conservation laws as they are simply penalties, unless you find the multipliers carrying some physical meaning like what we have in Boltzmann statistics. This paper explains a simple method to hardcode conservation laws in a neural network architecture.
TLDR: See the attached figure. Basically, the hardcoded conservation is realized using additional layers after the normal neural network predictions.
A quick bite of the paper: https://physics.aps.org/articles/v14/s25
Some thoughts: I like this paper. When physicists work on problems, they like dimensionlessness. This paper follows this convention. This is extremely important when you are working on a numerical problem. One should always make it dimensionless before implementing the equations in code.
If you are interested in free online AI Cons, Bosch CAI is organizing the AI Con 2021. This event starts tomorrow. https://www.ubivent.com/start/AI-CON-2021
Deep Learning Activation Functions using Dance Moves https://www.reddit.com/r/learnmachinelearning/comments/lvehmi/deep_learning_activation_functions_using_dance/?utm_medium=android_app&utm_source=share
Ah I have always been thinking about writing a book like this. Just bought the book to educate myself on communications.
From ref 1
we can take any expected utility maximization problem, and decompose it into an entropy minimization term plus a “make-the-world-look-like-this-specific-model” term.
This view should be combined with ref 2. If the utility is related to the curvature of the discrete state space, we are making a connection between entropy + KL divergence and curvature on graph. (This idea has to be polished in depth.)
- Trivial proof but interesting perspective: https://www.lesswrong.com/posts/voLHQgNncnjjgAPH7/utility-maximization-description-length-minimization
- Samal A, Pharasi HK, Ramaia SJ, Kannan H, Saucan E, Jost J, Chakraborti A. Network geometry and market instability. R Soc Open Sci. 2021;8: 201734. http://doi.org/10.1098/rsos.201734
You can even use Chinese in GitHub Codespaces. 😱 Well this is trivial if you have Chinese input methods on your computer. What if you are using a company computer and you would like to add some Chinese comments just for fun….
Interesting talk on the software used by Apollo.
Can you predict AI winter using AI?
The new AI spring: a deflationary view
It’s actually fun to watch philosophers fighting each other. The author is trying to deflate the inflated expectations on AI by looking into why inflated expectations are harming our society. It’s not exactly based on evidence but still quite interesting to read.
https://link.springer.com/article/10.1007/s00146-019-00912-z
Definitely weird. The authors used DNN to capture the firing behaviors of cortical neurons.
- A single hidden layer DNN (can you even call it Deep NN in this case?) can capture the neuronal activity without NMDA but with AMPA.
- With NMDA, the neuron requires more than 1 layer. This paper stops here.
WTH is this? Let’s go back to the foundations of statistical learning. What the authors are looking for is a separation of the “stimulation” space. The “stimulation” space is basically a very simple time series (Poisson) space. We just need to map inputs back to the same space but with different feature values. Since the feature space is so small, we will absolutely fit everything if we increase the expressive power of the DNN. The thing is, we already know that NMDA-based synapses require more expressive power and we have very interpretable and good mathematical models for this… This research provides neither better predictability nor interpretability. Well done…
Maybe you have different opinions, prove me wrong.
We have been testing a new connected online work space using discord. Whoever is bored by home office can connect to a shared channel and chat.
Discord allows team voice chat and multiple screensharing. By adding bots to the channel, the team can share music playlists. Discord allows detailed adjustment of the voices so anyone could adjust volumes of any other users or even deafen himself/herself. So it is possible to be connected for the whole day.
It seems that being able to jump in and chat at any time and share a working screen makes WFH fun.
#TIL My cheerful price for the work I am currently doing is very high…
I find vscode remote-ssh very helpful. For some projects with frequent maintenance fixes, I prepared all the required environment on a remote server. I only need to click on the remote-ssh connection to connect to this remote server and immediately start my work. This low overhead setup makes me less reluctant to fix stuff. It is also possible to connect to Docker containers. By setting up different containers we can work in completely different environments with a few clicks. This is crazy.
Passing the Data Baton : A Retrospective Analysis on Data Science Work and Workers
A paper on the different components of data related work. They also proposed a framework and a team structure for data workers.
Machine Learning, Kolmogorov Complexity, and Squishy Bunnies http://www.theorangeduck.com/page/machine-learning-kolmogorov-complexity-squishy-bunnies
[D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG) https://www.reddit.com/r/MachineLearning/comments/leq2kf/d_convolution_neural_network_visualization_made/?utm_medium=android_app&utm_source=share
ConnectedPapers is now integrated into arXiv.
This new perspective of references is often overlooked. It is not a gimmick at all.
“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
- Data quality is crucial in any AI especially for those with high-stakes.
- Much data work is easily overlooked: politics (some data entries are not recorded or are misrecorded), humans in the loop of data quality interventions for cleaning and wrangling while upstream data creation should be controlled well too, etc.
- Data Cascades: how the issues are cascading from upstream to downstream should be clear.
Data Cascades: compounding events causing negative, downstream effects from data issues, resulting in technical debt over time.
By computational neuroscientist
Road freight between Britain and EU is down by a third, data shows https://www.theguardian.com/politics/2021/jan/31/road-freight-britain-eu-down-third-data-shows-brexit
Yeah, thanks to brexit
hmm why is the gov/feds interested in this Gamestop thing? 🙀 (I have very limited knowledge about stocks.)
Eurostat built a dashboard to show the socioeconomic indicators of EU during the corona virus period. It is seen that most indicators are recovering.
#ML Sarcasm Detection with Sentiment Semantics Enhanced Multi-level Memory Network - ScienceDirect https://www.sciencedirect.com/science/article/abs/pii/S0925231220304689
Sheldon, this is your thing! (Didn’t read the paper. I just find this title a bit amusing.)
- “last-mile” effort in optimization is too high
Research debt is the accumulation of missing interpretive labor. It’s extremely natural for young ideas to go through a stage of debt, like early prototypes in engineering.
That is because our value system doesn’t respect interpreters…
Some people claim “git fetch; git rebase origin/master” is equivalent to “git pull -r”, but it isn’t.
git pull -r also deals with squashes.
Rebase hell happens when several commits on your branch edit the same area, and upstream also touched the same area. The problem occurs because each conflict resolution will itself conflict with the subsequent commit in the series.
TIL Larry Hillblom, the H of DHL, regularly took “sex safari” trips to Asia to prey on underage girls. When he died in a plane crash, 4 of the illegitimate children he fathered were able to claim $50 million each from his estate.
The authors have got too many questions regarding Chinese translations….
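It introduces a method for organizing digital content based on coarse categories and sequential numbering. The categorization part is similar to how I currently organize my files, and the numbering approach gave me some inspiration: it is reminiscent of the forms used when dealing with the government, such as I-140 and 1099-B. They look like arbitrary character sequences, but everyone familiar with the topic knows exactly what they are.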
The broken jargon system in statistics …😱
Depending on the context, an independent variable is sometimes called a “predictor variable”, regressor, covariate, “controlled variable”, “manipulated variable”, “explanatory variable”, exposure variable (see reliability theory), “risk factor” (see medical statistics), “feature” (in machine learning and pattern recognition) or “input variable.” In econometrics, the term “control variable” is usually used instead of “covariate”.
This is one of the hidden problems of our world. In some sense, the US is destroying the world. If you look at Germany, plastic recycling is much easier with all these machines in the stores. (or, is it?)
An interesting idea on time series predictions. Instead of predicting the exact time series, the author proposed a method to predict the future using ordinal patterns.
The figure shows how to decompose the time series into 8 overlapping short-term series (each with three numbers). To transform the short-term series into patterns, we write down the permutation pattern (for a series of size D=3, there are only 6 possible permutations). Then we use the permutation patterns in the past to predict the patterns in the future. BTW, this paper used the price of bitcoin as an example to test this method. This method will not be super amazing. The point of this paper is to propose a simple method to predict the future using very limited resources.
This is the paper: https://royalsocietypublishing.org/doi/10.1098/rsos.201011 Short-term prediction through ordinal patterns
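A minimal sketch of the ordinal pattern idea, not the paper's actual implementation. The series values are made up, the function names are hypothetical, and the "most frequent follower" prediction rule is my own simplification for illustration.

```python
from collections import Counter, defaultdict


def ordinal_patterns(series, d=3):
    """Return the permutation pattern of every overlapping window of length d."""
    patterns = []
    for i in range(len(series) - d + 1):
        window = series[i:i + d]
        # Indices of the window ordered by value (ties broken by position).
        pattern = tuple(sorted(range(d), key=lambda k: (window[k], k)))
        patterns.append(pattern)
    return patterns


def predict_next_pattern(patterns):
    """Guess the next pattern as the one that most often followed the last one."""
    transitions = defaultdict(Counter)
    for prev, nxt in zip(patterns, patterns[1:]):
        transitions[prev][nxt] += 1
    followers = transitions[patterns[-1]]
    if not followers:
        return None  # the last pattern has never been seen before
    return followers.most_common(1)[0][0]


# Made-up series with 10 points, which gives 8 overlapping windows for d=3.
series = [4, 7, 9, 10, 6, 11, 3, 5, 8, 2]
patterns = ordinal_patterns(series, d=3)
print(len(patterns), patterns)
print(predict_next_pattern(patterns))
```

With D=3 there are at most 6 distinct patterns, so the transition table stays tiny. That is the "very limited resources" part.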
#cn http://www.cddata.gov.cn/oportal/index Surprisingly, Chengdu has an open data platform, and it is quite well done.
I had the same idea that a git fast-forward merge is more or less the same as a rebase, until I read this Stack Overflow answer.
I guess we should always rebase whenever possible to maintain a clean history.
In the past years, I have been building a showcase of digital tools for academic researchers.
It started with some friends asking for recommendations of tools for reference management, visualization, note-taking, and so on ad infinitum.
So I built a GitHub repo to share what I have learned about these tools. This was way before the “awesome repo” concept. Later came the “GitHub awesome repo” shitstorm. Everyone is building an “awesome repo”. I created a website for a better user experience to flee from the shitstorm.
Tools for Academic Research is a website for digital tool listings. At the moment, there are 154 tools listed. You can browse by tags or categories to find whatever you need. Or add an item (books, tools, reviews, etc) you love.
hmmm it must have been an endless reviewing process. 😱
Description of tables
Have some sunshine! Everything is draped in silver-white snow.
[D] We Need More Data Engineers, Not Data Scientists https://www.reddit.com/r/MachineLearning/comments/kx0j1v/d_we_need_more_data_engineers_not_data_scientists/
#business In 2015, there was a company called SixFold. They were one of the first heroes to disrupt an industry that has not changed much for a century, the freight market. They investigated the situation, established their hypothesis, created MVP. They did not succeed. The image is a summary of their post mortem.
There are at least two lessons from this story.
- Think in terms of the utility function. Do not just point out blocks of reasons. Write down the utility function for the situation and make assumptions on the parameters.
- Swarm intelligence sometimes works better than one might expect. Improvements in swarm intelligence take a lot of effort if one does not have a smart plan.
Here is the article by their CEO: https://medium.com/@MartKelder/end-of-road-for-trucking-startup-palleter-523a4a906fe9
Alan Turing Institute created a package called skpro for probabilistic modeling. Unlike many other probabilistic modeling packages, skpro integrates into sklearn pretty well.
A 30-year-old Ph.D. student in a joint program of Chicago Booth and the Kenneth C. Griffin Department of Economics, Fan was shot and killed on Jan. 9.
Related news article: https://www.globaltimes.cn/page/202101/1212449.shtml
Fan was shot and killed in his car in the parking garage at an apartment building at about 1:50 pm Saturday. After shooting and killing Fan, the suspect, identified as 32-year-old Jason Nightengal by police, went on to shoot others across the city, reports said.
Intrinsic interpretability. arXiv: https://arxiv.org/abs/2002.01650
Mike Bostock made a Hertzsprung–Russell Diagram using d3.js. It looks so cool.
I have been using Obsidian as my primary note-taking app for a while. It was a rough start. Linking notes was simply not in my workflow. In some sense, I am not familiar with my notes after a while. So I started to do a notes review every two weeks. In each notes review, I go through my notes inbox and spend some time connecting the new notes with the existing ones.
This is what my notes look like now. They are mostly well connected. (The cluster is because I have archived them as they are the notes for my previous position.)
I also borrowed the domain concept from dendron. I created folders with dot delimited domains. For example, I have this folder named inbox.ml which I use as my inbox for machine learning related notes. These notes will be distributed to a corresponding folder during my notes review.
Those notes worth publishing will then be distributed to my websites. For example, https://datumorphism.leima.is/ is for data science related notes.
#intelligence #paper #ML Superintelligence Cannot be Contained: Lessons from Computability Theory https://www.jair.org/index.php/jair/article/view/12202
We argue that total containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) impossible.
A nice colloquium paper: The unreasonable effectiveness of deep learning in artificial intelligence | PNAS https://www.pnas.org/content/117/48/30033
How the economic machine works, made easy
If you are in Phoenix.
https://github.com/volotat/DiffMorph #machinelearning #opensource
Image morphing without reference points by applying warp maps and optimizing over them.
Source: https://science.sciencemag.org/content/370/6523/1410.full A gatekeeper for learning
Upon learning a hippocampus-dependent associative task, perirhinal inputs might act as a gate to modulate the excitability of apical dendrites and the impact of the feedback stream on layer 5 pyramidal neurons of the primary somatosensory cortex.
😲 In some sense, perirhinal inputs are like config files for learning.
Could you prevent a pandemic? A very 2020 video game https://play.acast.com/s/nature/2020festivespectacular
Here we propose PauliNet, a deep-learning wavefunction ansatz that achieves nearly exact solutions of the electronic Schrödinger equation for molecules with up to 30 electrons
UK gov has an official covid 19 API. https://coronavirus.data.gov.uk/details/developers-guide#structure-metrics
I found this funny typo in the documentation. 😂 The first one should be cumCasesByPublishDateRate.
I ran into this hilarious comment on pie chart in a book called The Grammar of Graphics.
“To prevent bias, give the child the knife and someone else the first choice of slices.” 😱😱😱
#tools #writing https://www.losethevery.com/
“Very good english” is not very good english. Lose the very.
#datascience #career #academia
I regret quitting astrophysics
me too 😂 though not an astrophysicist, I miss academia too
#tools Space: The Integrated Team Environment https://www.jetbrains.com/space/
Wow, I love jetbrains.
#machinelearning https://arxiv.org/abs/2007.04504 Learning Differential Equations that are Easy to Solve
Jacob Kelly, Jesse Bettencourt, Matthew James Johnson, David Duvenaud
Differential equations parameterized by neural networks become expensive to solve numerically as training progresses. We propose a remedy that encourages learned dynamics to be easier to solve. Specifically, we introduce a differentiable surrogate for the time cost of standard numerical solvers, using higher-order derivatives of solution trajectories. These derivatives are efficient to compute with Taylor-mode automatic differentiation. Optimizing this additional objective trades model performance against the time cost of solving the learned dynamics. We demonstrate our approach by training substantially faster, while nearly as accurate, models in supervised classification, density estimation, and time-series modelling tasks.
#science The ergodicity problem in economics | Nature Physics https://www.nature.com/articles/s41567-019-0732-0
I read another paper about hot hand/gamblers' fallacy a while ago and the author of that paper took a similar view. Here is the article: Surprised by the Hot Hand Fallacy ? A Truth in the Law of Small Numbers by Miller
Skillearn: Machine Learning Inspired by Humans' Learning Skills
Interesting idea. I didn’t know interleaving is already being used in ML.
CCC is hosting the event for 2020 fully online. Everyone can join with a pay-as-you-wish ticket. Join if you like programming, hacking, social events, learning something crazy and new. 👍👍👍
https://arxiv.org/abs/2012.00152 Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Deep learning’s successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods.
A new search engine by a former chief scientist who helped develop the AI platform Einstein for Salesforce.
The new search engine is called “you”.
Will there be more women on the list in 10 years?
Oh Hi, it’s you, Mask. Or social distancing?
I did some investigation on the salaries of tech employees working for the city of Cologne. It seems that the salaries for IT employees are quite low. This may not be a fair representation of the whole of Germany. But Cologne is one of the most digitized cities in Germany. So I would guess it should be a fair example.
For example, a data manager is among the salary group 11 (net from 2144EUR to 2993EUR)
This is the job description: https://www.stadt-koeln.de/politik-und-verwaltung/ausbildung-karriere-bei-der-stadt/stellenangebote/datenmanagerin-beziehungsweise-datenmanager-mwd-im-amt-fuer-informationsverarbeitung
Does Apple really log every app you run? A technical look
- No, macOS does not send Apple a hash of your apps each time you run them. You should be aware that macOS might transmit some opaque information about the developer certificate of the apps you run. This information is sent out in clear text on your network.
- You shouldn’t probably block ocsp.apple.com with Little Snitch or in your hosts file.
Going from Bad to Worse: From Internet Voting to Blockchain Voting
This article examines the suggestions that “voting over the Internet” or “voting on the blockchain” would increase election security, and finds such claims to be wanting and misleading. While current election systems are far from perfect, Internet- and blockchain-based voting would greatly increase the risk of undetectable, nation-scale election failures
(Anderson, 2009) argues that research paper merit is Zipf-distributed: many papers are clear rejects, while a few are clear accepts. In between those two extremes, decisions are very difficult, and any differences between the best rejected and the worst accepted paper are tiny, even given the best possible set of reviewers.
(And the following is quite discriminating towards non-English-speaking researchers.)
Work not-on-English: English is the “default” language to study (Bender, 2019), and work on other languages is easily accused of being “niche” and non-generalizable - even though English-only work is equally non-generalizable.
Wow what is gonna happen to microsoft
Guido van Rossum @gvanrossum I decided that retirement was boring and have joined the Developer Division at Microsoft. To do what? Too many options to say! But it’ll make using Python better for sure (and not just on Windows :-). There’s lots of open source here. Watch this space.
I just finished the book Grokking Algorithms last night. https://www.manning.com/books/grokking-algorithms
I think it is a well-written book for people who are not from a CS background. The book has a lot of examples showing how the algorithms work step by step. To me, the most interesting chapter is dynamic programming. I had a lot of fun reading this. Highly recommended if you are interested in algorithms!
An emerging consensus for open evaluation: 18 visions for the future of scientific publishing
This is hilarious.
Comment Am Neumarkt: The best information designers are summoned on each election day. It is a good time to learn about the best practices of data visualization. This “paths to victory” visualization is one of the best I have ever seen. If they put some probabilities on each branch, it becomes a traditional decision tree of the kind investors use to estimate risks. Does it tell us anything useful directly? Not really. Not all branches are created equal. Without probabilities, it is as useless as a piece of blank paper. But it helps people do some little experiments to feel the competitiveness. In some sense, the probabilities are encoded in the reader’s head. Each reader provides a different reality of probabilities.
Also they started to report uncertainties. I remember last time they were using jittering pointers to educate people about the uncertainties. Now they have ranges of estimates. Showing ranges is an important step forward.
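To make the "probabilities on each branch" point concrete, here is a small sketch. Everything in it is hypothetical: the state names, the vote counts, the win probabilities, and the assumption that states are independent (which real elections certainly violate).

```python
from itertools import product

# Hypothetical undecided states: (electoral votes, assumed win probability for candidate A).
states = {
    "State X": (20, 0.6),
    "State Y": (16, 0.5),
    "State Z": (11, 0.3),
}
votes_needed = 27  # hypothetical number of votes candidate A still needs


def win_probability(states, votes_needed):
    """Sum the probabilities of all branches in which A collects enough votes."""
    names = list(states)
    total = 0.0
    # Each branch of the tree is one win/lose assignment over the states.
    for outcome in product([True, False], repeat=len(names)):
        branch_prob = 1.0
        votes = 0
        for name, won in zip(names, outcome):
            electoral_votes, p = states[name]
            branch_prob *= p if won else (1 - p)
            votes += electoral_votes if won else 0
        if votes >= votes_needed:
            total += branch_prob
    return total


print(round(win_probability(states, votes_needed), 4))
```

Changing the assumed probabilities changes the answer a lot, which is exactly why the bare tree without them says so little.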
What is next? Space stations by private companies like bigelow?
Urban Dictionary Embeddings for Slang NLP Applications - ACL Anthology https://www.aclweb.org/anthology/2020.lrec-1.586
Haha very cool Water <-> butt-splash Soda <-> sodagasm
Muotri, a neuroscientist at the University of California, San Diego (UCSD), has found some unusual ways to deploy his. He has connected organoids to walking robots, modified their genomes with Neanderthal genes, launched them into orbit aboard the International Space Station, and used them as models to develop more human-like artificial-intelligence systems.
The effect of influenza vaccination on trained immunity: impact on COVID-19 | medRxiv https://www.medrxiv.org/content/10.1101/2020.10.14.20212498v1
Hospital workers who got vaccinated were significantly less likely to develop COVID than those who did not
I believe that is just a simple sampling problem. People got a flu shot this year because they are really careful about infectious diseases. They may also sanitize more.
Explorer | Explore Human Knowledge Explorer by Batou. Navigate wikipedia visually!. Product topic: Web App, User Experience, Education, Artificial Intelligence, Tech View on Product Hunt
StellarX Create collaborative spaces & rich simulations without code. Product topic: Virtual Reality, Design Tools, Education, Artificial Intelligence, Augmented Reality, Tech View on Product Hunt
ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis
To assist human review process, we build a novel ReviewRobot to automatically assign a review score and write comments for multiple categories. A good review needs to be knowledgeable, namely that the comments should be constructive and informative to help improve the paper; and explainable by providing detailed evidence. ReviewRobot achieves these goals via three steps: (1) We perform domain-specific Information Extraction to construct a knowledge graph (KG) from the target paper under review, a related work KG from the papers cited by the target paper, and a background KG from a large collection of previous papers in the domain. (2) By comparing these three KGs we predict a review score and detailed structured knowledge as evidence for each review category. (3) We carefully select and generalize human review sentences into templates, and apply these templates to transform the review scores and evidence into natural language comments. Experimental results show that our review score predictor reaches 71.4-100% accuracy. Human assessment by domain experts shows that 41.7%-70.5% of the comments generated by ReviewRobot are valid and constructive, and better than human-written ones 20% of the time. Thus, ReviewRobot can serve as an assistant for paper reviewers, program chairs and authors.
Jackson, D. E., & Ratnieks, F. L. W. (2006). Communication in ants. Current Biology, 16(15), R570–R574. https://doi.org/10.1016/j.cub.2006.07.015
I just realized that what we have been calling swarm intelligence is not very different from our single agent intelligence. They are all dealing with information diffusion. Using the diffusion, swarm intelligence shares the global information with dumb agents. Our brain, on the other hand, is using information diffusion (using Ca as an agent) as a way to regulate neuron firing rate. This is also a way to share the global firing status with each neuron. It is even more interesting if we think of it as a hierarchical model. “Single agent” is using smaller agents for their own intelligence. A “single agent” is also a part of a larger agent. In the end, we are just part of Gaia.
I just learned today that the Cooper in BCS for superconductivity and the Cooper in BCM in neuroscience are the same Cooper. This guy is amazing.
“The intervention placed and retained frequent user, chronically homeless individuals in housing. It decreased psychiatric ED visits and shelter use, and increased outpatient mental health care, but not medical ED visits or hospitalizations. Limitations included more than one‐third of usual care participants received another form of subsidized housing, potentially biasing results to the null, and loss of power due to high death rates. PSH can house high‐risk individuals and reduce emergent psychiatric services and shelter use. Reductions in hospitalizations may be more difficult to realize.”
I am a little bit scared whenever I think about how it accesses the papers. It is a black box and we have no idea if scihub is doing this in a way that is accepted by every researcher. On the other hand, it is not easy to live without scihub. There are legal alternatives like unpaywall and kopernio but they are way behind the game. What shall we do? Require the author of scihub to open source the code? Continue using a black box that may hurt other people? I don’t know.
“Docker for Mac uses https://github.com/moby/hyperkit to emulate the hypervisor capabilities and Hyperkit uses hypervisor.framework in its core. Hypervisor.framework is Mac’s native hypervisor solution. Hyperkit also uses VPNKit and DataKit to namespace network and filesystem respectively.”
hmmm I guess this is why docker on mac uses a lot of resources compared to its linux version
Very interesting research on voting mechanisms. They built a theory to understand how the Freemasons' member selection procedures shape the community. The Freemasons only admit a candidate if the candidate is accepted by all of the current members.
Details-on-demand is so popular and crucial to perception.
Bifrost Data Search Find the perfect image datasets for your next ML project. Product topic: Analytics, Robots, Developer Tools, Artificial Intelligence, Tech, Maker Tools View on Product Hunt
Orchest An open source tool for creating data science pipelines. Product topic: Productivity, Open Source, Developer Tools, Tech View on Product Hunt
Stat of the day Fascinating and important stats from the rest of world. Product topic: News, User Experience View on Product Hunt
Paletro Enable command palette (⇧⌘P) in any application on macOS. Product topic: Mac, Productivity View on Product Hunt