Am Neumarkt

Machine learning and other gibberish on Telegram; https://t.me/amneumarkt

265

Modified:
Tags:
#visualization

#visualization

The Doomsday Datavisualizations - Bulletin of the Atomic Scientists

https://thebulletin.org/doomsday-clock/datavisualizations/

264

Modified:
Tags:
#data

263

Modified:
Tags:
#ML

#ML

A Gentle Introduction to Graph Neural Networks https://distill.pub/2021/gnn-intro

262

Modified:
Tags:
#ML

#ML

The authors investigate the geometry formed by the responses of neurons to certain stimuli (tuning curves). Using the stimulus as the hidden variable, we can construct a geometry of neuron responses. The authors clarify the relations between this geometry and other measures such as mutual information.

The story itself in this paper may not be interesting to machine learning practitioners. But the method of using the geometry of neuron responses to probe the brain is intriguing. We …

261

Modified:
Tags:
#ML #self-supervised #representation

#ML #self-supervised #representation

Contrastive loss is widely used in representation learning. However, the mechanism behind it is not as straightforward as it seems.

Wang & Isola proposed a way to decompose the contrastive loss into alignment and uniformity. Samples in the feature space are normalized to unit vectors, so they lie on a hypersphere. The two components of the contrastive loss are

  • alignment, which forces the positive samples to be aligned on the …
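A minimal NumPy sketch of the two terms, assuming the definitions and default parameters (α = 2, t = 2) from Wang & Isola's paper. `normalize`, `align_loss`, and `uniform_loss` are hypothetical helper names, and row i of `x` and row i of `y` are assumed to form a positive pair.

```python
import numpy as np


def normalize(z: np.ndarray) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def align_loss(x: np.ndarray, y: np.ndarray, alpha: float = 2.0) -> float:
    """Alignment: mean distance**alpha between positive pairs (row i of x with row i of y)."""
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))


def uniform_loss(x: np.ndarray, t: float = 2.0) -> float:
    """Uniformity: log of the mean Gaussian potential exp(-t * ||xi - xj||^2) over distinct pairs."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    i, j = np.triu_indices(len(x), k=1)  # each unordered pair once, no self-pairs
    return float(np.log(np.mean(np.exp(-t * sq_dists[i, j]))))


rng = np.random.default_rng(0)
z = normalize(rng.normal(size=(128, 8)))                # spread-out embeddings
z_pos = normalize(z + 0.05 * rng.normal(size=z.shape))  # slightly perturbed positives
collapsed = normalize(np.ones((128, 8)))                # every sample mapped to one point

print(align_loss(z, z_pos))     # small: positives are close
print(uniform_loss(z))          # negative: embeddings are spread out
print(uniform_loss(collapsed))  # 0.0: the worst possible uniformity
```

Lower is better for both terms. A collapsed encoder is perfectly aligned but maximally non-uniform, which is why the two terms pull against each other.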

259

Modified:
Tags:
#中文 #visualization

#中文 #visualization

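Following a recommendation from the TMS channel, I subscribed to Data Stitches, https://datastitches.substack.com/ . After a few issues, I find the quality very high; it regularly features great work.

I also recommend the TMS channel itself, https://t.me/tms_ur_way/1031 , on time management, productivity, and life.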

257

Modified:
Tags:
#DS

#DS

Cute comics on interactive data visualization

https://hdsr.mitpress.mit.edu/pub/49opxv6v/release/1

256

Modified:
Tags:
#DS

#DS

JetBrains released a new IDE for data scientists.

https://www.jetbrains.com/dataspell/

255

Modified:
Tags:
#ML

#ML

😂 Jürgen Schmidhuber invented transformers in the 90s.

https://people.idsia.ch/~juergen/fast-weight-programmer-1991-transformer.html

253

Modified:
Tags:
#DS

#DS

Hullman J, Gelman A. Designing for interactive exploratory data analysis requires theories of graphical inference. Harvard Data Science Review. 2021. doi:10.1162/99608f92.3ab8a587 https://hdsr.mitpress.mit.edu/pub/w075glo6/release/2

Creating visualizations seems to be a creative task. At least for entry-level visualization tasks, we follow our hearts and build whatever is needed. However, visualizations are made for different purposes. Some visualizations are simply explorations and for us …

252

Modified:
Tags:
#ML

#ML

https://www.microsoft.com/en-us/research/blog/make-every-feature-binary-a-135b-parameter-sparse-neural-network-for-massively-improved-search-relevance/

Though not the core of the model, I noticed that this model (MEB) uses the user search behavior on Bing to build the language model. If a search result on Bing is clicked by the user, it is considered to be a positive sample for the query, otherwise a negative sample.

In self-supervised learning, it has been shown that negative sampling is …

251

Modified:
Tags:
#science

#science

Nielsen M. Reinventing discovery: The New Era of networked science. Princeton, NJ: Princeton University Press; 2011.

I found this book this morning and skimmed through it. It looks concise yet unique. The author discusses how the internet is changing the way human beings think as one collective intelligence. I like the chapters about how the data web is enabling more scientific discoveries.

250

Modified:
Tags:
#ML

#ML

https://thegradient.pub/systems-for-machine-learning/

Challenges in data collection, verification, and serving tasks.

249

Modified:

https://github.com/soumith/ganhacks

Training GANs can be baffling. For example, the generator and the discriminator sometimes just don’t “learn” at the same pace. Would you try to balance the generator loss and the discriminator loss by hand? Soumith Chintala (at FAIR) put together this list of tips for training GANs. “Don’t balance loss via statistics” is one of Chintala’s 17 tips. The list is quite inspiring.

248

Modified:

I have downloaded the file so you don’t need to.

Anaconda-2021-SODS-Report-Final.pdf

247

Modified:
Tags:
#DS

#DS

This is an interesting report by Anaconda. It more or less confirms that Python is still the king of languages for data science, with SQL close behind.

Quote from the report:

Between March 2020 to February 2021, the pandemic economic period, we saw 4.6 billion package downloads, a 48% increase from the previous year. We have no data for other languages so no predictions can be made but it is interesting to see Python growing so fast.

The roadblocks different data …

246

Modified:
Tags:
#ML

#ML

Julia Computing got a lot of investment recently. I need to dive deeper into the Julia Language.

https://juliacomputing.com/blog/2021/07/series-a/

245

Published:
Tags:
#DS

#DS

PyData goes virtual this year.

https://pydata.org/global2021/present/

244

Published:
Tags:
#Coding

#Coding

I found a nice place to practice programming thinking. It is not as comprehensive as HackerRank/LeetCode, but these problems are quite fun.

https://codingcompetitions.withgoogle.com/

243

Modified:
Tags:
#ML

#ML

Implicit Regularization in Tensor Factorization: Can Tensor Rank Shed Light on Generalization in Deep Learning? – Off the convex path http://www.offconvex.org/2021/07/08/imp-reg-tf/

242

Modified:
Tags:
#TIL

#TIL

In PyTorch, converting CPU tensors to NumPy arrays is very fast, even though torch tensors and NumPy arrays are very different things. This is thanks to the Python buffer protocol, which makes it possible to use binary data directly from C without copying the object.

https://docs.python.org/3/c-api/buffer.htm

Reference: Stevens E, Antiga L, Viehmann T. Deep Learning with PyTorch: Build, Train, and Tune Neural Networks Using Python Tools. Simon and Schuster; 2020.
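The zero-copy idea can be seen without PyTorch at all. The sketch below uses only the standard library: `memoryview` exposes the buffer of an `array.array` through the buffer protocol, so a write through the view lands in the original object's memory. (For CPU tensors, `Tensor.numpy()` and `torch.from_numpy()` likewise share memory rather than copying it.)

```python
import array

# array.array stores its items in one contiguous C buffer (here: C doubles).
data = array.array("d", [1.0, 2.0, 3.0])

# memoryview reads that buffer through the buffer protocol; nothing is copied.
view = memoryview(data)

# Writing through the view modifies the memory owned by `data`.
view[0] = 42.0

print(data[0])                           # 42.0
print(view.obj is data)                  # True: the view points at the original object
print(view.nbytes == 3 * data.itemsize)  # True: the view spans exactly the 3 stored doubles
```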

241

Modified:
Tags:
#Academia

#Academia

The Distill team’s thoughts on interactive publishing and self-publishing in academia.

https://distill.pub/2021/distill-hiatus/

240

Published:
Tags:
#ML

#ML

Great. TensorFlow now has built-in decision forest models.

https://blog.tensorflow.org/2021/05/introducing-tensorflow-decision-forests.html?m=1

239

Modified:
Tags:
#fun

#fun

GitHub Copilot · Your AI pair programmer https://copilot.github.com/

This is crazy.

What is GitHub Copilot? GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. GitHub Copilot draws context from comments and code, and suggests individual lines and whole functions instantly. GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. The GitHub Copilot technical preview is available as a Visual Studio Code extension.

How good is …

238

Modified:
Tags:
#ML

#ML

A Turing lecture article by the three famous DL guys. It’s an overview of the history, development, and future of AI. There are two very interesting points in the outlook section:

  • “From homogeneous layers to groups of neurons that represent entities.” In biological brains, there are memory engrams and motifs that almost do this.
  • “Multiple time scales of adaption.” This is another key idea that has been discussed numerous times. One of the craziest …

237

Modified:
Tags:
#ML

#ML

Geometric Deep Learning is an attempt to unify deep learning using geometry. Instead of building deep neural networks that ignore the symmetries in the data and leave them to be discovered by the network, we build the symmetries of the problem into the network. For example, instead of flattening the matrix of a cat image and imposing some predetermined order on the pixels, we note that if we apply a translation to the 2D image, the cat should still be a cat without any doubt. This transformation …
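A toy check of the symmetry argument, assuming a 1D circular signal as a stand-in for an image (`circular_conv` is a hypothetical helper): a convolution commutes with shifts, i.e. it is translation-equivariant, while a generic dense layer applied to the flattened input does not.

```python
import numpy as np


def circular_conv(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D circular cross-correlation: out[i] = sum_k kernel[k] * signal[(i + k) % n]."""
    n = len(signal)
    return np.array([
        sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
        for i in range(n)
    ])


rng = np.random.default_rng(0)
x = rng.normal(size=16)            # toy "image" with 16 pixels on a ring
w = np.array([1.0, -2.0, 1.0])     # a small shared kernel
dense = rng.normal(size=(16, 16))  # an unconstrained fully connected layer
shift = 5

# Convolution: shift-then-apply equals apply-then-shift.
conv_equivariant = np.allclose(circular_conv(np.roll(x, shift), w),
                               np.roll(circular_conv(x, w), shift))

# Dense layer on the flattened input: no such guarantee.
dense_equivariant = np.allclose(dense @ np.roll(x, shift),
                                np.roll(dense @ x, shift))

print(conv_equivariant, dense_equivariant)  # True False
```

Weight sharing is what buys the equivariance: the same kernel is applied at every position, so the network does not have to learn the symmetry from data.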

236

Published:
Tags:
#DS

#DS

A library for interactive visualization directly from pandas.

https://github.com/santosjorge/cufflinks

235

Modified:
Tags:
#ML

#ML

The Bayesian hierarchical model provides a process for using Bayesian inference hierarchically to update the posteriors. What is a Bayesian model? In a Bayesian linear regression problem, we can take the posterior from the previous data points and use it as our new prior for inference on new data. In other words, as more data come in, our belief is updated. However, this is a problem if some clusters in the dataset have small sample sizes, aka small support. As we take these …
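A minimal sketch of both ideas in the simplest conjugate setting, assuming a normal likelihood with known noise `sigma` (a swapped-in toy model, not a full Bayesian linear regression; `normal_update` is a hypothetical helper). Updating one point at a time, with each posterior reused as the next prior, gives the same answer as a single batch update. With a shared population prior, a cluster with small support is pulled strongly toward the prior mean; here that prior is fixed by assumption, whereas a real hierarchical model would infer it from all clusters.

```python
def normal_update(prior_mean, prior_var, xs, sigma):
    """Conjugate update for the mean of a normal likelihood with known sd `sigma`."""
    post_var = 1.0 / (1.0 / prior_var + len(xs) / sigma**2)
    post_mean = post_var * (prior_mean / prior_var + sum(xs) / sigma**2)
    return post_mean, post_var


data = [1.2, 0.7, 2.3, 1.9, 1.1]  # hypothetical observations
sigma = 1.0

# Sequential: each posterior becomes the prior for the next data point.
mean, var = 0.0, 10.0
for x in data:
    mean, var = normal_update(mean, var, [x], sigma)

# Batch: all data at once from the original prior.
batch_mean, batch_var = normal_update(0.0, 10.0, data, sigma)
print(abs(mean - batch_mean) < 1e-12, abs(var - batch_var) < 1e-12)  # True True

# Shared population prior N(0, 1), fixed here by assumption.
big_cluster, _ = normal_update(0.0, 1.0, [2.0] * 100, sigma)  # large support
small_cluster, _ = normal_update(0.0, 1.0, [2.0], sigma)      # support of one
print(round(big_cluster, 3), round(small_cluster, 3))  # 1.98 1.0
```

Both clusters observe the same mean of 2.0, but the one-point cluster is shrunk halfway to the prior mean while the 100-point cluster barely moves.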

234

Modified:
Tags:
#academia

233

Modified:
Tags:
#fun

232

Published:
Tags:
#DS

#DS

This paper serves as a good introduction to the declarative data analytics tools.

Declarative analytics performs data analysis using a declarative syntax instead of functions for specific algorithms. Using declarative syntax, one can “describe what you want the program to achieve rather than how to achieve it”. To be declarative, the language has to be specific to the tasks. As a result, we can only turn the knobs of some predefined models. To me, this is a deal-breaker.

Anyways, …

231

Modified:
Tags:
#DS

#DS

https://octo.github.com/projects/flat-data

Hmmm, so they gave it a name. I’ve built so many projects using this approach. I started building such data repos using CI/CD services way before GitHub Actions was born. Of course, GitHub Actions made it much easier. One of these projects is the EU covid data tracking project ( https://github.com/covid19-eu-zh/covid19-eu-data ). It’s been running for more than a year with very little maintenance. Some covid projects even copied our EU covid data …

230

Modified:
Tags:
#ML

#ML

An interesting talk:


Dear all,

We are pleased to have Anna Golubeva speak on “Are wider nets better given the same number of parameters?” on Wednesday May 19th at 12:00 ET.

You can find further details here and listen to the talk here.

We hope you can join!

Best,

Sven

229

Published:
Tags:
#DS

#DS

“Don’t pull down the data. Do it with SQL.”

https://hakibenita.com/sql-for-data-analysis
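A small illustration of the slogan with Python's built-in `sqlite3`, assuming a hypothetical `sales` table: both versions compute the same per-region totals, but the SQL version sends only one row per group back to the client instead of every raw row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5), ("south", 2.5), ("east", 1.0)],
)

# "Pull down the data": fetch every row, then aggregate in Python.
totals_py = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_py[region] = totals_py.get(region, 0.0) + amount

# "Do it with SQL": the database aggregates and returns one row per group.
totals_sql = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
))

print(totals_sql)  # {'east': 1.0, 'north': 15.0, 'south': 10.0}
```

On a toy table the difference is invisible; on millions of rows, the second version avoids moving the raw data over the network at all.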

228

Modified:
Tags:
#career #DS

#career #DS

I believe this article is relevant. Most data scientists have very good academic records. These experiences of excellence compete with another required quality in the industry: The ability to survive in a less ideal yet competitive environment. We could be stubborn and find the environment that we fit well in or adapt based on the business playbook. Either way is good for us as long as we find the path that we love.

(I have a joke about this article: To reasoning productively, we do …

227

Modified:
Tags:
#DS #EDA #Visualization

#DS #EDA #Visualization

If you are keen on data visualization, the new Observable Plot should excite you. Observable Plot is based on d3, but it is easier to use in Observable notebooks. It also follows the guidelines of the layered grammar of graphics (e.g., marks, scales, transforms, facets).

https://observablehq.com/@observablehq/plot

226

Modified:
Tags:
#DS

#DS

(This is an automated post by IFTTT.)

It is always good for a data scientist to understand more about data engineering. With some basic data engineering knowledge in mind, we can navigate through the blueprint of a fully productionized data project at any time. In this blog post, I listed some of the key concepts and tools that I learned in the past.

This is my blog post on Datumorphism https://datumorphism.leima.is/wiki/data-engeering-for-data-scientist/checklist/

225

Modified:
Tags:
#DS #ML

#DS #ML

The “AI Expert Roadmap”. This can be used as a checklist of prelims for data people.

https://i.am.ai/roadmap/#note

224

Modified:
Tags:
#statistics

#statistics

This is the original paper of Fraser information.

Fisher information measures the second moment of the model sensitivity; Shannon information measures compressed information, or the variation of information; Kullback-Leibler information (aka KL divergence) distinguishes two distributions. Instead of defining a separate measure of information for each setting, Fraser tweaked the Shannon information slightly and made it more generic. The Fraser information can be reduced to Fisher information, Shannon …

223

Modified:
Tags:
#DS
#DS Wing JM. Ten research challenge areas in data science. Harvard Data Science Review. 2020;114: 1574–1596. doi:10.1162/99608f92.c6577b1f https://hdsr.mitpress.mit.edu/pub/d9j96ne4/release/2

222

Published:

221

Published:
Tags:
#ML

#ML

Voss, et al., “Branch Specialization”, Distill, 2021. https://distill.pub/2020/circuits/branch-specialization/

TLDR;

  • Branch: neuron clusters that are roughly segregated locally, e.g., AlexNet branches by design.
  • Branch specialization: branches specialize in specific tasks, e.g., the two AlexNet branches specialize in different detectors (color detector or black-white filter).
  • Is it a coincidence? No. Branch specialization repeatedly occurs in different trainings and different …

220

Modified:
Tags:
#ML

#ML

Silla CN, Freitas AA. A survey of hierarchical classification across different application domains. Data Min Knowl Discov. 2011;22: 31–72. doi:10.1007/s10618-010-0175-9

A survey paper on hierarchical classification problems. It is a bit old as it didn’t consider the classifier chains, but this paper summarizes most of the ideas in hierarchical classification.

The authors also proposed a framework for the categorization of such problems using two different dimensions (ranks).

219

Published:
AI researchers allege that machine learning is alchemy | Science | AAAS https://www.sciencemag.org/news/2018/05/ai-researchers-allege-machine-learning-alchemy

217

Published:
Tags:
#TIL

#TIL

How the pandemic changed the way people collaborate.

  1. Siloing: From April 2019 to April 2020, modularity, a measure of workgroup siloing, rose around the world.

https://www.microsoft.com/en-us/research/blog/advancing-organizational-science-using-network-machine-learning-to-measure-innovation-in-the-workplace/

216

Modified:
Tags:
#DataScience

#DataScience

(Please refer to this post https://t.me/amneumarkt/199 for more background.)

I read the book “everyday data science”. I think it is not as good as I expected.

The book doesn’t explain things clearly at all. Besides, I was expecting something that starts from everyday life and extrapolates to something more scientific.

I also mentioned previously that I would like to write a similar book. Attached is something I created recently that is quite close to the idea of …

214

Published:

213

Published:
Tags:
#fun
#fun A moment of joy for Friday. https://www.youtube.com/watch?v=ZI0w_pwZY3E

210

Published:
Tags:
#ML

#ML

How do we interpret the capacity of neural nets? Naively, we would represent the capacity using the number of parameters. Even for the Hopfield network, Hopfield introduced the concept of capacity using entropy, which in turn is related to the number of parameters.

But adding layers to neural nets also introduces regularization. It might be related to the capacity of neural nets, but we do not have a clear picture.

This paper introduced a new perspective using sparse approximation theory. …

209

Modified:
Tags:
#fun

#fun

India is growing so fast

Oh Germany…

Global AI Vibrancy Tool Who’s leading the global AI race? https://aiindex.stanford.edu/vibrancy/

208

Published:
Tags:
#ML

207

Published:
Tags:
#ML

206

Published:
Tags:
#ML

#ML

I just found an elegant decision tree visualization package for sklearn.

I have been trying to explain decision tree results to many business people. It is very hard. This package makes it much easier to explain the results to a non-technical person.

https://github.com/parrt/dtreeviz

205

Modified:
Tags:
#fun

204

Published:
Tags:
#fun

#fun

Growth in data science interviews plateaued in 2020. Data science interviews only grew by 10% after previously growing by 80% year over year.

Data engineering specific interviews increased by 40% in the past year. 

https://www.interviewquery.com/blog-data-science-interview-report

203

Modified:
Tags:
#ML #Phyiscs

#ML #Phyiscs

The easiest way to apply constraints to a dynamical system is through Lagrange multipliers, aka penalties in statistical learning. Penalties don’t guarantee any conservation laws, as they are simply penalties, unless the multipliers carry some physical meaning, as they do in Boltzmann statistics. This paper explains a simple method to hardcode conservation laws in a neural network architecture.

Paper: …
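To make the difference between a penalty and a hard constraint concrete, here is a toy sketch (not the paper's architecture, which is not described above): a model's raw output should conserve a total, say mass spread over three compartments. A quadratic penalty only shrinks the violation, while an orthogonal projection, which could serve as a final layer, satisfies the constraint exactly. The function names and numbers are hypothetical.

```python
import numpy as np


def project_to_conserved(raw: np.ndarray, total: float) -> np.ndarray:
    """Orthogonal projection onto {y : sum(y) == total}: spread the violation evenly."""
    return raw + (total - raw.sum()) / raw.size


def penalized_solution(target: np.ndarray, total: float, lam: float) -> np.ndarray:
    """Closed-form minimizer of ||y - target||^2 + lam * (sum(y) - total)^2."""
    n = target.size
    s = (target.sum() + n * lam * total) / (1 + n * lam)  # the sum the optimum settles on
    return target - lam * (s - total)


raw = np.array([0.2, 0.5, 0.4])  # hypothetical raw network output; sums to 1.1
total = 1.0                      # the conserved quantity

hard = project_to_conserved(raw, total)
soft = penalized_solution(raw, total, lam=10.0)

print(hard.sum())          # 1.0, up to floating-point rounding
print(soft.sum() - total)  # ~0.0032: the penalty leaves a residual violation
```

For the penalty, the residual is (sum(raw) - total) / (1 + n·λ), which only vanishes as λ grows without bound; the projection has no such knob to tune.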

202

Published:
Agenda for AI CON 2021

201

Modified:
Tags:
#event

#event

If you are interested in free online AI Cons, Bosch CAI is organizing the AI Con 2021. This event starts tomorrow. https://www.ubivent.com/start/AI-CON-2021

200

Modified:
Tags:
#ML

199

Modified:
Tags:
#DataScience

#DataScience

Ah, I have always been thinking about writing a book like this. I just bought the book to educate myself on communication.

https://andrewnc.github.io/blog/everyday_data_science.html

198

Modified:
Tags:
#ML

#ML

note2self:

From ref 1

we can take any expected utility maximization problem, and decompose it into an entropy minimization term plus a “make-the-world-look-like-this-specific-model” term.

This view should be combined with ref 2. If the utility is related to the curvature of the discrete state space, we are making a connection between entropy + KL divergence and curvature on graphs. (This idea has to be polished in depth.)

Refs:

  1. Trivial proof but interesting perspective: …
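A quick numerical check of the decomposition quoted from ref 1, under one assumption that makes the “specific model” explicit: define q(x) = exp(u(x)) / Z from the utility u. Then E_p[u] = log Z - H(p) - KL(p || q), so maximizing expected utility over p is the same as minimizing H(p) + KL(p || q). The utility and the distribution below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=5)         # placeholder utility over 5 discrete states
p = rng.dirichlet(np.ones(5))  # any distribution over the same states

# Turn the utility into a "target model" q(x) = exp(u(x)) / Z.
log_Z = np.log(np.sum(np.exp(u)))
q = np.exp(u - log_Z)

entropy = -np.sum(p * np.log(p))
kl = np.sum(p * np.log(p / q))
expected_utility = np.sum(p * u)

# Identity: E_p[u] = log Z - H(p) - KL(p || q)
print(np.isclose(expected_utility, log_Z - entropy - kl))  # True
```

Since log Z does not depend on p, it drops out of the optimization, leaving exactly the entropy term plus the KL term.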

197

Modified:
Tags:
#dev

#dev

You can even use Chinese in GitHub Codespaces. 😱 Well, this is trivial if you have a Chinese input method on your computer. But what if you are using a company computer and would like to add some Chinese comments just for fun….

196

Modified:
Tags:
#fun

195

Published:
Neural Networks❤️

194

Modified:
Tags:
#fun

#fun

Can you predict AI winter using AI?

193

Modified:
Tags:
#ML

#ML

The new AI spring: a deflationary view

It’s actually fun to watch philosophers fighting each other. The author is trying to deflate the inflated expectations on AI by looking into why inflated expectations are harming our society. It’s not exactly based on evidence but still quite interesting to read.

| SpringerLink https://link.springer.com/article/10.1007/s00146-019-00912-z

192

Modified:
Tags:
#neuroscience

#neuroscience

Definitely weird. The authors used DNNs to capture the firing behavior of cortical neurons.

  • A single hidden layer DNN (can you even call it Deep NN in this case?) can capture the neuronal activity without NMDA but with AMPA.
  • With NMDA, the neuron requires more than 1 layer. This paper stops here.

WTH is this? Let’s go back to the foundations of statistical learning. What the author is looking for is a separation of the “stimulation” space. The …

191

Published:
Tags:
#fun

#fun

We have been testing a new connected online work space using discord. Whoever is bored by home office can connect to a shared channel and chat.

Discord allows team voice chat and multiple screen shares. By adding bots to the channel, the team can share music playlists. Discord allows fine-grained volume control, so anyone can adjust the volume of any other user or even deafen themselves. So it is possible to stay connected for the whole day.

It seems that jump in and chat at anytime …

190

Published:
Tags:
#TIL

#TIL My cheerful price for the work I am currently doing is very high…

https://www.lesswrong.com/posts/MzKKi7niyEqkBPnyu/your-cheerful-price

189

Published:
Tags:
#productivity

#productivity

I find vscode remote-ssh very helpful. For some projects with frequent maintenance fixes, I prepared all the required environment on a remote server. I only need to click on the remote-ssh connection to connect to this remote server and immediately start my work. This low overhead setup makes me less reluctant to fix stuff. It is also possible to connect to Docker containers. By setting up different containers we can work in completely different environments with a few clicks. This …

188

Modified:

Passing the Data Baton : A Retrospective Analysis on Data Science Work and Workers

A paper on the different components of data related work. They also proposed a framework and a team structure for data workers.

187

Modified:
Tags:
#ML

#ML

Machine Learning, Kolmogorov Complexity, and Squishy Bunnies http://www.theorangeduck.com/page/machine-learning-kolmogorov-complexity-squishy-bunnies

186

Published:
Right…

185

Published:
What? This is my most-used non-communication app….

184

Published:
https://lwn.net/Articles/845480/ Interesting. This is accepted.

183

Modified:
Tags:
#ML

#ML

[D] Convolution Neural Network Visualization - Made with Unity 3D and lots of Code / source - stefsietz (IG) https://www.reddit.com/r/MachineLearning/comments/leq2kf/d_convolution_neural_network_visualization_made/?utm_medium=android_app&utm_source=share

182

Published:
Tags:
#research

181

Modified:
Tags:
#research

#research

ConnectedPapers is now integrated into arXiv.

This new perspective on references is often overlooked. It is not a gimmick at all.

180

Modified:
Tags:
#ML

#ML

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

TL;DR:

  • Data quality is crucial in any AI especially for those with high-stakes.
  • Much data work is easily overlooked: politics (some data entries are not recorded or are misrecorded), human-in-the-loop data quality interventions for cleaning and wrangling (while upstream data creation should be well controlled too), etc.
  • Data Cascades: how the issues are cascading from upstream to downstream should be clear. …

179

Modified:
Tags:
#ML

178

Published:
Tags:
#fun

#fun

hahaha

176

Modified:
Tags:
#market

#market

Road freight between Britain and EU is down by a third, data shows https://www.theguardian.com/politics/2021/jan/31/road-freight-britain-eu-down-third-data-shows-brexit

Yeah, thanks to Brexit

175

Published:
Tags:
#market

174

Published:
Tags:
#data

#data

https://ec.europa.eu/eurostat/cache/recovery-dashboard/

Eurostat built a dashboard showing the socioeconomic indicators of the EU during the coronavirus pandemic. Most indicators appear to be recovering.

173

Modified:
Tags:
#ML

#ML Sarcasm Detection with Sentiment Semantics Enhanced Multi-level Memory Network - ScienceDirect https://www.sciencedirect.com/science/article/abs/pii/S0925231220304689

Sheldon, this is your thing! (Didn’t read the paper. I just find this title a bit amusing.)

172

Modified:
Tags:
#ML

#ML http://akosiorek.github.io/ml/2018/03/14/what_is_wrong_with_vaes.html

  • instabilities
  • “last-mile” effort in optimization is too high

171

Published:
Tags:
#science

#science https://distill.pub/2017/research-debt/

Research debt is the accumulation of missing interpretive labor. It’s extremely natural for young ideas to go through a stage of debt, like early prototypes in engineering.

That is because our value system doesn’t respect interpreters…

170

Modified:
Tags:
#git

#git https://mergebase.com/doing-git-wrong/2018/03/07/fun-with-git-pull-rebase/

Some people claim “git fetch; git rebase origin/master” is equivalent to “git pull -r”, but it isn’t.

git pull -r also deals with squashes.

#TIL

Rebase hell happens when several commits on your branch edit the same area, and upstream also touched the same area. The problem occurs because each conflict resolution will itself conflict with the subsequent commit in the series.

169

Published:
Tags:
#TIL

#TIL

https://www.reddit.com/r/todayilearned/comments/l4y427/til_larry_hillblom_the_h_of_dhl_regularly_took/

TIL Larry Hillblom, the H of DHL, regularly took “sex safari” trips to Asia to prey on underage girls. When he died in a plane crash, 4 of the illegitimate children he fathered were able to claim $50 million each from his estate.

168

Published:
Tags:
#fun

#fun

The authors have received too many questions about Chinese translations….

Ref: https://www.deeplearningbook.org/

167

Published:

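This introduces a method for organizing digital content based on coarse categories and sequential numbering. The categorization part is very similar to how I currently organize my files, and the numbering scheme gave me some inspiration: it reminds me of the forms used when dealing with the government, such as I-140 and 1099-B. They look like arbitrary character sequences, but everyone familiar with the topic knows exactly what they are.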

https://johnnydecimal.com/

166

Modified:
Tags:
#til

#til https://youtu.be/3C2HVOB-g5s

So, volcanoes are very complicated.

165

Published:
Tags:
#statistics

#statistics

https://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics_synonyms

The broken jargon system in statistics …😱

Depending on the context, an independent variable is sometimes called a “predictor variable”, regressor, covariate, “controlled variable”, “manipulated variable”, “explanatory variable”, exposure variable (see reliability theory), “risk factor” (see medical statistics), “feature” …

163

Modified:

https://www.youtube.com/watch?v=KXRtNwUju5g

This is one of the hidden problems of our world. In some sense, the US is destroying the world. If you look at Germany, plastic recycling is much easier with all these machines in the stores. (or, is it?)

162

Published:

161

Published:

160

Modified:
Tags:
#ML

#ML

An interesting idea on time series predictions. Instead of predicting the exact time series, the author proposed a method to predict the future using ordinal patterns.

The figure shows how to decompose the time series into 8 overlapping short-term series (each with three numbers). To transform the short-term series into patterns, we write down the permutation pattern (for series of length D=3, there are only 3! = 6 possible permutations). Then we will use the permutation patterns in the past …
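A minimal sketch of the encoding step, assuming a window of D = 3 with step 1 and a made-up series of 10 values, which yields 8 overlapping windows. Each window is replaced by the permutation that sorts it; ties, if any, are broken by position. The function names are hypothetical, and the prediction step is not reproduced here.

```python
from collections import Counter
from itertools import permutations


def ordinal_pattern(window):
    """Indices of the window in ascending order of value, e.g. [9, 10, 6] -> (2, 0, 1)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))


def ordinal_patterns(series, d=3):
    """Slide a length-d window over the series with step 1 and encode every window."""
    return [ordinal_pattern(series[i:i + d]) for i in range(len(series) - d + 1)]


series = [4, 7, 9, 10, 6, 11, 3, 5, 8, 2]  # hypothetical data, 10 points
patterns = ordinal_patterns(series, d=3)

print(len(patterns))                      # 8 windows
print(len(list(permutations(range(3)))))  # 6 possible patterns for D = 3
print(Counter(patterns).most_common(2))   # [((0, 1, 2), 3), ((2, 0, 1), 3)]
```

The encoding throws away amplitudes and keeps only the up/down shape of each window, which is what makes the patterns robust to noise and rescaling.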

159

Published:
Tags:
#ML
#ML http://jibencaozuo.com/ PaperClip made a platform for everyone to play with artificial neural networks. My impression: it looks nice. The interactions can be better but I am sure the next iteration will be much better.

158

Modified:
Tags:
#cn

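#cn http://www.cddata.gov.cn/oportal/index Surprisingly, Chengdu has an open data platform, and it is pretty well done.

Update: I found that other provinces and cities have them too. Have all cities and provinces standardized on this? Does every one of them have such an open data platform?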

157

Modified:
Tags:
#TIL

#TIL

https://stackoverflow.com/a/28142831/1477359

I used to think that a git fast-forward merge is more or less the same as a rebase, until I read this Stack Overflow answer.

I guess we should always rebase whenever possible to maintain a clean history.

156

Published:
Tags:
#shameless

#shameless

https://tools.kausalflow.com/

In the past years, I have been building a showcase of digital tools for academic researchers. It started with some friends asking for recommendations of tools for reference management, visualization, note-taking, and so on ad infinitum.
So I built a GitHub repo to share what I have learned about these tools. This was way before the “awesome repo” concept. Later came the “GitHub awesome repo” shitstorm. Everyone is building an …

155

Modified:
Tags:
#fun

#fun

hmmm it must have been an endless reviewing process. 😱

153

Modified:
Tags:
#ML

152

Modified:
Tags:
#fun

#fun

Have some sunshine! Everything is clad in silver-white snow.

151

Modified:
Tags:
#career #business

150

Modified:
Tags:
#business

#business In 2015, there was a company called SixFold. They were one of the first heroes to disrupt an industry that has not changed much in a century: the freight market. They investigated the situation, established their hypotheses, and created an MVP. They did not succeed. The image is a summary of their post-mortem.

There are at least two lessons from this story.

  • Think in terms of the utility function. Do not just point out blocks of reasons. Write down the utility function for the situation and …

149

Published:
Tags:
#ML

#ML

https://alan-turing-institute.github.io/skpro/introduction.html#a-motivating-example

The Alan Turing Institute created a package called skpro for probabilistic modeling. Unlike many other probabilistic modeling packages, skpro integrates with sklearn pretty well.

148

Modified:

https://www.chicagobooth.edu/why-booth/stories/in-memoriam-phd-student-yiran-fan

A 30-year-old Ph.D. student in a joint program of Chicago Booth and the Kenneth C. Griffin Department of Economics, Fan was shot and killed on Jan. 9.

Related news article: https://www.globaltimes.cn/page/202101/1212449.shtml

Fan was shot and killed in his car in the parking garage at an apartment building at about 1:50 pm Saturday. After shooting and killing Fan, the suspect, identified as 32-year-old Jason …

147

Modified:
Tags:
#ML #paper

146

Modified:
Tags:
#fun

#fun

https://observablehq.com/@mbostock/hertzsprung-russell-diagram

Mike Bostock made a Hertzsprung–Russell Diagram using d3.js. It looks so cool.

145

Modified:
Tags:
#productivity

#productivity

I have been using Obsidian as my primary note-taking app for a while. It was a rough start: linking notes was simply not part of my workflow, and after a while I was no longer familiar with my own notes. So I started doing a notes review every two weeks. In each review, I go through my notes inbox and spend some time connecting the new notes with the existing ones.

This is what my notes look like now. They are mostly well connected. (The cluster is because I have archived them as …

144

Modified:
Tags:
#dev
#dev Analysis of the NoSQL Landscape - All About the Code http://blog.knuthaugen.no/2010/03/the-nosql-landscape.html

142

Published:
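One more addition: a German documentary on the labor behind internet censorship, its global impact, how social media intensifies conflict, and related issues: The Cleaners (original title Im Schatten der Netzwelt, “In the Shadow of the Net”), directed by Hans Block and Moritz Riesewieck, who also spoke at TED about “digital cleaning”. I first watched this documentary on DW’s YouTube channel, where it aired in two parts; both have since become unavailable for unknown reasons. A film about deleted content being itself deleted, whether for copyright or other reasons, is ironic enough. An archived copy from the broadcast date can still be found on the Internet Archive, and it can also be downloaded here.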

141

Published:
Twitter suspends Sci-Hub account amid Indian court case - The Verge https://www.theverge.com/2021/1/8/22220738/twitter-sci-hub-suspended-indian-court-case

140

Modified:
Tags:
#intelligence #paper #ML

#intelligence #paper #ML Superintelligence Cannot be Contained: Lessons from Computability Theory https://www.jair.org/index.php/jair/article/view/12202

We argue that total containment is, in principle, impossible, due to fundamental limits inherent to computing itself. Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires …

139

Modified:
Tags:
#machinelearning

#machinelearning

A nice colloquium paper: The unreasonable effectiveness of deep learning in artificial intelligence | PNAS https://www.pnas.org/content/117/48/30033

138

Modified:
Tags:
#intelligence

#intelligence

https://www.economicprinciples.org/

How the economic machine works, made easy

137

Published:

136

Published:

135

Modified:

https://github.com/volotat/DiffMorph #machinelearning #opensource

Differentiable Morphing

Image morphing without reference points by applying warp maps and optimizing over them.

134

Published:
Tags:
#neuroscience

#neuroscience

Source: https://science.sciencemag.org/content/370/6523/1410.full A gatekeeper for learning

Upon learning a hippocampus-dependent associative task, perirhinal inputs might act as a gate to modulate the excitability of apical dendrites and the impact of the feedback stream on layer 5 pyramidal neurons of the primary somatosensory cortex.

😲 In some sense, perirhinal inputs are like config files for learning.

133

Modified:
Tags:
#data

#data

Could you prevent a pandemic? A very 2020 video game https://play.acast.com/s/nature/2020festivespectacular

132

Modified:
What the f… The ECDC site has been like this for at least 2 hours. Am I time traveling back to 2000? AND this is about disease prevention and control, not some useless blog…

131

Published:

https://www.nature.com/articles/s41557-020-0544-y

Here we propose PauliNet, a deep-learning wavefunction ansatz that achieves nearly exact solutions of the electronic Schrödinger equation for molecules with up to 30 electrons

129

Modified:
Tags:
#data #covid19

#data #covid19

The UK government has an official COVID-19 API. https://coronavirus.data.gov.uk/details/developers-guide#structure-metrics

I found this funny typo in the documentation. 😂 The first one should be cumCasesByPublishDateRate.

128

Published:
Tags:
#showerthoughts
#showerthoughts As human beings, we read or hear facts about things. These are our priors. Our beliefs are then updated based on observed data, aka the likelihood. Some people stick to their priors; they are the prior-people. Others are more like likelihood-people and easily change their beliefs based on observations. There is a third type: people who combine priors and likelihood. Changing beliefs based on the likelihood alone is prone to biases in the data. By combining priors and likelihood, they have a …
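The three types map neatly onto a textbook Bayesian update. Below is a minimal sketch that assumes a yes/no belief with a Beta prior. The Beta-Binomial model, the Beta(10, 10) prior strength and the 4-out-of-5 sample are all hypothetical choices made for illustration; none of them come from the post.

```python
from fractions import Fraction

def posterior_mean(prior_yes, prior_no, yes, no):
    """Mean of a Beta(prior_yes + yes, prior_no + no) posterior.

    prior_yes / prior_no act as pseudo-counts, i.e. how strong the prior is.
    """
    a = prior_yes + yes
    b = prior_no + no
    return Fraction(a, a + b)

# Hypothetical small (and possibly biased) sample: 4 "yes" out of 5 observations.
yes, no = 4, 1

# Prior-person: keeps the prior mean and ignores the data.
prior_only = posterior_mean(10, 10, 0, 0)
# Likelihood-person: trusts only the data (maximum-likelihood estimate).
likelihood_only = Fraction(yes, yes + no)
# Third type: combines a Beta(10, 10) prior with the data.
combined = posterior_mean(10, 10, yes, no)

print(prior_only, likelihood_only, combined)  # 1/2 4/5 14/25
```

The combined belief moves toward the data but is not dragged all the way by five observations. That is the protection against biased samples the post describes.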

127

Published:
Tags:
#datascience

#datascience

I ran into this hilarious comment on pie charts in a book called The Grammar of Graphics.

“To prevent bias, give the child the knife and someone else the first choice of slices.” 😱😱😱

126

Published:

125

Modified:
Tags:
#tools #writing

#tools #writing https://www.losethevery.com/

“Very good English” is not very good English. Lose the very.

124

Modified:
Tags:
#datascience #career #academia

#datascience #career #academia

I regret quitting astrophysics

https://news.ycombinator.com/item?id=25444069

http://www.marcelhaas.com/index.php/2020/12/16/i-regret-quitting-astrophysics/

Me too 😂 Although I'm not an astrophysicist, I miss academia too.

123

Modified:
Tags:
#datascience #audliolization
#datascience #audliolization This is the audiolization of the daily new cases for FR, IT, ES, DE, and PL between 2020-08-01 and 2020-12-14. I made an audiolization video two years ago. As I am currently under quarantine and the days are becoming so boring, I started to think about mapping data points to different representations. We usually talk about visualization because there are so many visual elements we can use to represent complicated data. Audiolization, on the other hand, leaves us with …
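One simple way to audiolize a time series is to map each value linearly onto a pitch and play every data point as a short tone. Here is a sketch using only the standard library's `wave` module. The 220 to 880 Hz range, the 0.15 s per point, the file name `cases.wav` and the case counts are all hypothetical. This is not necessarily how the video above was made.

```python
import math
import struct
import wave

def values_to_frequencies(values, f_min=220.0, f_max=880.0):
    """Linearly map data values onto a frequency range in Hz."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant series
    return [f_min + (v - lo) / span * (f_max - f_min) for v in values]

def write_tones(path, freqs, seconds_per_point=0.15, rate=44100):
    """Write one sine tone per frequency into a 16-bit mono WAV file.

    Returns the total number of audio frames written.
    """
    n = round(seconds_per_point * rate)  # samples per data point
    frames = bytearray()
    for f in freqs:
        for i in range(n):
            sample = 0.5 * math.sin(2 * math.pi * f * i / rate)
            frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(bytes(frames))
    return n * len(freqs)

# Hypothetical daily counts, not the real case numbers from the post.
daily_cases = [100, 250, 400, 300, 150]
freqs = values_to_frequencies(daily_cases)
n_frames = write_tones("cases.wav", freqs)
```

Pitch is only one channel. Loudness, tempo or stereo position could encode further variables, such as one country per instrument.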

122

Modified:
Tags:
#fun
#fun https://youtu.be/-QiM9NUow3c Have some fun.

121

Modified:
Tags:
#machinelearning
#machinelearning

120

Published:

119

Modified:
Tags:
#fun
#fun haha no crowd 👍

118

Modified:
Tags:
#tools

#tools Space: The Integrated Team Environment https://www.jetbrains.com/space/

Wow, I love JetBrains.

117

Modified:
Tags:
#machinelearning

#machinelearning https://arxiv.org/abs/2007.04504 Learning Differential Equations that are Easy to Solve

Jacob Kelly, Jesse Bettencourt, Matthew James Johnson, David Duvenaud

Differential equations parameterized by neural networks become expensive to solve numerically as training progresses. We propose a remedy that encourages learned dynamics to be easier to solve. Specifically, we introduce a differentiable surrogate for the time cost of standard numerical solvers, using higher-order …
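The premise is that an adaptive solver's cost depends on how "hard" the dynamics are. The toy sketch below makes that visible by counting function evaluations of a simple adaptive Heun-Euler integrator on two hypothetical scalar ODEs. It does not implement the paper's regularizer, which is the higher-order derivative surrogate mentioned in the abstract. The solver, tolerance and dynamics are all illustrative assumptions.

```python
import math

def solve_adaptive(f, y0, t0, t1, tol=1e-4, h=0.1):
    """Integrate dy/dt = f(t, y) with an adaptive Heun-Euler pair.

    Returns (y(t1), number of f evaluations) so solver cost can be compared.
    """
    t, y, nfe = t0, y0, 0
    while t < t1:
        last = h >= t1 - t
        if last:
            h = t1 - t                       # land exactly on t1
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        nfe += 2
        y_low = y + h * k1                   # Euler estimate (1st order)
        y_high = y + h * (k1 + k2) / 2       # Heun estimate (2nd order)
        err = abs(y_high - y_low)            # local error estimate
        if err <= tol:
            t = t1 if last else t + h        # accept the step
            y = y_high
        # Grow or shrink h based on the error, with a safety factor of 0.9.
        h *= min(2.0, max(0.2, 0.9 * (tol / (err + 1e-16)) ** 0.5))
    return y, nfe

# Hypothetical dynamics: a gentle decay and a much faster decay.
y_gentle, nfe_gentle = solve_adaptive(lambda t, y: -y, 1.0, 0.0, 1.0)
y_fast, nfe_fast = solve_adaptive(lambda t, y: -50.0 * y, 1.0, 0.0, 1.0)
print(nfe_gentle, nfe_fast)  # the fast dynamics need many more evaluations
```

A learned vector field that behaves like the second case makes every forward pass slower. Penalizing that behavior during training is the remedy the paper proposes.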

116

Modified:
Tags:
#science

#science The ergodicity problem in economics | Nature Physics https://www.nature.com/articles/s41567-019-0732-0

I read another paper about the hot hand/gambler's fallacy a while ago, and its author took a similar view. Here is the article: Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers, by Miller.
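The core of the ergodicity argument is that, for multiplicative dynamics, the ensemble average (over many people) and the time average (for one person over time) disagree. Here is a sketch using a coin-toss gamble: win 50% on heads, lose 40% on tails. These numbers are assumed for illustration, and the simulation size and seed are arbitrary.

```python
import random
import statistics

UP, DOWN = 1.5, 0.6  # hypothetical gamble: +50% on heads, -40% on tails

# Ensemble average: expected wealth multiplier per round, averaged over people.
ensemble_factor = 0.5 * UP + 0.5 * DOWN   # 1.05, looks like a good deal
# Time average: per-round growth factor of a single player in the long run.
time_factor = (UP * DOWN) ** 0.5          # sqrt(0.9) ~ 0.949, wealth shrinks

def play(rounds, rng):
    """Wealth of one player (starting at 1.0) after repeated gambles."""
    wealth = 1.0
    for _ in range(rounds):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth

rng = random.Random(0)
final = [play(100, rng) for _ in range(10_000)]
print(f"ensemble factor {ensemble_factor:.3f}, time factor {time_factor:.3f}")
print("median final wealth:", statistics.median(final))
```

The expected value grows every round, yet the typical player ends up with far less than they started with. This is why judging the gamble by its expectation value alone is misleading.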

115

Modified:
Tags:
#ML

#ML

https://arxiv.org/abs/2012.04863

Skillearn: Machine Learning Inspired by Humans’ Learning Skills

Interesting idea. I didn't know interleaving was already being used in ML.
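For reference, this is the generic difference between blocked and interleaved practice, expressed as training schedules. It is only the human-learning idea and not Skillearn's actual formulation. The task names and batch ids are hypothetical.

```python
from itertools import chain, zip_longest

def blocked_schedule(batches_per_task):
    """Train on all batches of task A, then all of B, and so on."""
    return list(chain.from_iterable(batches_per_task.values()))

def interleaved_schedule(batches_per_task):
    """Alternate between tasks: A1, B1, C1, A2, B2, C2, ..."""
    rounds = zip_longest(*batches_per_task.values())
    return [b for rnd in rounds for b in rnd if b is not None]

# Hypothetical task names and batch ids.
tasks = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2"],
    "C": ["C1", "C2", "C3"],
}
print(blocked_schedule(tasks))      # ['A1', 'A2', 'A3', 'B1', 'B2', 'C1', 'C2', 'C3']
print(interleaved_schedule(tasks))  # ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'C3']
```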

114

Published:

113

Modified:
Interesting idea. The milk box as an ad platform 😱

112

Published:

https://events.ccc.de/2020/09/04/rc3-remote-chaos-experience/

CCC is hosting its 2020 event fully online. Everyone can join with a pay-as-you-wish ticket. Join if you like programming, hacking, social events, or learning something crazy and new. 👍👍👍

110

Published:
Naturally Occurring Equivariance in Neural Networks https://distill.pub/2020/circuits/equivariance
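The Distill article is about learned features that turn out to be transformed copies of each other. The simplest equivariance, by contrast, is built into convolution itself: shifting the input shifts the output. Here is a minimal check of that built-in translation equivariance, using a hypothetical signal and filter. This is not the "naturally occurring" kind the article studies.

```python
def circular_conv(x, k):
    """1D circular cross-correlation (what most deep-learning 'conv' layers compute)."""
    n = len(x)
    return [sum(k[j] * x[(i + j) % n] for j in range(len(k))) for i in range(n)]

def shift(x, s):
    """Cyclically shift a sequence to the right by s positions."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)

x = [0, 1, 3, 2, 0, 0, 1, 0]   # hypothetical input signal
k = [1, -1, 2]                 # hypothetical filter

# Equivariance: convolving a shifted input equals shifting the convolved output.
lhs = circular_conv(shift(x, 3), k)
rhs = shift(circular_conv(x, k), 3)
print(lhs == rhs)  # True
```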

109

Modified:
Tags:
#ML #paper

#ML #paper

https://arxiv.org/abs/2012.00152 Every Model Learned by Gradient Descent Is Approximately a Kernel Machine Deep learning’s successes are often attributed to its ability to automatically discover new representations of the data, rather than relying on handcrafted features like other learning methods.
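The paper builds a "path kernel" accumulated along the gradient-descent trajectory. For a linear model that kernel reduces to the plain dot product, and the statement becomes exact. The learned weights are a weighted sum of the training inputs, so predictions take the kernel-machine form f(x) = Σ a_i K(x_i, x). The sketch below checks only this linear special case, not the paper's general construction. The toy data, learning rate and step count are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical toy regression data.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.5]
lr, steps = 0.1, 200

# Plain gradient descent on 0.5 * squared loss for a linear model, from w = 0.
w = [0.0, 0.0]
alpha = [0.0] * len(X)   # accumulated per-example coefficients a_i
for _ in range(steps):
    residuals = [dot(w, xi) - yi for xi, yi in zip(X, y)]
    grad = [sum(r * xi[d] for r, xi in zip(residuals, X)) for d in range(2)]
    w = [wd - lr * gd for wd, gd in zip(w, grad)]
    # Each step adds -lr * residual_i * x_i to w, so record -lr * residual_i.
    alpha = [a - lr * r for a, r in zip(alpha, residuals)]

x_new = [2.0, -1.0]
direct = dot(w, x_new)
# Kernel-machine form with K(x_i, x) = dot product.
kernel_form = sum(a * dot(xi, x_new) for a, xi in zip(alpha, X))
print(direct, kernel_form)  # equal up to floating-point rounding
```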

108

Modified:

A new search engine by a former chief scientist at Salesforce who helped develop its AI platform, Einstein.

The new search engine is called “you”.

https://you.com/?refCode=5ac0f0ea

107

Published:
TachibanaYoshino/AnimeGAN: A Tensorflow implementation of AnimeGAN for fast photo animation! This is the open-source code for the paper 「AnimeGAN: a novel lightweight GAN for photo animation」, which uses the GAN framework to transform real-world photos into anime images. https://github.com/TachibanaYoshino/AnimeGAN

106

Published:

105

Modified:
If you live in Germany, here is a tip that might be useful: VAT goes back up to 19% next year.

103

Published:

https://www.pnas.org/content/early/2020/12/02/2015954117

Oh Hi, it’s you, Mask. Or social distancing?

102

Published:

101

Published:
Cellular ageing: turning back the clock restores vision in mice https://play.acast.com/s/nature/cellularageing-turningbacktheclockrestoresvisioninmice

100

Published:
https://github.com/porn-vault/porn-vault Manage your ever-growing porn collection. Using Vue & GraphQL

99

Published:
alicex2020/Chinese-Landscape-Painting-Dataset: Dataset used for WACV 2021 paper: “End-to-End Chinese Landscape Painting Creation Using Generative Adversarial Networks” https://github.com/alicex2020/Chinese-Landscape-Painting-Dataset

98

Published:
Guess what: 115.de, a website used for government-related services in Cologne, requires Adobe Flash to run some of the content on the page. 😧

97

Modified:
Stadt Köln sometimes advertises interesting ideas.

96

Modified:
Sci-fi writers can surely borrow some of these https://youtu.be/MIbFvK2S9g8

95

Published:
Abandon Statistical Significance.pdf

94

Published:

93

Published:
Legendary Arecibo telescope will close forever — scientists are reeling https://www.nature.com/articles/d41586-020-03270-9

92

Published:

91

Published:
What is e-Residency | How to Start an EU Company Online https://e-resident.gov.ee/

90

Modified:

I did some investigation into the salaries of tech employees working for the city of Cologne. It seems that the salaries for IT employees are quite low. This may not be a fair representation of Germany as a whole, but Cologne is one of the most digitized cities in Germany, so I would guess it is a fair example.

For instance, a data manager is in salary group 11 (net 2,144 EUR to 2,993 EUR).

This is the job description: …

89

Modified:

Does Apple really log every app you run? A technical look

https://blog.jacopo.io/en/post/apple-ocsp/

TL;DR

  • No, macOS does not send Apple a hash of your apps each time you run them. You should be aware that macOS might transmit some opaque information about the developer certificate of the apps you run. This information is sent out in clear text on your network.
  • You probably shouldn't block ocsp.apple.com with Little Snitch or in your hosts file.

88

Published:

Going from Bad to Worse: From Internet Voting to Blockchain Voting

This article examines the suggestions that “voting over the Internet” or “voting on the blockchain” would increase election security, and finds such claims to be wanting and misleading. While current election systems are far from perfect, Internet- and blockchain-based voting would greatly increase the risk of undetectable, nation-scale election failures.

https://people.csail.mit.edu/rivest/pubs/PSNR20.pdf

87

Published:

https://thegradient.pub/how-can-we-improve-peer-review-in-nlp/

(Anderson, 2009) argues that research paper merit is Zipf-distributed: many papers are clear rejects, while a few are clear accepts. In between those two extremes, decisions are very difficult, and any differences between the best rejected and the worst accepted paper are tiny, even given the best possible set of reviewers.

(And the following is quite discriminatory toward non-English-speaking researchers.)

Work not-on-English: …

86

Modified:

Wow, what is gonna happen to Microsoft?

https://twitter.com/gvanrossum/status/1326932991566700549

Guido van Rossum @gvanrossum I decided that retirement was boring and have joined the Developer Division at Microsoft. To do what? Too many options to say! But it’ll make using Python better for sure (and not just on Windows :-). There’s lots of open source here. Watch this space.

85

Modified:

I just finished the book Grokking Algorithms last night. https://www.manning.com/books/grokking-algorithms

I think it is a well-written book for people who are not from a CS background. The book has many examples showing how the algorithms work step by step. To me, the most interesting chapter is the one on dynamic programming. I had a lot of fun reading it. Highly recommended if you are interested in algorithms!
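Since dynamic programming was the highlight, here is the classic 0/1 knapsack problem solved with a one-dimensional DP table. The items, weights and values are hypothetical and chosen only for illustration. The code is a standard formulation, not copied from the book.

```python
def knapsack(items, capacity):
    """0/1 knapsack via dynamic programming.

    items: list of (name, weight, value) with integer weights.
    Returns the best total value that fits in `capacity`.
    """
    # best[c] = best value achievable with capacity c using the items seen so far
    best = [0] * (capacity + 1)
    for _name, weight, value in items:
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]

# Hypothetical items: (name, weight in lb, value in $).
items = [("guitar", 1, 1500), ("stereo", 4, 3000), ("laptop", 3, 2000)]
print(knapsack(items, 4))  # 3500 (guitar + laptop)
```

The downward loop over capacities is the key design choice. Iterating upward would let the same item be added twice, which turns it into the unbounded knapsack problem.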

84

Published:
Flavorfox - flavor pairing search engine https://www.flavorfox.app/en

83

Published:

https://www.frontiersin.org/articles/10.3389/fncom.2012.00094/full

An emerging consensus for open evaluation: 18 visions for the future of scientific publishing

82

Published:
REVEALED: How much Uber pays its employees, from software engineers to data analysts (UBER) https://www.businessinsider.com/uber-employees-make-pay-salary-software-engineer-data-analyst-2020-11

80

Published:
SQL in 200 years?

79

Modified:
This reminds me of an old man I met on a train. I reminded this old man, nicely, to put his mask on. He put it on but gave me a strange, angry look. A few seconds later, he stormed to the other side of the train, took his mask off, and started acting like he was suffocating. It was so weird. It would have been so easy to just talk instead of acting like a child and trying to put the blame on uninvolved people.

78

Modified:

https://www.nytimes.com/interactive/2020/11/03/us/elections/forecast-president.html

Comment Am Neumarkt: The best information designers are summoned on each election day. It is a good time to learn about the best practices of data visualization. This “paths to victory” visualization is one of the best I have ever seen. If they put probabilities on each branch, it becomes a traditional decision tree, the kind investors use to estimate risks. Does it tell us anything useful directly? Not really. …
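With probabilities on each branch, a "paths to victory" tree becomes a probability tree, and the chance of winning is the sum over all winning paths. Here is a sketch with hypothetical states, vote counts and probabilities. It assumes the states are independent, which is a simplification: real forecasts model correlations between states.

```python
from itertools import product

# Hypothetical swing states: (name, electoral votes, win probability).
states = [("A", 29, 0.5), ("B", 20, 0.6), ("C", 15, 0.7)]
votes_needed = 35  # hypothetical number of extra votes needed to win

def win_probability(states, votes_needed):
    """Sum the probabilities of every branch (path) that reaches the threshold.

    Assumes independent states.
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(states)):
        p, votes = 1.0, 0
        for won, (_name, ev, p_win) in zip(outcome, states):
            p *= p_win if won else 1 - p_win
            votes += ev if won else 0
        if votes >= votes_needed:
            total += p
    return total

print(round(win_probability(states, votes_needed), 4))  # 0.65
```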

76

Published:

75

Published:
Presidential Election 2020: Live Results and Analysis | The New Yorker https://www.newyorker.com/news/election-2020/live-2020-presidential-election-results

74

Published:
Endnote Click | Tools for Academic Research | KausalFlow https://tools.kausalflow.com/tools/endnote-click/

73

Published:
Frequentism and Bayesianism: A Practical Introduction | Pythonic Perambulations http://jakevdp.github.io/blog/2014/03/11/frequentism-and-bayesianism-a-practical-intro/

72

Modified:
The story of wubi

71

Modified:

Urban Dictionary Embeddings for Slang NLP Applications - ACL Anthology https://www.aclweb.org/anthology/2020.lrec-1.586

Haha, very cool: Water <-> butt-splash, Soda <-> sodagasm

70

Published:

69

Modified:

https://www.nature.com/articles/d41586-020-02986-y

Muotri, a neuroscientist at the University of California, San Diego (UCSD), has found some unusual ways to deploy his. He has connected organoids to walking robots, modified their genomes with Neanderthal genes, launched them into orbit aboard the International Space Station, and used them as models to develop more human-like artificial-intelligence systems.

68

Modified:

The effect of influenza vaccination on trained immunity: impact on COVID-19 | medRxiv https://www.medrxiv.org/content/10.1101/2020.10.14.20212498v1

Hospital workers who got vaccinated were significantly less likely to develop COVID than those who did not

I believe that is just a simple sampling problem. People got a flu shot this year because they are really careful about infectious diseases. They may also sanitize more.
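Epidemiologists usually call the suspected mechanism confounding: careful people are both more likely to get a flu shot and less likely to catch COVID. The toy calculation below uses hypothetical probabilities and a vaccine with zero effect. It shows that this mechanism alone can produce a gap like the reported one.

```python
# Hypothetical numbers: in this toy model the vaccine has NO effect on infection.
p_careful = 0.5
p_vax = {"careful": 0.8, "careless": 0.2}        # careful people vaccinate more
p_infect = {"careful": 0.02, "careless": 0.10}   # ...and get infected less

def infection_rate(vaccinated):
    """P(infected | vaccination status), marginalising over carefulness."""
    num = den = 0.0
    for group, p_group in (("careful", p_careful), ("careless", 1 - p_careful)):
        p_status = p_vax[group] if vaccinated else 1 - p_vax[group]
        num += p_group * p_status * p_infect[group]
        den += p_group * p_status
    return num / den

print(round(infection_rate(True), 3), round(infection_rate(False), 3))  # 0.036 0.084
```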

67

Published:

Explorer | Explore Human Knowledge. Explorer by Batou: navigate Wikipedia visually! Product topic: Web App, User Experience, Education, Artificial Intelligence, Tech

66

Modified:
Hmmmm, GitHub disabled the YouTube download project.

64

Published:

StellarX: Create collaborative spaces & rich simulations without code. Product topic: Virtual Reality, Design Tools, Education, Artificial Intelligence, Augmented Reality, Tech

63

Published:
Hi, I just created the group Am Neumarkt on Cappuccino, the audio messaging app with music. To join me, download the app at http://capp.fm and use this private invite code: 618848.

62

Modified:

https://arxiv.org/abs/2010.06119

ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis

To assist human review process, we build a novel ReviewRobot to automatically assign a review score and write comments for multiple categories. A good review needs to be knowledgeable, namely that the comments should be constructive and informative to help improve the paper; and explainable by providing detailed evidence. ReviewRobot achieves these goals via three steps: (1) We perform …

61

Published:

60

Modified:
Better decisions through science: Math-based aids for making decisions in medicine and industry could improve many diagnoses, often saving lives in the process

59

Published:

58

Published:
AI Knowledge Map: how to classify AI technologies | by Francesco Corea | Medium https://medium.com/@Francesco_AI/ai-knowledge-map-how-to-classify-ai-technologies-6c073b969020

57

Modified:

Jackson, D. E., & Ratnieks, F. L. W. (2006). Communication in ants. Current Biology, 16(15), R570–R574. https://doi.org/10.1016/j.cub.2006.07.015

I just realized that what we have been calling swarm intelligence is not very different from our single-agent intelligence. Both deal with information diffusion. Through diffusion, swarm intelligence shares global information with dumb agents. Our brain, on the other hand, uses information diffusion (with Ca as an agent) as a …
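Here is a toy illustration of how "dumb" agents that only talk to their neighbors can still end up sharing global information. It assumes a ring of agents and a simple averaging (discrete diffusion) rule. This is a generic consensus model, not the ant-communication mechanisms described in the paper. The ring size, rate and step count are hypothetical.

```python
def diffuse(values, neighbors, rate=0.3, steps=200):
    """Discrete diffusion: each agent moves toward its neighbors' values.

    neighbors[i] lists the agents that agent i can sense locally.
    """
    v = list(values)
    for _ in range(steps):
        v = [
            vi + rate * sum(v[j] - vi for j in neighbors[i]) / len(neighbors[i])
            for i, vi in enumerate(v)
        ]
    return v

# Hypothetical ring of 6 agents; only agent 0 starts with the "information".
n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
start = [6.0, 0, 0, 0, 0, 0]
final = diffuse(start, ring)
print([round(x, 3) for x in final])  # every agent ends near the global mean, 1.0
```

No agent ever sees more than its two neighbors. Even so, every agent converges to the global average, because the local averaging conserves the total "information".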

56

Published:

https://www.nobelprize.org/prizes/physics/1972/cooper/biographical/

I just learned today that the Cooper in BCS for superconductivity and the Cooper in BCM in neuroscience are the same Cooper. This guy is amazing.

54

Published:
Phys. Rev. Lett. 125, 161802 (2020) - Neutrino Self-Interactions and XENON1T Electron Recoil Excess https://link.aps.org/doi/10.1103/PhysRevLett.125.161802

53

Published:

52

Published:
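A key figure in this US election is that the number of eligible Latino voters has surpassed that of Black voters, which will significantly change the outcome. Among all minority groups, Latinos lean furthest to the right: in 2016, nearly 30% of Latino voters voted for Trump. This year, despite the anti-racism protests, the conservative shift among Latinos has not been fundamentally reversed and may even have been reinforced. Enrique Tarrio, the current chairman of the Proud Boys, is Afro-Cuban and a core member of Latinos for Trump. Traditionally, Latino conservatism was thought to be shaped mainly by anti-communist Cuban immigrants and evangelicals, but recent analyses find the reality to be much more complex. Geraldo Cadava of Northwestern University published The Hispanic Republican earlier this year, overturning many stereotypes about Latino voters; for example, Latino conservatives form a very diverse and stable voting bloc. From Nixon to the present, Republican supporters have consistently made up about one third of Latino voters, and under George W. Bush their share even rose to 40% at one point. Moreover, compared with Cuban Americans, Mexican and Puerto Rican Americans have contributed more to Latino Republican organizations. This week, Cadava also promoted his new book at an event held by Columbia University's history department; you can watch the event vid …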

51

Published:
Well, if you understand how a die works, this is easy to explain. The zero-order solution has nothing to do with the materials or the detailed physics. The edge roughly represents the four sides of a die… We could actually design a fair cylindrical die by adjusting the ratio of the diameter to the edge width if we wanted to…

48

Published:

https://onlinelibrary.wiley.com/doi/full/10.1111/1475-6773.13553

“The intervention placed and retained frequent user, chronically homeless individuals in housing. It decreased psychiatric ED visits and shelter use, and increased outpatient mental health care, but not medical ED visits or hospitalizations. Limitations included more than one‐third of usual care participants received another form of subsidized housing, potentially biasing results to the null, and loss of power due to high …

47

Published:

46

Published:
Signatures of a liquid–liquid transition in an ab initio deep neural network model for water | PNAS https://www.pnas.org/content/early/2020/10/01/2015440117

45

Published:

44

Modified:
What a boring Friday. I started experimenting with a plastic ring to find out the probability of it landing on its edge when dropped from a height. The diameter of this ring is much larger than its width (diameter/width ~ 2). In the beginning, the results were trivial. Then they started to look weird to me: the probability of it standing on its edge was higher than that of it falling flat. Is this an angular momentum problem? (Why didn't I continue with the experiment? I stepped on it and …

43

Modified:

https://greenelab.github.io/scihub-manuscript/v/8fcd0cd665f6fb5f39bed7e26b940aa27d4770ba/

I am a little bit scared whenever I think about how it accesses the papers. It is a black box, and we have no idea whether Sci-Hub does this in a way that every researcher would accept. On the other hand, it is not easy to live without Sci-Hub. There are legal alternatives like Unpaywall and Kopernio, but they are way behind. What shall we do? Require the author of Sci-Hub to open-source the code? …

42

Published:

41

Published:
Hypergraph is a decentralized tool to help researchers manage their work. Everything in your research, from notes to data to publications to proposals, is linked together for better discovery and reproducibility.

40

Published:

https://www.libscie.org/

Hypergraph is out of alpha.

37

Published:

https://stackoverflow.com/questions/16047306/how-is-docker-different-from-a-virtual-machine/36368012#36368012

“Docker for Mac uses https://github.com/moby/hyperkit to emulate the hypervisor capabilities and Hyperkit uses hypervisor.framework in its core. Hypervisor.framework is Mac’s native hypervisor solution. Hyperkit also uses VPNKit and DataKit to namespace network and filesystem respectively.”

Hmmm, I guess this is why Docker on Mac uses a lot of resources compared to its …

36

Published:

35

Published:

34

Modified:
Synopsis: No Sterile Neutrinos from Eight Years of IceCube http://link.aps.org/doi/10.1103/Physics.13.s126

33

Modified:
Disney will lay off 28,000 theme park workers as the pandemic continues to ravage its business (DIS) https://www.businessinsider.com/disney-layoffs-theme-parks-disneyland-disneyworld-2020-9

32

Modified:

31

Modified:
Maternal microbes support fetal brain wiring https://www.nature.com/articles/d41586-020-02657-y

30

Published:

29

Modified:
Gangster capitalism and the American theft of Chinese innovation – TechCrunch https://techcrunch.com/2020/09/20/gangster-capitalism-and-the-american-theft-of-chinese-innovation/

28

Modified:
Nature Index’s top five science cities, by the numbers https://www.nature.com/articles/d41586-020-02576-y

27

Modified:

https://arxiv.org/abs/2005.12505

Very interesting research on voting mechanisms. They built a theory to understand how the Freemasons' member selection procedure shapes the community: the Freemasons only admit a new member if the candidate is accepted by all current members.
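The unanimity rule is stricter than it looks, because every additional member is an additional veto. Below is a toy model that is our own assumption and not the paper's: each current member independently accepts a candidate with probability p. It compares unanimity with a simple majority rule.

```python
from math import comb

def admit_unanimous(p, n):
    """P(candidate admitted) if all n members must accept, each with probability p."""
    return p ** n

def admit_majority(p, n):
    """P(candidate admitted) if a strict majority of n members must accept."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

for n in (5, 10, 20):
    print(n, round(admit_unanimous(0.9, n), 3), round(admit_majority(0.9, n), 3))
```

Even with members who each accept 90% of candidates, unanimity rejects most candidates once the group reaches 10 to 20 members. Under this simple model, a growing lodge becomes ever more selective.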

26

Modified:
Topology of Deep Neural Networks http://jmlr.org/papers/v21/20-345.html

25

Modified:

24

Modified:
Scikit-network: Graph Analysis in Python http://jmlr.org/papers/v21/20-412.html

23

Modified:

22

Modified:
What Germany teaches the world in a crisis https://www.ft.com/content/a5d516b5-eff8-4f6f-841f-1fed303e4395

21

Modified:
In summary, the Panel concluded that the United States had not provided an explanation demonstrating how the imposition of additional duties on the selected imported products in List 1 and List 2 was apt to contribute to the public morals objective invoked, and, following on from that, how they were necessary to protect public morals. The Panel found, accordingly, that the United States had not met its burden of demonstrating that the measures are provisionally justified under Article XX(a).

20

Modified:

19

Modified:

17

Published:

16

Published:

15

Published:

https://distill.pub/2020/communicating-with-interactive-articles/

Details-on-demand is so popular and so crucial to perception.

14

Published:
More than 100 scientific journals have disappeared from the Internet https://www.nature.com/articles/d41586-020-02610-z

13

Published:

11

Published:

Bifrost Data Search: Find the perfect image datasets for your next ML project. Product topic: Analytics, Robots, Developer Tools, Artificial Intelligence, Tech, Maker Tools

10

Published:

Orchest: An open-source tool for creating data science pipelines. Product topic: Productivity, Open Source, Developer Tools, Tech

9

Published:

8

Published:

6

Published:

Stat of the day: Fascinating and important stats from the rest of the world. Product topic: News, User Experience

5

Published:
Quickprop: an almost forgotten neural training algorithm – Giuseppe Bonaccorso https://www.bonaccorso.eu/2017/09/15/quickprop-an-almost-forgotten-neural-training-algorithm/
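Quickprop's core idea is to treat the error curve of each weight as a parabola. Using the current and previous slopes S(t) = dE/dw, it jumps to the parabola's minimum: Δw(t) = S(t) / (S(t−1) − S(t)) · Δw(t−1). Here is a simplified, single-weight sketch. It starts with one gradient-descent step and caps each jump at μ times the previous step. Fahlman's original adds further rules, such as mixing in a gradient term, which are omitted here. The learning rate, μ = 1.75 and the quadratic test function are assumptions for illustration.

```python
def quickprop_1d(grad, w0, lr=0.1, mu=1.75, steps=10):
    """Minimise a 1D function with simplified Quickprop updates.

    grad(w) returns dE/dw. The first step is plain gradient descent; later
    steps jump to the minimum of a parabola fitted through the last two
    slopes, limited to mu times the previous step (maximum growth factor).
    """
    w = w0
    g_prev = grad(w)
    dw_prev = -lr * g_prev            # ordinary gradient-descent step
    w += dw_prev
    for _ in range(steps):
        g = grad(w)
        denom = g_prev - g
        if denom == 0 or dw_prev == 0:
            dw = -lr * g              # fall back to gradient descent
        else:
            dw = g / denom * dw_prev  # secant step to the parabola's minimum
            limit = mu * abs(dw_prev)
            dw = max(-limit, min(limit, dw))
        w += dw
        g_prev, dw_prev = g, dw
    return w

# Quadratic error E(w) = (w - 3)^2, so dE/dw = 2 (w - 3); the minimum is at w = 3.
w_final = quickprop_1d(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)
```

On an exactly quadratic error the parabola fit is exact, so the only thing slowing it down is the growth cap. On real networks, the cap and the fallback rules keep the jumps from exploding.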

4

Published:

Paletro: Enable the command palette (⇧⌘P) in any application on macOS. Product topic: Mac, Productivity