TIL

Today I Learned

PySpark: Beware of Python Mutable Objects

Published:
Summary: We should be careful when dealing with Python mutable objects. For example, make copies of mutable Python objects inside PySpark UDFs instead of modifying them in place.
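
The pitfall can be sketched in plain Python, without a Spark session. The functions below are hypothetical stand-ins for the body of a UDF, and the template dictionaries are made-up names. The unsafe version mutates the object it was given, so state leaks from one call to the next. The safe version works on a deep copy.

```python
import copy


def tag_unsafe(tag, template):
    """Appends to the shared template itself, so calls affect each other."""
    record = template                 # same object, not a copy
    record["tags"].append(tag)
    return record["tags"]


def tag_safe(tag, template):
    """Appends to a deep copy, so the shared template is never modified."""
    record = copy.deepcopy(template)  # nested list is copied as well
    record["tags"].append(tag)
    return record["tags"]


unsafe_template = {"tags": []}
tag_unsafe("a", unsafe_template)
unsafe_second = tag_unsafe("b", unsafe_template)  # also contains "a"

safe_template = {"tags": []}
tag_safe("a", safe_template)
safe_second = tag_safe("b", safe_template)        # contains only "b"
```

A shallow copy (`dict(template)` or `copy.copy`) would not be enough here, because the nested list would still be shared.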

PySpark: Compare Two Schemas

Published:
Summary: To compare two DataFrame schemas in PySpark, we can use Python's set operations.

    def schema_diff(schema1, schema2):
        return {
            'fields_in_1_not_2': set(schema1) - set(schema2),
            'fields_in_2_not_1': set(schema2) - set(schema1)
        }

VSCode Terminal Python Can Not Activate Conda on Mac

Published:
Tags:
Summary: Enable key repeat in VSCode on Mac

VSCode Setup Tests when Module is in a Different Folder

Published:
Tags:
Summary: Use a .env file so that the tests can find the module

Managing path using pathlib in Python

Published:
Tags:
Summary: pathlib is a convenient standard-library module for managing paths and files
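
A minimal sketch of common pathlib operations follows. The directory and file names are made up for illustration, and everything happens inside a temporary directory so nothing is left behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # Build paths with the "/" operator instead of string concatenation.
    data_dir = base / "data" / "raw"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Write a file and read it back without an explicit open() call.
    csv_path = data_dir / "example.csv"
    csv_path.write_text("a,b\n1,2\n")
    content = csv_path.read_text()

    # Inspect the parts of a path.
    stem = csv_path.stem                # file name without the extension
    suffix = csv_path.suffix            # extension, including the dot
    parent_name = csv_path.parent.name  # name of the containing folder

    # List matching files in a folder.
    found = sorted(p.name for p in data_dir.glob("*.csv"))
```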

Deal with Rare Categories Using Pandas

Published:
References: - pandas.DataFrame.mask
Summary: Deal with rare categories using pandas
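
One possible way to do this with `mask`, sketched under assumptions: the sample data, the threshold of 2, and the replacement label "other" are all made up for illustration. The example uses `Series.mask`, which behaves like the referenced `DataFrame.mask` on a single column.

```python
import pandas as pd

categories = pd.Series(["a", "a", "a", "b", "b", "c", "d"])

# Map every row to the frequency of its own category.
frequency = categories.map(categories.value_counts())

# Replace the values where the condition is True, keep the others.
min_count = 2
cleaned = categories.mask(frequency < min_count, "other")
cleaned_values = cleaned.tolist()
```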

Binning Data Values using Pandas

Published:
References: - pandas.cut
Summary: Convert continuous values into bins in pandas
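
A small sketch of `pandas.cut`; the bin edges and labels are arbitrary choices for illustration. By default the intervals are closed on the right, so here the bins are (0, 10], (10, 20], and (20, 40].

```python
import pandas as pd

ages = pd.Series([1, 10, 15, 30])

# Three bins need four edges and three labels.
binned = pd.cut(ages, bins=[0, 10, 20, 40], labels=["low", "mid", "high"])
binned_values = binned.tolist()
```

Values outside all of the bins become NaN, so the edges should cover the full range of the data.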

PyTorch: Initialize Parameters

Published:
Summary: We can set the parameters in a for loop. We take some of the initialization methods from Lippe [1]. To set them based on the input and output dimensions of the layer (normalized initialization; see Initialize Artificial Neural Networks),

    for name, param in model.named_parameters():
        if name.endswith(".bias"):
            param.data.fill_(0)
        else:
            bound = math.sqrt(6) / math.sqrt(param.shape[0] + param.shape[1])
            param.data.uniform_(-bound, bound)

or set the parameters based on the input size of each layer

    for name, param in model.named_parameters():
        if name.

Three dots in Python

Published:
Tags:
Summary: Use three dots (..., the Ellipsis literal) as a placeholder body for an empty Python function
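
A minimal sketch; the function name is hypothetical. The three dots are an ordinary expression that evaluates to the built-in `Ellipsis` object, so a function whose body is `...` is valid and returns `None`.

```python
def not_implemented_yet():
    ...  # placeholder body, to be filled in later


result = not_implemented_yet()    # the function runs and returns None
dots_are_ellipsis = ... is Ellipsis
```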

Python Class Sequential Inheritance

Published:
Summary: Sequentially inherit Python classes
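
One reading of sequential inheritance is a chain in which each class inherits from the previous one and extends its behavior through `super()`. The sketch below assumes that reading, and the class names are hypothetical.

```python
class Base:
    def describe(self):
        return ["Base"]


class Middle(Base):
    def describe(self):
        # Extend the parent's result instead of replacing it.
        return super().describe() + ["Middle"]


class Leaf(Middle):
    def describe(self):
        return super().describe() + ["Leaf"]


chain = Leaf().describe()

# The method resolution order shows the chain Python walks through.
mro_names = [cls.__name__ for cls in Leaf.__mro__]
```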

Ordered Member Functions of a Class in Python

Published:
Summary: Build an ordered list of methods in a Python class by adding attributes to member functions
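
A possible sketch of the idea, not necessarily the approach of the original post. The decorator `step`, the attribute `_order`, and the `Pipeline` class are hypothetical names. The decorator stores a number on each function, and the class later collects and sorts the tagged methods.

```python
def step(order):
    """Attach an ordering attribute to the decorated function."""
    def decorator(func):
        func._order = order
        return func
    return decorator


class Pipeline:
    @step(2)
    def transform(self):
        return "transform"

    @step(1)
    def extract(self):
        return "extract"

    @step(3)
    def load(self):
        return "load"

    def helper(self):  # not tagged, so it is ignored
        return "helper"

    def ordered_methods(self):
        # Collect the bound methods whose function carries the attribute.
        tagged = [
            getattr(self, name)
            for name, attr in vars(type(self)).items()
            if callable(attr) and hasattr(attr, "_order")
        ]
        return sorted(tagged, key=lambda method: method._order)


pipeline = Pipeline()
ordered_names = [m.__name__ for m in pipeline.ordered_methods()]
ordered_results = [m() for m in pipeline.ordered_methods()]
```

This sketch only looks at methods defined directly on the class, not at inherited ones.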

Postgres Optimization in JOIN

Published:
Summary: Joining tables starting with the smallest table (the one with the lowest cardinality) can speed things up.

Deal with NULL in Postgres

Published:
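Summary: Handle NULL carefully. A comparison with NULL, such as NULL = NULL, evaluates to NULL rather than true, so use IS NULL and IS NOT NULL to test for it.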

Pandas Groupby Does Not Guarantee Unique Content in Groupby Columns

Published:
Summary: Pandas groupby does not guarantee that the values in the groupby columns look unique, because it also takes the data types into account. Dealing with mixed types requires additional attention.
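
A small sketch of the behavior with made-up data: the integer 1 and the string "1" print the same way but end up in different groups. Casting the key column to a single type before grouping is one possible remedy.

```python
import pandas as pd

df = pd.DataFrame({"key": [1, "1", 1, "1"], "value": [10, 20, 30, 40]})

# The integer 1 and the string "1" are different keys.
totals = df.groupby("key", sort=False)["value"].sum()
n_groups = len(totals)

# Normalize the key to one type, then group again.
normalized = df.assign(key=df["key"].astype(str))
normalized_totals = normalized.groupby("key")["value"].sum()
n_groups_normalized = len(normalized_totals)
```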

== and is in Python

Published:
Tags:
Summary: == and is are different: == compares values, while is checks whether two names refer to the same object
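
A minimal illustration with lists; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal content, but a separate object
c = a          # another name for the same object

equal_values = a == b    # True: the contents match
same_object = a is b     # False: two distinct list objects
alias_is_same = a is c   # True: both names point to one object

# "is" is the idiomatic way to test for None.
value = None
value_is_none = value is None
```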

Switch statement in Python

Published:
Tags:
Summary: Love switch statements? We can design one in Python.
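
One common way to emulate a switch statement is a dictionary that maps each case to a callable. The sketch below assumes that approach, which may differ from the one in the original post. The `switch` helper and the case names are hypothetical.

```python
def switch(value, cases, default=None):
    """Look up the handler for value and call it, or fall back to default."""
    handler = cases.get(value, default)
    if handler is None:
        raise KeyError(f"no case for {value!r} and no default given")
    return handler()


commands = {
    "start": lambda: "starting",
    "stop": lambda: "stopping",
}

started = switch("start", commands)
fallback = switch("pause", commands, default=lambda: "unknown command")
```

Python 3.10 and later also provide the `match` statement, which covers many of the same use cases.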

Python Tilde Operator

Published:
Tags:
Summary: The tilde operator (~) may not work as you expect
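
A short sketch of a likely source of the surprise: on Python integers, `~x` is bitwise inversion and equals `-x - 1`. Since `bool` is a subclass of `int`, `~True` is `-2`, which is still truthy.

```python
inverted_int = ~5        # -6, because ~x == -x - 1
inverted_true = ~True    # -2, not False
inverted_false = ~False  # -1, not True

# Both results are non-zero, so both are truthy.
still_truthy = bool(~True)

# For logical negation of a plain Python bool, use "not".
negated = not True
```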

Arrays and Dicts in MongoDB

Published:
Tags:
Summary: An array of dictionaries is hard to update in MongoDB.