Be careful when working with mutable Python objects. For example, copy mutable objects before modifying them inside PySpark UDFs.
To compare two dataframe schemas in [[PySpark]] Data Processing - (Py)Spark Processing Data using …
Pitfalls of timezone conversion in Postgres
Deal with rare categories using pandas
Joining tables together, starting with the smallest table (the table with the lowest cardinality), speeds things …
Handle null values carefully.
Meta tables are very useful for getting BigQuery table information programmatically.
Snippet for calculating a moving average using SQL/BigQuery
Generate a table with a column of continuous dates
BigQuery Current User
Materialize the query results of multistage queries to make your queries faster and lower the costs.
Dealing with errors when scraping data
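
As a rough illustration of the first note above (copying mutable Python objects), here is a minimal plain-Python sketch. The function names `add_label_unsafe` and `add_label_safe` and the `labels` dictionary are hypothetical and chosen only for this example. The functions stand in for the body of a UDF, which may be called many times against the same shared object.

```python
import copy


def add_label_unsafe(defaults, label):
    # Mutates the caller's object in place, so state leaks between calls.
    defaults["labels"].append(label)
    return list(defaults["labels"])


def add_label_safe(defaults, label):
    # Works on a deep copy, so the shared object is never modified.
    local = copy.deepcopy(defaults)
    local["labels"].append(label)
    return local["labels"]


# Hypothetical shared configuration, as it might be captured by a UDF closure.
unsafe_shared = {"labels": ["raw"]}
unsafe_first = add_label_unsafe(unsafe_shared, "a")
unsafe_second = add_label_unsafe(unsafe_shared, "b")  # also contains "a"

safe_shared = {"labels": ["raw"]}
safe_first = add_label_safe(safe_shared, "a")
safe_second = add_label_safe(safe_shared, "b")  # independent of the first call
```

`copy.deepcopy` is used rather than `dict.copy()` because a shallow copy would still share the nested list. For large objects the deep copy has a cost, so copying only the part that gets modified may be preferable.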