PySpark: Compare Two Schemas

To compare two DataFrame schemas in PySpark, we can use Python's set operations. A schema (a `StructType`) is iterable over its `StructField` entries, and those entries are hashable, so each schema can be turned into a set.

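def schema_diff(schema1, schema2):
    """Return the fields present in one schema but not in the other."""
    # Each argument is a StructType such as df.schema; iterating it yields
    # StructField objects, which compare on name, data type, nullability, and metadata.
    return {
        'fields_in_1_not_2': set(schema1) - set(schema2),
        'fields_in_2_not_1': set(schema2) - set(schema1)
    }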
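
Because the set difference above compares whole fields, a column that exists in both schemas but with a different type should show up in both result sets. If it helps to tell "missing" apart from "type changed", one possible variation is to key the comparison on column names instead. The sketch below is not part of the original note: the function name `dtypes_diff` and the sample columns are hypothetical, and it assumes the inputs are lists of `(column_name, type_string)` pairs such as those returned by `DataFrame.dtypes`.

```python
def dtypes_diff(dtypes1, dtypes2):
    """Compare two lists of (column_name, type_string) pairs by column name.

    Hypothetical helper; the inputs are assumed to look like DataFrame.dtypes.
    """
    types1, types2 = dict(dtypes1), dict(dtypes2)
    common = types1.keys() & types2.keys()
    return {
        # Column names that appear on one side only.
        "only_in_1": sorted(types1.keys() - types2.keys()),
        "only_in_2": sorted(types2.keys() - types1.keys()),
        # Columns present on both sides whose type strings differ.
        "type_changed": {
            name: (types1[name], types2[name])
            for name in sorted(common)
            if types1[name] != types2[name]
        },
    }


# Made-up inputs standing in for df1.dtypes and df2.dtypes.
left = [("id", "bigint"), ("name", "string"), ("price", "double")]
right = [("id", "int"), ("name", "string"), ("currency", "string")]
result = dtypes_diff(left, right)
```

Here `result["type_changed"]` holds the `id` column with both of its type strings, while `price` and `currency` are reported as one-sided. Note that this variation ignores nullability and metadata, which the `StructField`-based comparison does take into account.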

Lei Ma (2022). 'PySpark: Compare Two Schemas', Datumorphism, 02 April. Available at: https://datumorphism.leima.is/til/data/pyspark-schema-comparison/.