PySpark: Compare Two Schemas

To compare the schemas of two DataFrames in PySpark, we can use Python's set operations on the fields of each schema.

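def schema_diff(schema1, schema2):
    # schema1 and schema2 are StructType objects, e.g. df1.schema and df2.schema.
    # Iterating a StructType yields its StructField entries, which are hashable.
    return {
        'fields_in_1_not_2': set(schema1) - set(schema2),
        'fields_in_2_not_1': set(schema2) - set(schema1)
    }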
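
Because the set difference compares whole fields, a column that exists in both schemas but differs in type or nullability shows up in both result sets. If it helps to separate "missing" columns from "changed" ones, a name-keyed variant along the following lines might be useful. This is only a sketch: `schema_diff_by_name` and the `Field` stand-in are hypothetical names introduced here, and the stand-in exists only so the example runs without a Spark installation. The function assumes each field exposes `name` and `dataType` attributes, as `StructField` does.

```python
from collections import namedtuple


def schema_diff_by_name(schema1, schema2):
    """Compare two schemas by field name, reporting type changes separately.

    Hypothetical helper: assumes each schema is an iterable of fields
    with `name` and `dataType` attributes (as in a PySpark StructType).
    """
    types1 = {field.name: field.dataType for field in schema1}
    types2 = {field.name: field.dataType for field in schema2}
    shared = set(types1) & set(types2)
    return {
        'only_in_1': set(types1) - set(types2),
        'only_in_2': set(types2) - set(types1),
        # Fields present in both schemas whose data types differ.
        'type_changed': {
            name: (types1[name], types2[name])
            for name in shared
            if types1[name] != types2[name]
        },
    }


# Stand-in for StructField (an assumption for illustration only).
Field = namedtuple('Field', ['name', 'dataType'])

schema_a = [Field('id', 'int'), Field('name', 'string'), Field('age', 'int')]
schema_b = [Field('id', 'bigint'), Field('name', 'string'), Field('email', 'string')]

diff = schema_diff_by_name(schema_a, schema_b)
print(diff)
```

With real DataFrames, the same function could presumably be called as `schema_diff_by_name(df1.schema, df2.schema)`. Note that this sketch ignores nullability and metadata, which the set-based version above does take into account.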


Lei Ma (2022). 'PySpark: Compare Two Schemas', Datumorphism, 02 April. Available at: https://datumorphism.leima.is/til/data/pyspark-schema-comparison/.