Handling Null Values in Spark
Handling null values in Spark can be a real pain if one is not well versed in the built-in Spark functions. As part of our ETL logic, we were massaging data on S3 that was stored in Parquet format. However, loading the data into Redshift failed because null values could not be inserted into the char and number fields of the Redshift schema. Hence, we had to massage the data again to first convert the char null values to blank values.

Below is a snapshot of the original data, which had null values in the V_DQ_SEVERITY column (char datatype) and the N_DEFAULT_COUNT column (number datatype):

V_DQ_SEVERITY  N_DEFAULT_COUNT
null           0
null           0
E              null
E              null
E              null

Below is the code snippet that first converts all null values to blank spaces. First, read the Parquet file using Spark:

val de
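The snippet above is cut off, so here is a minimal sketch of the overall approach using Spark's built-in DataFrameNaFunctions (`na.fill`). The S3 path and application name are placeholders; the column names come from the snapshot above. Filling the numeric column with 0 is an assumption based on the Redshift error mentioning number fields as well:

```scala
import org.apache.spark.sql.SparkSession

// Assumes a SparkSession configured with S3 access; adjust to your environment.
val spark = SparkSession.builder()
  .appName("NullToBlank") // placeholder app name
  .getOrCreate()

// Read the Parquet data from S3 (path is illustrative)
val df = spark.read.parquet("s3://your-bucket/path/to/data")

val cleaned = df
  // Replace nulls in the char column with a blank string
  .na.fill("", Seq("V_DQ_SEVERITY"))
  // Replace nulls in the number column with 0 (assumed default)
  .na.fill(0, Seq("N_DEFAULT_COUNT"))

cleaned.show()
```

`na.fill` only touches columns whose type matches the fill value, so the string fill and the numeric fill are applied independently and the rest of the schema is left untouched.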