Automatically and Elegantly flatten DataFrame in Spark SQL

Asked on January 11, 2019 in Apache-spark.


  • 3 Answers

    This can be done with a recursive function that builds the select(…) statement by walking the DataFrame.schema.

    The recursive function should return an Array[Column]. Whenever it encounters a StructType, it calls itself and appends the returned Array[Column] to its own Array[Column].

    Refer to the code below:

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.StructType
     
    def flattenSchema(schema: StructType, prefix: String = null) : Array[Column] = {
      schema.fields.flatMap(f => {
        // build the fully qualified column name, e.g. "parent.child"
        val colName = if (prefix == null) f.name else (prefix + "." + f.name)
     
        f.dataType match {
          // nested struct: recurse and splice its columns into the result
          case st: StructType => flattenSchema(st, colName)
          // leaf field: keep it as a single column
          case _ => Array(col(colName))
        }
      })
    }
    

    It can then be used like this:

    df.select(flattenSchema(df.schema):_*)
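    The same recursion can be mirrored in plain Python over the JSON form of a Spark schema (as produced by df.schema.jsonValue()). This is a minimal sketch: the schema dict below is hand-written to imitate that format, not generated by Spark.

```python
# Sketch: recursively flatten a Spark-style schema dict into dotted column names.
# The `schema` dict imitates the output of df.schema.jsonValue(); it is a
# hand-written stand-in, not produced by a real DataFrame.

def flatten_json_schema(schema, prefix=None):
    cols = []
    for field in schema["fields"]:
        name = f"{prefix}.{field['name']}" if prefix else field["name"]
        dtype = field["type"]
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            # nested struct: recurse with the current name as prefix
            cols += flatten_json_schema(dtype, prefix=name)
        else:
            # leaf column: keep the dotted path
            cols.append(name)
    return cols

schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "address", "type": {
            "type": "struct",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        }},
    ],
}

print(flatten_json_schema(schema))  # ['id', 'address.city', 'address.zip']
```

    The dotted names this produces are exactly what the Scala version passes to select.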
    
    Answered on January 11, 2019.

    With PySpark, nested objects are supported at any level:

    from pyspark.sql.types import StructType, ArrayType
     
    def flatten(schema, prefix=None):
        fields = []
        for field in schema.fields:
            # fully qualified name, e.g. "parent.child"
            name = prefix + '.' + field.name if prefix else field.name
            dtype = field.dataType
            # unwrap arrays so arrays of structs are flattened too
            if isinstance(dtype, ArrayType):
                dtype = dtype.elementType
     
            if isinstance(dtype, StructType):
                # nested struct: recurse with the current name as prefix
                fields += flatten(dtype, prefix=name)
            else:
                fields.append(name)
     
        return fields
     
    df.select(flatten(df.schema)).show()
    
    Answered on January 11, 2019.

    Alternatively, the columns can be selected flat with plain SQL, following these steps:

    • First, get the original DataFrame's schema.
    • Generate the SQL string by walking that schema.
    • Query the original DataFrame with the generated SQL.
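    The steps above can be sketched in plain Python: walk a Spark-style schema dict and build a flat SELECT statement. The schema dict and the table name "events" are illustrative stand-ins; with a real DataFrame you would register it via df.createOrReplaceTempView("events") and run the string through spark.sql(...).

```python
# Sketch of the three steps: walk a Spark-style schema dict (the format of
# df.schema.jsonValue()) and build a flat SELECT statement. The schema and
# the table name "events" are hypothetical examples.

def _flat_exprs(schema, prefix=None):
    exprs = []
    for field in schema["fields"]:
        name = f"{prefix}.{field['name']}" if prefix else field["name"]
        dtype = field["type"]
        if isinstance(dtype, dict) and dtype.get("type") == "struct":
            # nested struct: recurse with the current name as prefix
            exprs += _flat_exprs(dtype, prefix=name)
        elif prefix is None:
            exprs.append(name)
        else:
            # alias nested leaves so the flat result has no dots in its names
            exprs.append(f"{name} AS {name.replace('.', '_')}")
    return exprs

def flat_select_sql(schema, table):
    return f"SELECT {', '.join(_flat_exprs(schema))} FROM {table}"

schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "address", "type": {
            "type": "struct",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": "string"},
            ],
        }},
    ],
}

print(flat_select_sql(schema, "events"))
# SELECT id, address.city AS address_city, address.zip AS address_zip FROM events
```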
    Answered on January 11, 2019.
