How to convert an RDD object to a DataFrame in Spark?

Asked on November 15, 2018 in Apache-spark.


  • 3 Answer(s)

    SQLContext has several createDataFrame overloads that create a DataFrame from an RDD.

    One of them will most likely fit your use case.

    For instance:

    def createDataFrame(rowRDD: RDD[Row], schema: StructType): DataFrame
    
    

    This overload creates a DataFrame from an RDD of Rows using the given schema.
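
    As a minimal sketch of that overload, the script below builds a small RDD[Row] and pairs it with a matching schema. It assumes Spark 2.x or later, where SparkSession exposes the same createDataFrame(rowRDD, schema) method as SQLContext; the local session, the sample rows, and the column names id and name are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Local session for illustration only (assumption); a real job usually has one already.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()

// Hypothetical sample data: each Row holds an id and a name.
val rowRDD = spark.sparkContext.parallelize(Seq(Row(1L, "alice"), Row(2L, "bob")))

// The schema must list the fields in the same order and with the same types as the Row values.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val df: DataFrame = spark.createDataFrame(rowRDD, schema)
df.printSchema()
df.show()
```

    If a Row value does not match its declared type, the mismatch typically surfaces only when an action such as show() runs, so it helps to keep the schema next to the code that builds the Rows.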

     

    Answered on November 15, 2018.

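    toDF() is available on an RDD of tuples or case class instances (an RDD[Row] needs createDataFrame with a schema instead). Assuming such an RDD is called rdd, you can use: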

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    rdd.toDF()
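
    The implicits import is what brings toDF() into scope, and it applies to RDDs whose element type Spark can encode, such as case classes and tuples. The sketch below shows both cases. It assumes Spark 2.x or later with a local SparkSession; the Person case class and the sample values are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; toDF() derives the column names from its field names.
case class Person(id: Long, name: String)

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("rdd-todf").getOrCreate()
import spark.implicits._ // brings toDF() into scope for RDDs of tuples and case classes

// Case class elements: columns are named id and name automatically.
val personRDD = spark.sparkContext.parallelize(Seq(Person(1L, "alice"), Person(2L, "bob")))
val personDF = personRDD.toDF()

// Tuple elements carry no field names, so the column names are passed explicitly.
val tupleRDD = spark.sparkContext.parallelize(Seq((1L, "alice"), (2L, "bob")))
val tupleDF = tupleRDD.toDF("id", "name")

personDF.show()
tupleDF.show()
```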
    

     

    Answered on November 15, 2018.

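    If you have a DataFrame and need to modify some of its fields, you can first convert it to an RDD[Row]. Calling .rdd before map keeps the result an RDD[Row] on Spark 2.x, and array columns are read back as a Seq rather than a List:

    val aRdd = aDF.rdd.map(x => Row(x.getAs[Long]("id"), x.getAs[scala.collection.Seq[String]]("role").head))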
    

    To convert the RDD back to a DataFrame, you need to describe its structure as a StructType. Each Scala type maps to a Spark SQL type: a Long field becomes LongType and a String field becomes StringType.

    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
    val aStruct = new StructType(Array(StructField("id", LongType, nullable = true), StructField("role", StringType, nullable = true)))
    

    Finally, the createDataFrame method converts the RDD back to a DataFrame:

    val aNamedDF = sqlContext.createDataFrame(aRdd,aStruct)
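
    Putting the three steps of this answer together, a self-contained round trip might look like the sketch below. It assumes Spark 2.x or later and uses a local SparkSession in place of sqlContext; the sample input is hypothetical, and the role column is read as scala.collection.Seq because that is the general type Spark uses for array columns.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Local session for illustration only (assumption).
val spark = SparkSession.builder().master("local[*]").appName("df-rdd-df").getOrCreate()
import spark.implicits._

// Hypothetical input: each id comes with a list of roles.
val aDF = Seq((1L, Seq("admin", "dev")), (2L, Seq("ops"))).toDF("id", "role")

// Step 1: DataFrame -> RDD[Row], keeping only the first role of each record.
val aRdd = aDF.rdd.map(x =>
  Row(x.getAs[Long]("id"), x.getAs[scala.collection.Seq[String]]("role").head))

// Step 2: describe the new structure; role is now a plain string.
val aStruct = StructType(Seq(
  StructField("id", LongType, nullable = true),
  StructField("role", StringType, nullable = true)
))

// Step 3: RDD[Row] + schema -> DataFrame.
val aNamedDF = spark.createDataFrame(aRdd, aStruct)
aNamedDF.show()
```

    Note that head throws on an empty list, so real data with possibly empty role arrays would need headOption or a filter first.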
    
    Answered on November 15, 2018.

