Spark Scala: How to transform a column in a DF

Asked on January 11, 2019 in Apache-spark.


  • 1 Answer

    There are two ways to apply a transformation function to a DataFrame column:

    • Using map / toDF
    • Using UDFs (UserDefinedFunction)

    The first approach uses map / toDF: pull the column out as an RDD, apply the function to each row, and convert the result back to a DataFrame:

    import org.apache.spark.sql.Row
    import sqlContext.implicits._ // required for .toDF on the resulting RDD
     
    def getTimestamp: (String => java.sql.Timestamp) = // your function here
     
    // Pair each original value with its converted value, then rebuild a DataFrame.
    val test = myDF.select("my_column").rdd.map {
      case Row(string_val: String) => (string_val, getTimestamp(string_val))
    }.toDF("my_column", "new_column")
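
    A concrete getTimestamp is not shown above. As a minimal sketch, assuming the input strings follow a "yyyy-MM-dd HH:mm:ss" pattern, it could look like this:

    import java.sql.Timestamp
    import java.text.SimpleDateFormat
     
    // Hypothetical implementation; the date pattern is an assumption,
    // so adjust it to match the actual data. A new SimpleDateFormat is
    // created per call because the class is not thread-safe.
    def getTimestamp: (String => java.sql.Timestamp) = { s =>
      val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      new Timestamp(format.parse(s).getTime)
    }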
    

    The second approach uses a UDF (UserDefinedFunction), which keeps the original DataFrame intact and appends the new column:

    import org.apache.spark.sql.functions._
     
    def getTimestamp: (String => java.sql.Timestamp) = // your function here
     
    val newCol = udf(getTimestamp).apply(col("my_column")) // creates the new column
    val test = myDF.withColumn("new_column", newCol) // adds the new column to original DF
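
    If the goal is plain string-to-timestamp parsing, Spark's built-in to_timestamp function (available since Spark 2.2) can replace the UDF entirely, which lets Catalyst optimize the expression. A sketch, with the format string again an assumption:

    import org.apache.spark.sql.functions.{col, to_timestamp}
     
    // Built-in alternative: no UDF needed; the pattern is an assumption.
    val test = myDF.withColumn("new_column",
      to_timestamp(col("my_column"), "yyyy-MM-dd HH:mm:ss"))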
    
    Answered on January 11, 2019.

