Save Spark dataframe as dynamic partitioned table in Hive

Asked on January 7, 2019 in Apache-spark.


  • 4 Answer(s)

    This issue can be solved as follows.

    Here df is a DataFrame with year, month and other columns (the single quotes below are PySpark syntax; Scala needs double quotes):

    df.write.partitionBy('year', 'month').saveAsTable(...)
    

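    Or, to append to a table that already exists (in Spark 2.0 and later, insertInto cannot be combined with partitionBy, because the table already defines its partition columns):

    df.write.insertInto(...)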
    
    
    Answered on January 7, 2019.

    With a partitioned Hive table, append the DataFrame like this:

    df.write.mode(SaveMode.Append).partitionBy("colname").saveAsTable("Table")

    For this to work, enable the following properties:

    hiveContext.setConf("hive.exec.dynamic.partition", "true")
    hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
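For readers on Spark 2.x or later, where SparkSession replaces HiveContext, the same two properties can be set through the session's runtime configuration. The following is a minimal sketch under that assumption; the object name DynamicPartitionWrite and both helper functions are hypothetical names introduced here for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object DynamicPartitionWrite {
  // Set the two Hive properties from the answer above on an existing session.
  def enableDynamicPartitioning(spark: SparkSession): Unit = {
    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
  }

  // Append df to tableName, partitioned by the given columns.
  // saveAsTable creates the table if it does not exist yet.
  def appendPartitioned(df: DataFrame, tableName: String, partitionCols: Seq[String]): Unit =
    df.write
      .mode(SaveMode.Append)
      .partitionBy(partitionCols: _*)
      .saveAsTable(tableName)
}

// Usage, assuming a Hive-enabled session and a DataFrame `df` with year and month columns:
//   val spark = SparkSession.builder().appName("example").enableHiveSupport().getOrCreate()
//   DynamicPartitionWrite.enableDynamicPartitioning(spark)
//   DynamicPartitionWrite.appendPartitioned(df, "Table", Seq("year", "month"))
```

Keeping the settings in one function makes it harder to forget them in jobs that write to the table from several places.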
    
    Answered on January 7, 2019.

    Alternatively, create the partitioned table yourself and insert the DataFrame into it.

    Partition column names are case sensitive once a table is partitioned, so the DataFrame must contain a column whose name matches the partition column exactly, including case.

    import org.apache.spark.sql.SaveMode

    val dbName = "your database name"
    val finaltable = "your table name"

    // Dynamic partitioning must be enabled for every insert, not only when the table is created.
    sparkSession.sql("use " + dbName)
    sparkSession.sql("SET hive.exec.dynamic.partition = true")
    sparkSession.sql("SET hive.exec.dynamic.partition.mode = nonstrict")
    sparkSession.sql("SET hive.exec.max.dynamic.partitions.pernode = 400")

    // First check whether the table already exists.
    if (sparkSession.sql("show tables in " + dbName).filter("tableName='" + finaltable + "'").collect().length == 0) {
        // If the table is missing, create it with EMP_DEP as the partition column.
        println("Table Not Present \n Creating table " + finaltable)
        sparkSession.sql("create table " + dbName + "." + finaltable + "(EMP_ID string,EMP_Name string,EMP_Address string,EMP_Salary bigint) PARTITIONED BY (EMP_DEP STRING)")
    }

    // The table now exists, so insert the DataFrame in append mode.
    // insertInto matches columns by position, so EMP_DEP must be the last column of df.
    df.write.mode(SaveMode.Append).insertInto(dbName + "." + finaltable)
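Because insertInto resolves columns by position and not by name, and a Hive table lists its partition columns last, it can help to check and reorder the DataFrame's columns before inserting. The sketch below is one way to do that under the answer's assumption that names must match by exact case; PartitionColumns and columnsForInsert are hypothetical names introduced here, and the helper works on plain column-name lists so it can be tried without a cluster.

```scala
object PartitionColumns {
  // Returns the column names reordered so that the partition columns come last,
  // in the order given by partitionCols. Throws IllegalArgumentException if a
  // partition column is missing from dfColumns or differs in case.
  def columnsForInsert(dfColumns: Seq[String], partitionCols: Seq[String]): Seq[String] = {
    val missing = partitionCols.filterNot(c => dfColumns.contains(c))
    require(missing.isEmpty,
      s"Partition column(s) not found with exact case: ${missing.mkString(", ")}")
    dfColumns.filterNot(c => partitionCols.contains(c)) ++ partitionCols
  }
}

// Usage with a DataFrame `df` and the EMP table from the answer (both assumed):
//   import org.apache.spark.sql.SaveMode
//   import org.apache.spark.sql.functions.col
//   val ordered = PartitionColumns.columnsForInsert(df.columns.toSeq, Seq("EMP_DEP"))
//   df.select(ordered.map(col): _*).write.mode(SaveMode.Append).insertInto(dbName + "." + finaltable)
```

The non-partition columns keep their existing relative order, so they still need to match the order in the table definition.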
    
    Answered on January 7, 2019.

    df is a dataframe with year, month and other columns

    df.write.partitionBy('year', 'month').saveAsTable(...)
    

    or, for a table that already exists (in Spark 2.0 and later, insertInto cannot be combined with partitionBy):

    df.write.insertInto(…)
    Answered on March 5, 2019.

