Best way to get the max value in a Spark dataframe column


Asked on December 17, 2018 in Apache-spark.


  • 3 Answers

    One option is to aggregate on the column with max and read the value straight from the returned Row; the “asDict()” call sometimes shown in similar examples is not needed here.

    >df1.show()
    +-----+--------------------+--------+----------+-----------+
    |floor|           timestamp|     uid|         x|          y|
    +-----+--------------------+--------+----------+-----------+
    |    1|2014-07-19T16:00:...|600dfbe2| 103.79211|71.50419418|
    |    1|2014-07-19T16:00:...|5e7b40e1| 110.33613|100.6828393|
    |    1|2014-07-19T16:00:...|285d22e4|110.066315|86.48873585|
    |    1|2014-07-19T16:00:...|74d917a1| 103.78499|71.45633073|
     
    >row1 = df1.agg({"x": "max"}).collect()[0]  # aggregate, then grab the single result Row
    >print(row1)
    Row(max(x)=110.33613)
    >print(row1["max(x)"])
    110.33613
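
    The same aggregation can also be written with pyspark.sql.functions and an alias, which avoids hard-coding the "max(x)" key (a sketch, assuming the same df1 and an active SparkSession as above):

    from pyspark.sql import functions as F

    # Alias the aggregate so the value can be read back by a stable name
    max_x = df1.agg(F.max("x").alias("max_x")).collect()[0]["max_x"]
    print(max_x)  # 110.33613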
    
    Answered on December 17, 2018.

    The max value of a particular DataFrame column can also be obtained in a single line; collect() returns a list of Rows, so [0][0] pulls out the plain value –

    your_max_value = df.agg({"your-column": "max"}).collect()[0][0]
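
    As a self-contained sketch (the tiny DataFrame and the column name "value" below are made up purely for illustration), the same pattern looks like this:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Toy DataFrame just to demonstrate the agg/collect pattern
    df = spark.createDataFrame([(1.0,), (3.5,), (2.2,)], ["value"])

    # collect() returns a list with one Row; [0][0] picks the aggregated value out of it
    your_max_value = df.agg({"value": "max"}).collect()[0][0]
    print(your_max_value)  # 3.5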
    

    Answered on December 17, 2018.

    In Scala (Spark 2.0+), the same result can be obtained through Spark SQL by registering a temporary view:

    scala> df.createOrReplaceTempView("TEMP_DF")
    scala> val myMax = spark.sql("SELECT MAX(x) AS maxval FROM TEMP_DF").
        collect()(0).getInt(0)  // use getDouble or getAs[T] if the column is not an integer
    scala> print(myMax)
    117
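
    For anyone following along in Python, the temp-view approach translates directly (a sketch, assuming a SparkSession named spark and the DataFrame df from above):

    # Register the DataFrame as a temporary view and query it with Spark SQL
    df.createOrReplaceTempView("TEMP_DF")
    my_max = spark.sql("SELECT MAX(x) AS maxval FROM TEMP_DF").collect()[0]["maxval"]
    print(my_max)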
    
    Answered on December 17, 2018.

