multiple conditions for filter in spark data frames


Asked on January 8, 2019 in Apache-spark.


  • 3 Answer(s)

    The first form does not compile, because || is not defined for two strings. Instead of this:

    df2 = df1.filter("Status=2" || "Status =3")
    

    Try this:

    df2 = df1.filter($"Status" === 2 || $"Status" === 3)
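
    For reference, a minimal self-contained sketch of the column-based filter. The object name and the sample rows standing in for df1 are assumptions made for illustration; only the Status column comes from the question. Note that === builds a Column expression, whereas == is plain Scala equality and returns a Boolean, which filter does not accept.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical wrapper object, used only to make the snippet runnable.
    object StatusFilterSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("status-filter-sketch")
          .master("local[1]")
          .getOrCreate()

        // Brings the $"..." column syntax and toDF into scope.
        import spark.implicits._

        // Assumed sample data standing in for df1.
        val df1 = Seq((1, 2), (2, 3), (3, 5)).toDF("id", "Status")

        // Each comparison yields a Column; || combines the two Columns.
        val df2 = df1.filter($"Status" === 2 || $"Status" === 3)

        df2.show()
        spark.stop()
      }
    }
    ```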
    
    Answered on January 8, 2019.

    The where and filter methods on Dataset/DataFrame support two syntaxes. SQL string expressions:

    df2 = df1.filter("Status = 2 or Status = 3")
    

    Column-based expressions:

    df2 = df1.filter($"Status" === 2 || $"Status" === 3)
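
    As a hedged sketch of how the two syntaxes line up, the example below runs the same condition both ways and checks that neither result contains rows missing from the other. The object name and sample rows are assumptions; IN and isin are used here as a shorter alternative to chaining or / ||.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical wrapper object, used only to make the snippet runnable.
    object FilterSyntaxSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("filter-syntax-sketch")
          .master("local[1]")
          .getOrCreate()

        import spark.implicits._

        // Assumed sample data standing in for df1.
        val df1 = Seq((1, 2), (2, 3), (3, 5)).toDF("id", "Status")

        // SQL string syntax; where is an alias for filter.
        val bySql = df1.where("Status IN (2, 3)")

        // Column syntax; isin takes the accepted values as varargs.
        val byColumn = df1.filter($"Status".isin(2, 3))

        // Rows present in one result but not in the other.
        val onlyInSql = bySql.except(byColumn).count()
        val onlyInColumn = byColumn.except(bySql).count()
        println(s"only in SQL result: $onlyInSql, only in Column result: $onlyInColumn")

        spark.stop()
      }
    }
    ```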
    
    
    Answered on January 8, 2019.

    Alternatively, the filters below can be used. This is a complete example that combines conditions with || and &&:

    package dataframe
     
    import org.apache.spark.sql.SparkSession
    /**
    * @author [email protected]
    */
     
    object DataFrameExample {
      // One row of the sample employee data
      case class Employee(id: Integer, name: String, address: String, salary: Double, state: String, zip: Integer)

      def main(args: Array[String]): Unit = {
        val spark =
          SparkSession.builder()
            .appName("DataFrame-Basic")
            .master("local[4]")
            .getOrCreate()
     
        import spark.implicits._
     
        // create a sequence of case class objects
        // (we defined the case class above)
     
        val emp = Seq(
        Employee(1, "vaquar khan", "111 algoinquin road chicago", 120000.00, "AZ",60173),
        Employee(2, "Firdos Pasha", "1300 algoinquin road chicago", 2500000.00, "IL",50112),
        Employee(3, "Zidan khan", "112 apt abcd timesqure NY", 50000.00, "NY",55490),
        Employee(4, "Anwars khan", "washington dc", 120000.00, "VA",33245),
        Employee(5, "Deepak sharma ", "rolling edows schumburg", 990090.00, "IL",60172),
        Employee(6, "afaq khan", "saeed colony Bhopal", 1000000.00, "AZ",60173)
        )
     
        val employee = spark.sparkContext.parallelize(emp, 4).toDF()
     
        employee.printSchema()
     
        employee.show()
     
     
        employee.select("state", "zip").show()
     
        println("*** use filter() to choose rows")
     
        employee.filter($"state".equalTo("IL")).show()
     
        println("*** multiple conditions in filter: || ")
     
        employee.filter($"state".equalTo("IL") || $"state".equalTo("AZ")).show()
     
        println("*** multiple conditions in filter: && ")
     
        // zip is an integer column, so compare it with an integer literal
        employee.filter($"state".equalTo("AZ") && $"zip".equalTo(60173)).show()
      }
    }
    
    Answered on January 8, 2019.

