Spark Scala list folders in directory



  • 3 Answers

    There is no listFiles method in Hadoop 1.x, so listStatus is used here to get the directory contents. It has no recursive option, but a recursive lookup is easy to manage yourself (a sketch follows the code).

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(new Configuration())
    val status = fs.listStatus(new Path(YOUR_HDFS_PATH))
    status.foreach(x => println(x.getPath))
    
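    A minimal sketch of that recursive lookup, assuming the same fs handle as above and a Hadoop 2+ client (where FileStatus.isDirectory is available; Hadoop 1.x uses isDir):

    def listRecursive(fs: FileSystem, root: Path): Seq[Path] =
      fs.listStatus(root).toSeq.flatMap { s =>
        // Descend into subdirectories; keep plain files as-is
        if (s.isDirectory) s.getPath +: listRecursive(fs, s.getPath)
        else Seq(s.getPath)
      }

    listRecursive(fs, new Path(YOUR_HDFS_PATH)).foreach(println)
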
    Answered on January 5, 2019.

    As an alternative solution, you can use globStatus, which resolves glob patterns in the path:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    val listStatus = FileSystem.get(new URI(url), sc.hadoopConfiguration)
      .globStatus(new Path(url))
    for (urlStatus <- listStatus) {
      println("urlStatus get Path: " + urlStatus.getPath())
    }
    
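    As a usage sketch (the trailing "/*" glob is an assumption, not part of the answer above): appending a wildcard makes globStatus return the immediate children of url, which can then be filtered down to folders.

    // Glob the immediate children of url and keep only the directories
    val dirs = FileSystem.get(new URI(url), sc.hadoopConfiguration)
      .globStatus(new Path(url + "/*"))
      .filter(_.isDirectory)
    dirs.foreach(d => println(d.getPath))
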
    Answered on January 5, 2019.

    This approach creates an iterator over the org.apache.hadoop.fs.LocatedFileStatus entries of the directory:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("Demo").getOrCreate()
    val path = new Path("enter your directory path")
    val fs: FileSystem = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val it = fs.listLocatedStatus(path)
    
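    A short sketch of consuming that iterator: RemoteIterator is a Hadoop type with no Scala collection support, so it is drained with a while loop. The isDirectory filter is an assumption added here to match the question's focus on folders.

    // Drain the RemoteIterator, printing only directory paths
    while (it.hasNext) {
      val status = it.next()
      if (status.isDirectory) println(status.getPath)
    }
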
    Answered on January 5, 2019.

