Spark – load CSV file as DataFrame?

Asked on November 15, 2018 in Apache-spark.
  • 3 Answer(s)

    Actually, CSV support is part of core Spark functionality (since Spark 2.0) and does not require the separate spark-csv library.

    For example:

    df = spark.read.format("csv").option("header", "true").load("csvfile.csv")
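    For reference, here is a minimal Scala sketch of the same read; the inferSchema option is an addition here, and "csvfile.csv" is just a placeholder path:

    // Read a CSV with a header row and let Spark infer column types.
    val df = spark.read
      .option("header", "true")
      .option("inferSchema", "true") // scan the data to guess column types
      .csv("csvfile.csv")

    df.printSchema() // verify the inferred schema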

    Answered on November 15, 2018.

    In Spark 2.0, a CSV file can be read as follows:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // The SparkSession manages its own SparkContext, so creating one
    // separately via "new SparkContext(conf)" is not needed.
    val conf = new SparkConf().setMaster("local[2]").setAppName("spark session example")
    val sparkSession = SparkSession.builder
      .config(conf)
      .getOrCreate()

    val path = "/Users/xxx/Downloads/usermsg.csv"
    val base_df = sparkSession.read
      .option("header", "true")
      .csv(path)
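    Once loaded, a quick sanity check can confirm that the header was picked up; a small sketch using the base_df above:

    base_df.printSchema() // all columns are strings unless inferSchema is enabled
    base_df.show(5)       // preview the first five rows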
    
    Answered on November 15, 2018.

    Here is a solution for Hadoop 2.6 and Spark 1.6, without the "databricks" spark-csv package.

    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
    import org.apache.spark.sql.Row

    // Read the file as an RDD of lines and split each line on commas
    val csv = sc.textFile("/path/to/file.csv")
    val rows = csv.map(line => line.split(",").map(_.trim))

    // Drop the header row (any row whose first field equals the header's first field)
    val header = rows.first
    val data = rows.filter(_(0) != header(0))

    // Convert each row of strings into a typed Row
    // (row(1).toInt will throw if the second field is not numeric)
    val rdd = data.map(row => Row(row(0), row(1).toInt))

    // Define the schema explicitly and build the DataFrame
    val schema = new StructType()
      .add(StructField("id", StringType, true))
      .add(StructField("val", IntegerType, true))

    val df = sqlContext.createDataFrame(rdd, schema)
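    Once the DataFrame is built, it can be queried with Spark SQL; a small sketch using the Spark 1.6 API (the table name "mytable" is arbitrary):

    // Register the DataFrame as a temporary table (Spark 1.6 API) and query it
    df.registerTempTable("mytable")
    sqlContext.sql("SELECT id, val FROM mytable WHERE val > 10").show()
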
    Answered on November 15, 2018.

