# What is the difference between map and flatMap and a good use case for each?

• 3 Answers

Here is a good example of the difference, as a spark-shell session:

First, some data – two lines of text:

```
val rdd = sc.parallelize(Seq("Roses are red", "Violets are blue")) // lines

rdd.collect
res0: Array[String] = Array("Roses are red", "Violets are blue")
```

Here, map transforms an RDD of length N into another RDD of length N.

For instance, it maps from two lines into two line-lengths:

```
rdd.map(_.length).collect
res1: Array[Int] = Array(13, 16)
```

By contrast, flatMap (loosely speaking) transforms an RDD of length N into a collection of N collections, then flattens these into a single RDD of results.

```
rdd.flatMap(_.split(" ")).collect
res2: Array[String] = Array(Roses, are, red, Violets, are, blue)
```

Here we have multiple words per line, and multiple lines, but we end up with a single flat output array of words.

To clarify, flatMapping from a collection of lines to a collection of words looks like:

```["aa bb cc", "", "dd"] => [["aa","bb","cc"],[],["dd"]] => ["aa","bb","cc","dd"]
```
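The same flattening can be demonstrated with plain Scala collections, since flatMap on a Seq behaves just like flatMap on an RDD, minus the distribution (a quick sketch, not spark-shell output):

```scala
// flatMap on an ordinary Scala Seq mirrors RDD.flatMap:
// each line becomes an array of words, and the arrays are flattened.
val lines = Seq("aa bb cc", "", "dd")

// filter out the empty token that splitting the empty string produces
val words = lines.flatMap(_.split(" ").filter(_.nonEmpty))

println(words) // List(aa, bb, cc, dd)
```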

The input and output RDDs will typically be of different sizes for flatMap.

Had we used map with our split function instead, we'd have ended up with nested structures (an RDD of arrays of words, of type RDD[Array[String]]), because map must produce exactly one result per input:

```
rdd.map(_.split(" ")).collect
res3: Array[Array[String]] = Array(
  Array(Roses, are, red),
  Array(Violets, are, blue)
)
```

Finally, one useful special case is mapping with a function that might not return an answer, and so returns an Option. Here flatMap filters out the elements that return None and extracts the values from those that return a Some:

```
val rdd = sc.parallelize(Seq(1, 2, 3, 4))

def myfn(x: Int): Option[Int] = if (x <= 2) Some(x * 10) else None

rdd.flatMap(myfn).collect
res3: Array[Int] = Array(10, 20)
```
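This Option-based pattern also works on ordinary Scala collections, because an Option is treated as a zero-or-one-element collection inside flatMap (a sketch outside Spark):

```scala
// flatMap over a function returning Option keeps only the Some values,
// combining a map and a filter in one step.
def myfn(x: Int): Option[Int] = if (x <= 2) Some(x * 10) else None

val result = Seq(1, 2, 3, 4).flatMap(myfn)
println(result) // List(10, 20)
```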

To summarize the difference between RDD.map and RDD.flatMap in Spark: map transforms an RDD of size N into another one of size N.

For example:

```
myRDD.map(x => x * 2)
```

This doubles each element — for instance, if the RDD is composed of Doubles.

flatMap, on the other hand, can transform the RDD into another one of a different size.

For example:

```
myRDD.flatMap(x => Seq(2 * x, 3 * x))
```

This will return an RDD of size 2*N. Or:

```
myRDD.flatMap(x => if (x < 10) Seq(2 * x, 3 * x) else Seq(x))
```
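The variable output size is easy to see on a plain Scala Seq, whose flatMap behaves the same way element by element (a sketch with example values I've chosen for illustration):

```scala
// Each element below 10 expands to two outputs; the rest pass through as one.
val mySeq = Seq(4, 20)

val doubledAndTripled = mySeq.flatMap(x => Seq(2 * x, 3 * x))
println(doubledAndTripled) // List(8, 12, 40, 60) -- size 2*N

val conditional = mySeq.flatMap(x => if (x < 10) Seq(2 * x, 3 * x) else Seq(x))
println(conditional) // List(8, 12, 20) -- size between N and 2*N
```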

Here test.md is used as an example:

```
➜ spark-1.6.1 cat test.md
This is the first line;
This is the second line;
This is the last line.

scala> val textFile = sc.textFile("test.md")
scala> textFile.map(line => line.split(" ")).count()
res2: Long = 3

scala> textFile.flatMap(line => line.split(" ")).count()
res3: Long = 15

scala> textFile.map(line => line.split(" ")).collect()
res0: Array[Array[String]] = Array(Array(This, is, the, first, line;), Array(This, is, the, second, line;), Array(This, is, the, last, line.))

scala> textFile.flatMap(line => line.split(" ")).collect()
res1: Array[String] = Array(This, is, the, first, line;, This, is, the, second, line;, This, is, the, last, line.)
```

With map, count() returns the number of lines in test.md, because each line maps to exactly one array of words; with flatMap, count() returns the number of words, because the arrays are flattened.

Both map and flatMap are transformations that return a new RDD. map is used when there is exactly one output element per input element, while flatMap is commonly used to split lines into words and flatten the results.
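The line-count vs word-count contrast above can be reproduced with plain Scala collections (a sketch; the RDD versions behave the same way, just distributed):

```scala
// map keeps one element per line; flatMap flattens the split words.
val textLines = Seq(
  "This is the first line;",
  "This is the second line;",
  "This is the last line."
)

val lineCount = textLines.map(_.split(" ")).size     // one Array per line
val wordCount = textLines.flatMap(_.split(" ")).size // one element per word

println(lineCount) // 3
println(wordCount) // 15
```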
