
“java.lang.UnsupportedOperationException: empty collection”

I'm working with Spark 2.1.1 and Scala 2.11.8

I'm executing my code in Spark-shell. This is the code I'm executing

val read_file1 = sc.textFile("Path to file 1");

val uid = read_file1.map(line => line.split(",")).map(array => array.map(arr => {
  if (arr.contains(":")) (array(2).split(":")(0), arr.split(":")(0))
  else (array(2).split(":")(0), arr)}))

val rdd1 = uid.map(array => array.drop(4)).flatMap(array => array.toSeq).map(y=>(y,1)).reduceByKey(_+_)

The output of this code is:

(( v67430612_serv78i, fb_201906266952256),1)
(( v74005958_serv35i, fb_128431994336303),1)

However, when I join the two RDDs by executing:

uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))

I get the error :

 "java.lang.UnsupportedOperationException: empty collection" 

Why am I getting this error?

Here are samples of the input files:

File 1 :

2017-05-09 21:52:42 , 1494391962 , p69465323_serv80i:10:450 , 7 , fb_406423006398063:396560, guest_861067032060185_android:671051, fb_100000829486587:186589, fb_100007900293502:407374, fb_172395756592775:649795
2017-05-09 21:52:42 , 1494391962 , z67265107_serv77i:4:45 , 2:Re , fb_106996523208498:110066, fb_274049626104849:86632, fb_111857069377742:69348, fb_127277511127344:46246

File 2 :

fb_100008724660685,302502,-450,v300430479_serv73i:10:450,switchtable,2017-04-30 00:00:00    
fb_190306964768414,147785,-6580,r308423810_serv31i::20,invite,2017-04-30 00:00:00

I just noticed this: when I execute

rdd1.take(10).foreach(println) or rdd1.first()

I also get this message before the output:

WARN Executor: Managed memory leak detected; size = 39979424 bytes, TID = 11

I don't know whether this has anything to do with the problem.

Another note: the error only occurs when I call

res.first()

for

uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))

On doing

res.take(10).foreach(println)

I don't get any output, but no error is returned either.

You forgot to trim the spaces in the tuples created from the split lines, so the keys never matched and nothing was joined. When you then called first() on the resulting empty RDD, the exception was thrown.
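To see why, note that File 1 separates fields with " , " (comma surrounded by spaces), so each token from split(",") carries stray whitespace, while File 2 uses plain commas. A quick check in plain Scala (no Spark needed) shows the mismatch; the identifier below is taken from the File 1 sample row, reused on the File 2 side just for illustration:

```scala
// File 1 field as produced by line.split(","): note the surrounding spaces.
val key1 = " p69465323_serv80i:10:450 ".split(":")(0)  // " p69465323_serv80i" (leading space)
// File 2 fields have no spaces around the commas:
val key2 = "p69465323_serv80i:10:450".split(":")(0)    // "p69465323_serv80i"

println(key1 == key2)       // false -- so the join finds no common keys
println(key1.trim == key2)  // true  -- trimming fixes the mismatch
```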

You can use the following solution; it works for me.

val read_file1 = sc.textFile("Path to file 1");

val uid = read_file1.map(line => line.split(",")).map(array => array.map(arr => {
   if(arr.contains(":")) (array(2).split(":")(0).trim, arr.split(":")(0).trim)
    else (array(2).split(":")(0).trim, arr.trim)}))

val rdd1 = uid.map(array => array.drop(4)).flatMap(array => array.toSeq).map(y=>(y,1)).reduceByKey(_+_)


val read_file2 = sc.textFile("Path to File 2");
val uid2 = read_file2.map(line => {var arr = line.split(","); (arr(3).split(":")(0).trim,arr(0).trim,arr(2).trim)});

val res = uid2.map(x => ((x._1, x._2), x._3)).join(rdd1).map(y => ((y._1._1, y._1._2, y._2._1), y._2._2))
res.take(10).foreach(println)

You get an empty collection after the join; this happens when there are no corresponding keys in the RDDs. Either the keys are not trimmed, they are sliced incorrectly, or there simply are no matches at all. I suggest checking whether there are matching keys in your files/RDDs, whether the data was extracted correctly, and whether you need an inner join rather than a left or right outer join.
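One way to diagnose this in the shell (a sketch, assuming uid2 and rdd1 are built as in the question) is to compare the key sets of the two sides before joining; if their intersection is empty, the join result will be empty too:

```scala
// Build the left side of the join exactly as in the question.
val left = uid2.map(x => ((x._1, x._2), x._3))

val leftKeys  = left.keys.distinct()
val rightKeys = rdd1.keys.distinct()

// Number of keys the two sides share; 0 explains the empty join result.
println(leftKeys.intersection(rightKeys).count())

// Inspect a few keys from each side; the brackets make stray
// leading/trailing whitespace visible.
leftKeys.take(5).foreach(k => println(s"left : [$k]"))
rightKeys.take(5).foreach(k => println(s"right: [$k]"))
```

Printing the keys wrapped in brackets is a cheap way to spot the untrimmed spaces that caused the problem here.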
