
Parallelize collection in spark scala shell

I am trying to parallelize a tuple and am getting the error below. Please let me know what is wrong with the syntax. [screenshot of the error]

Thank you

The `parallelize` method needs a `Seq`. Each item in the `Seq` becomes one record.

def parallelize[T](seq: Seq[T], 
  numSlices: Int = defaultParallelism)
  (implicit arg0: ClassTag[T]): RDD[T]

In your example, you need to wrap the tuple in a `Seq`; in that case the RDD has only ONE record:

scala> val rdd = sc.parallelize(Seq(("100", List("5", "-4", "2", "NA", "-1"))))
rdd: org.apache.spark.rdd.RDD[(String, List[String])] = ParallelCollectionRDD[2] at parallelize at <console>:24

scala> rdd.count
res4: Long = 1
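If you want each tuple to be its own record, put several tuples in the `Seq`. The optional `numSlices` parameter from the signature above controls how many partitions the data is split across. A minimal sketch, assuming a running spark-shell with the usual `sc` SparkContext (the tuple values here are made up for illustration):

```scala
// Two tuples in the Seq → two records in the RDD
val rdd2 = sc.parallelize(
  Seq(
    ("100", List("5", "-4", "2", "NA", "-1")),
    ("200", List("1", "2", "3"))
  ),
  numSlices = 2) // optional: number of partitions

rdd2.count            // returns 2 (one record per tuple)
rdd2.getNumPartitions // returns 2 (from numSlices)
```

Passing the bare tuple instead of a `Seq` fails because a tuple does not satisfy the `Seq[T]` parameter type, which is the error the original syntax hits.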
