
Convert map to mapPartitions in Spark

I have code that analyzes a log file using a map transformation. The resulting RDD is then converted to a DataFrame.

val logData = sc.textFile("hdfs://quickstart.cloudera:8020/user/cloudera/syslog.txt")

val logDataDF = logData.map(rec => (rec.split(" ")(0), rec.split(" ")(2), rec.split(" ")(5))).toDF("month", "date", "process")

I would like to know whether I can use mapPartitions instead of map in this case.

I don't know what your use case is, but you can definitely use mapPartitions instead of map. The code below returns the same logDataDF.

val logDataDF = logData.mapPartitions(x => {
  val lst = scala.collection.mutable.ListBuffer[(String, String, String)]()
  while (x.hasNext) {
    val rec = x.next().split(" ")
    lst += ((rec(0), rec(2), rec(5)))
  }
  lst.iterator
}).toDF("month", "date", "process")
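Note that the version above collects every record of a partition into a ListBuffer before returning. Since the function passed to mapPartitions takes an Iterator and returns an Iterator, a lazy variant is also possible. The following is only a sketch: parsePartition is a hypothetical helper name, and the sample line is made up to resemble a syslog entry, so adjust the indices if your file is laid out differently. It runs as plain Scala without Spark so the parsing logic can be checked locally.

```scala
// Hypothetical helper (not from the original post): parses one partition lazily,
// so records are transformed one at a time instead of being buffered in a list.
def parsePartition(lines: Iterator[String]): Iterator[(String, String, String)] =
  lines.map { line =>
    val fields = line.split(" ") // split once per record and reuse the result
    (fields(0), fields(2), fields(5))
  }

// With the RDD from the question, this would be used roughly as:
//   val logDataDF = logData.mapPartitions(parsePartition).toDF("month", "date", "process")

// Local check on a made-up sample line (assumed format, two spaces after the month).
val sample = Iterator("Mar  1 00:00:01 quickstart sshd[1234]: session opened")
val parsed = parsePartition(sample).toList
```

For a pure per-record transformation like this one, map and mapPartitions do the same work. mapPartitions tends to pay off mainly when there is some setup that should happen once per partition rather than once per record.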
