
How to find the phrase count from a data frame in Spark Scala?

How do I find the count of each phrase in a data frame column?

I am trying to count the phrases in the Comments column of the DF below:

CustID -  Comments

101    [[Nice one, Nice One,Nice]]

102    [[This was nice, Nice]

This is the code I tried for the use case above:

val result = DF1.withColumn("Count of comments ",  DF1("Comments")).map(events => (events,1)).reduce

Here I cannot apply the 'reduceByKey' function on top of the tuples; only the 'reduce' function is listed.

This is the expected output I want to achieve:

CustID  -   Comments                      -  Count of comments
101         [[Nice one, Nice One,Nice]]      Nice one 2, Nice 1
102         [[This was nice, Nice]           This was nice 1, Nice

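Can anyone help me and suggest the right way to achieve the output above?

Here is a solution.

After trimming the brackets, the source data looks like this: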

+------+----------------------+
|CustID|Comments              |
+------+----------------------+
|101   |Nice one,Nice One,Nice|
|102   |This was nice, Nice   |
+------+----------------------+

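The code looks like this:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Appends a string column holding the per-phrase counts for one row.
def countElments(row: Row): Row = {
  val str: String = row.getAs[String]("Comments")
  // Split on commas and lower-case so "Nice one" and "Nice One" match.
  // Phrases are not trimmed, so " Nice" keeps its leading space.
  val list = str.split(",").map(_.toLowerCase()).toList
  // Render the phrase -> count map as "phrase -> n,phrase -> n".
  val newCol = list.groupBy(identity).mapValues(_.size).mkString(",")
  Row.merge(row, Row(newCol))
}

// df is the trimmed DataFrame shown above; spark is the active SparkSession.
val rdd = df.rdd.map(row => countElments(row))
val newSchema = df.schema.add("Count of comments", StringType, true)
val final_df = spark.createDataFrame(rdd, newSchema)
final_df.show(false)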
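
The per-row counting step can also be tried on its own, without Spark. The following is only a minimal sketch: the object name PhraseCount and the helper phraseCounts are made up for illustration and are not part of the answer above. Unlike the code above, it trims whitespace around each phrase, which is an assumption about the desired behavior.

```scala
object PhraseCount {
  // Hypothetical helper: counts comma-separated phrases case-insensitively.
  // Assumption: surrounding whitespace is not significant, so it is trimmed.
  def phraseCounts(comments: String): Map[String, Int] =
    comments
      .split(",")
      .map(_.trim.toLowerCase)
      .filter(_.nonEmpty) // ignore empty pieces such as a trailing comma
      .groupBy(identity)
      .map { case (phrase, occurrences) => phrase -> occurrences.length }

  def main(args: Array[String]): Unit = {
    // Map iteration order is not guaranteed, so the printed order may vary.
    println(phraseCounts("Nice one,Nice One,Nice"))
    println(phraseCounts("This was nice, Nice"))
  }
}
```

Because the result is a plain Map, it can be formatted in whatever way the final column requires.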

The output of final_df.show(false) looks like this:

+------+----------------------+-----------------------------+
|CustID|Comments              |Count of comments            |
+------+----------------------+-----------------------------+
|101   |Nice one,Nice One,Nice|nice -> 1,nice one -> 2      |
|102   |This was nice, Nice   |this was nice -> 1, nice -> 1|
+------+----------------------+-----------------------------+
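
Note that the second row shows " nice" with a leading space, because the phrases are split on commas without trimming. As a possible alternative that avoids the round trip through an RDD and a rebuilt schema, the same counting could be wrapped in a UDF. This is a sketch under assumptions, not the answer's method: the names PhraseCountUdf and formatCounts are hypothetical, the sample rows are copied from the table above, the session runs in local mode, and the "phrase n" formatting (sorted by descending count, then alphabetically) is one guess at the format requested in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object PhraseCountUdf {
  // Hypothetical helper: "Nice one,Nice One,Nice" becomes "nice one 2, nice 1".
  // Null input yields null so that missing comments stay missing.
  def formatCounts(comments: String): String =
    Option(comments).map { text =>
      text
        .split(",")
        .map(_.trim.toLowerCase)
        .filter(_.nonEmpty)
        .groupBy(identity)
        .toSeq
        .map { case (phrase, occurrences) => (phrase, occurrences.length) }
        .sortBy { case (phrase, n) => (-n, phrase) } // deterministic order
        .map { case (phrase, n) => s"$phrase $n" }
        .mkString(", ")
    }.orNull

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("phrase-count-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (101, "Nice one,Nice One,Nice"),
      (102, "This was nice, Nice")
    ).toDF("CustID", "Comments")

    val countUdf = udf(formatCounts _)
    df.withColumn("Count of comments", countUdf(col("Comments"))).show(false)

    spark.stop()
  }
}
```

Keeping the counting logic in a plain function makes it easy to check separately from Spark; the UDF only adapts it to a column expression.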

