How to find the phrase count from a data frame column in Spark Scala?
I am trying to find the count of each phrase in the Comments column of the DF below:
CustID - Comments
101 [[Nice one, Nice One,Nice]]
102 [[This was nice, Nice]
This is the code I tried for the use case above:
val result = DF1.withColumn("Count of comments ", DF1("Comments")).map(events => (events,1)).reduce
Here I cannot apply the 'reduceByKey' function to the tuples; only the 'reduce' function is listed.
This is the expected output I want to achieve:
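For context on why only 'reduce' shows up: `reduceByKey` comes from Spark's `PairRDDFunctions`, which applies to an RDD of key-value pairs, whereas `map` on a DataFrame returns a Dataset. A minimal sketch of the RDD route follows, assuming a local `SparkSession` and sample rows with the brackets already removed; the object name `PhraseCountRdd` and the helper `phrases` are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.SparkSession

object PhraseCountRdd {
  // Hypothetical helper: split a comment string into lower-cased, trimmed phrases.
  def phrases(comments: String): Seq[String] =
    comments.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("phrase-count").getOrCreate()
    import spark.implicits._

    // Assumption: sample data mirroring the question, brackets already stripped.
    val df = Seq(
      (101, "Nice one,Nice One,Nice"),
      (102, "This was nice, Nice")
    ).toDF("CustID", "Comments")

    // Dropping to the RDD gives an RDD of ((CustID, phrase), 1) pairs,
    // which is the shape reduceByKey expects.
    val counts = df.rdd
      .flatMap { row =>
        val id = row.getAs[Int]("CustID")
        phrases(row.getAs[String]("Comments")).map(p => ((id, p), 1))
      }
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}
```

This produces one record per customer and phrase rather than one string per row, so a further grouping step by `CustID` would still be needed to get the layout asked for.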
CustID - Comments - Count of comments
101 [[Nice one, Nice One,Nice]] Nice one 2, Nice 1
102 [[This was nice, Nice] This was nice 1, Nice
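Can anyone help me and suggest the right way to achieve the above output?
Please find the solution here:
After trimming the brackets, the source data looks like this: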
+------+----------------------+
|CustID|Comments |
+------+----------------------+
|101 |Nice one,Nice One,Nice|
|102 |This was nice, Nice |
+------+----------------------+
The code looks like this:
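import org.apache.spark.sql.Row
import org.apache.spark.sql.types.StringType

// Append a per-row, case-insensitive phrase count to the original row.
def countElments(row: Row): Row = {
  val str: String = row.getAs[String]("Comments")
  // Split on commas and lower-case so "Nice one" and "Nice One" match.
  // Phrases are not trimmed, so a leading space is kept as part of the phrase.
  val list = str.split(",").map(_.toLowerCase).toList
  // Group identical phrases and render the counts as "phrase -> n" pairs.
  val newCol = list.groupBy(identity).mapValues(_.size).mkString(",")
  Row.merge(row, Row(newCol))
}

// Map over the underlying RDD, then rebuild a DataFrame with the extended schema.
val rdd = df.rdd.map(row => countElments(row))
val newSchema = df.schema.add("Count of comments", StringType, true)
val final_df = spark.createDataFrame(rdd, newSchema)
final_df.show(false)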
The output looks like this:
+------+----------------------+-----------------------------+
|CustID|Comments |Count of comments |
+------+----------------------+-----------------------------+
|101 |Nice one,Nice One,Nice|nice -> 1,nice one -> 2 |
|102 |This was nice, Nice |this was nice -> 1, nice -> 1|
+------+----------------------+-----------------------------+
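The output above keeps a leading space in " nice" and lists the phrases in map order, not in the order and "phrase count" format the question asked for. As a rough sketch only, the counting step could be isolated in a plain Scala function that trims each phrase and preserves first-appearance order. The names `PhraseCount`, `countPhrases` and `render` are hypothetical and not part of the original answer.

```scala
object PhraseCount {
  // Count phrases in a comma-separated string, ignoring case and
  // surrounding whitespace. Results keep first-appearance order.
  def countPhrases(comments: String): Seq[(String, Int)] = {
    val phrases = comments.split(",").map(_.trim.toLowerCase).filter(_.nonEmpty)
    val counts = phrases.groupBy(identity).map { case (p, occurrences) => p -> occurrences.length }
    phrases.distinct.toSeq.map(p => p -> counts(p))
  }

  // Render the counts in the "phrase n, phrase n" style from the question.
  def render(counts: Seq[(String, Int)]): String =
    counts.map { case (p, n) => s"$p $n" }.mkString(", ")

  def main(args: Array[String]): Unit = {
    println(render(countPhrases("Nice one,Nice One,Nice"))) // nice one 2, nice 1
    println(render(countPhrases("This was nice, Nice")))    // this was nice 1, nice 1
  }
}
```

Because the function has no Spark dependency, it can be checked on its own and then called from the row-mapping function above, or wrapped with `udf` from `org.apache.spark.sql.functions` and applied through `withColumn`.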