
Scala: adding values from a Spark DataFrame to a mutable list in a for loop

I want to update the elements of a MutableList, declared outside a for loop, with values from a DataFrame. I initialized the list as empty and expected it to hold n elements when the loop terminates. However, the list never seems to grow past one element (new additions do not accumulate), and when the loop terminates the list is empty again.

This only happens when I iterate over a DataFrame. If I iterate over a fixed range, say 1 to 10, the results are as expected.

Iterating through a DataFrame:

val my_list = MutableList[String]()

scala> for (i <-df){
     | my_list += "ok"
     | println(my_list)
     | }
MutableList(ok)
MutableList(ok)

scala> my_list
res120: scala.collection.mutable.MutableList[String] = MutableList()

Iterating through a fixed range:

scala> for (i <- 1 to 10) {
     | my_list += "ok"
     | println (my_list)
     | }
MutableList(ok)
MutableList(ok, ok)
MutableList(ok, ok, ok)
MutableList(ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok, ok, ok, ok, ok)
MutableList(ok, ok, ok, ok, ok, ok, ok, ok, ok, ok)

scala> my_list
res122: scala.collection.mutable.MutableList[String] = MutableList(ok, ok, ok, ok, ok, ok, ok, ok, ok, ok)

I am also open to alternative methods of generating a list from df elements.

I will try to answer here. Is this what you want to achieve?

val listValueDataFrame = Seq(("one", 2.0),("two", 1.5),("three", 8.0)).toDF("id", "val")
listValueDataFrame.printSchema
val listOfIds = listValueDataFrame.select("id").collect().map(_(0)).toList
val someOtherDataFrame = Seq(("one", 3.0),("ni", 2.5),("san", 9.0)).toDF("id", "val")
someOtherDataFrame.filter(someOtherDataFrame("id").isin(listOfIds:_*)).show

On my execution this prints the following:

root
 |-- id: string (nullable = true)
 |-- val: double (nullable = false)

+---+---+
| id|val|
+---+---+
|one|3.0|
+---+---+

listValueDataFrame: org.apache.spark.sql.DataFrame = [id: string, val: double]
listOfIds: List[Any] = List(one, two, three)
someOtherDataFrame: org.apache.spark.sql.DataFrame = [id: string, val: double]

Does this help at all? I was not 100% sure I understood the complete context of the question, but it can be achieved this way. Note that I have used collect, which performs badly with a large number of records, because all the data has to be "collected" and moved to the driver.
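As a variation on the snippet above, the following minimal sketch collects the column as List[String] rather than List[Any] by converting the single-column DataFrame to a typed Dataset first. The object name CollectIdsSketch, the helper idsOf, and the local-mode SparkSession are assumptions made for illustration and are not part of the original answer. The same caveat applies: collect moves every value to the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CollectIdsSketch {
  // Hypothetical helper: pulls the "id" column to the driver as a List[String].
  // Assumes the DataFrame has a string column named "id".
  def idsOf(df: DataFrame): List[String] = {
    // The Encoder[String] needed by as[String] comes from the session's implicits.
    val spark = df.sparkSession
    import spark.implicits._
    df.select("id").as[String].collect().toList
  }

  def main(args: Array[String]): Unit = {
    // Local-mode session, assumed here only so the sketch is self-contained.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-ids-sketch")
      .getOrCreate()
    import spark.implicits._

    val listValueDataFrame =
      Seq(("one", 2.0), ("two", 1.5), ("three", 8.0)).toDF("id", "val")

    val listOfIds: List[String] = idsOf(listValueDataFrame)
    println(listOfIds)

    spark.stop()
  }
}
```

With a typed list, isin(listOfIds: _*) works as before, and the element type no longer has to be cast later on.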

Can you try it like this:

for (i <- df.collect) {
  my_list += "ok"
  println(my_list)
}

I used scala.collection.mutable.ListBuffer and it worked fine.

scala> val a = scala.collection.mutable.ListBuffer[String]()
a: scala.collection.mutable.ListBuffer[String] = ListBuffer()

scala> for ( i <- df.collect) {a+="ok"; println(a)}
ListBuffer(ok)
ListBuffer(ok, ok)
ListBuffer(ok, ok, ok)
ListBuffer(ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok, ok)
ListBuffer(ok, ok, ok, ok, ok, ok, ok, ok)

scala> a
res11: scala.collection.mutable.ListBuffer[String] = ListBuffer(ok, ok, ok, ok, ok, ok, ok, ok)

Please try it and let me know the result. Cheers.
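Some background on why collect makes the difference: a for loop without yield is rewritten by the Scala compiler into a foreach call, so for (i <- df) { ... } becomes df.foreach(i => ...). Spark runs that function inside tasks, each working on its own deserialized copy of the closure, so appends made there generally do not reach the list held by the driver. After collect, the loop runs over a plain Array[Row] on the driver. The sketch below illustrates this under assumed names (DriverSideListSketch, okPerRow, okPerRowImmutable, and the sample data are all hypothetical), and also shows a variant that avoids the mutable buffer entirely.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable.ListBuffer

object DriverSideListSketch {
  // Hypothetical helper: appends one "ok" per row, entirely on the driver.
  def okPerRow(df: DataFrame): List[String] = {
    val buf = ListBuffer[String]()
    // collect() returns Array[Row] on the driver, so this is an ordinary
    // local Scala loop and the appends are visible afterwards.
    for (_ <- df.collect()) buf += "ok"
    buf.toList
  }

  // Same result without mutation: map over the collected rows.
  def okPerRowImmutable(df: DataFrame): List[String] =
    df.collect().map(_ => "ok").toList

  def main(args: Array[String]): Unit = {
    // Local-mode session, assumed only to keep the sketch self-contained.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("driver-side-list-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("one", 2.0), ("two", 1.5), ("three", 8.0)).toDF("id", "val")

    println(okPerRow(df))
    println(okPerRowImmutable(df))

    spark.stop()
  }
}
```

Both helpers still bring every row to the driver, so they are only suitable when the DataFrame is small enough to fit in driver memory.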
