Get a value from a PySpark dataframe

Question

I have the dataframe below and I'm trying to get the value 3097 as an int, e.g. storing it in a Python variable so I can manipulate it, multiply it by another int, etc.

I've managed to get the row, but I don't even know if this is a good way to do it, and I still can't get the value as an int.

data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true").collect()

Answer 1

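You need to get the Row from the sequence (with a for loop or a map function) and then call row.getInt(2), as described in https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html. Note that getInt belongs to the Java/Scala Row API; in PySpark a Row is indexed directly, e.g. row[2].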
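As a rough sketch of the loop this answer describes, written for PySpark rather than the Java API: a collected Row behaves like a tuple, so position 2 is the count. The helper name `counts_by_bank` is hypothetical, and the column order (card_bank, failed, count) is assumed from the groupBy in the question.

```python
def counts_by_bank(rows):
    """Map card_bank -> count for collected rows.

    Assumes each row is laid out as (card_bank, failed, count), which is
    the column order produced by groupBy("card_bank", "failed").count().
    """
    result = {}
    for row in rows:
        # Positional access: row[0] is card_bank, row[2] is the count.
        result[row[0]] = int(row[2])
    return result


# Usage with the question's dataframe (assumes `data` already exists):
# rows = data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true").collect()
# per_bank = counts_by_bank(rows)
```

Each value in the resulting dict is a plain Python int, so it can be multiplied or otherwise manipulated directly.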

Answer 2

Try selecting the value from the Spark dataframe:

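# Keep the result as a DataFrame; calling collect() here would return a plain list, which has no select().
df = data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true")
# .as[int] is Scala syntax; in PySpark, collect the rows and read the column from each one.
value = [row["count"] for row in df.select("count").collect()]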

Here, value will be a list.

Answer 3

Get the first record from the list of Row objects that collect() returns using index 0, then get the value using the key "count":

from pyspark.sql.functions import col
data.groupby("card_bank", "failed").count().filter(col("failed") == "true").collect()[0]["count"]
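If the filter might match no rows, indexing collect()[0] raises an IndexError. A minimal sketch of a more defensive variant follows, assuming the DataFrame has already been grouped, counted, and filtered; the helper name `first_count_as_int` is hypothetical. It relies on DataFrame.first(), which returns None for an empty DataFrame.

```python
def first_count_as_int(df, column="count"):
    """Return the `column` value of the first row of `df` as a Python int.

    `df` is assumed to be a PySpark DataFrame that is already grouped,
    counted, and filtered. Returns None when it has no rows.
    """
    row = df.first()  # a Row, or None if the DataFrame is empty
    if row is None:
        return None
    # Index by name: attribute access (row.count) would resolve to the
    # tuple method `count`, not the column.
    return int(row[column])


# Usage with the question's dataframe (assumes `data` already exists):
# failed = data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true")
# n = first_count_as_int(failed)
# doubled = n * 2 if n is not None else None
```

Using first() also avoids pulling every matching row to the driver when only one value is needed.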


3 answers

Answer 1
2 2019-01-03 10:05:41

Answer 2
1 2019-01-03 10:07:09

Answer 3
1 accepted 2019-01-03 10:10:59
