
Getting value in a dataframe in PySpark

I have the dataframe below and I'm trying to get the value 3097 as an int, e.g. storing it in a Python variable so that I can manipulate it, multiply it by another int, etc.

[image: the dataframe]

I've managed to get the row, but I don't even know if this is a good way to do it, and I still can't get the value as an int.

data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true").collect()

[image]

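You need to get the row out of the sequence (with a for loop or a map function) and then call row.getInt(2), as described at https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html (that page documents the Java/Scala API; in PySpark the equivalent is row[2] or row["count"]).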

Try selecting the value from the Spark dataframe:

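# Keep df as a DataFrame: do not call collect() yet, or select() will not be available.
df = data.groupBy("card_bank", "failed").count().filter(data["failed"] == "true")
# collect() returns Row objects; pull the "count" field out of each one.
value = [row["count"] for row in df.select("count").collect()]

Here, value will be a list of Python ints, one per matching row.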

Get the first record from the list returned by collect() using index 0, then get the value from that Row object using the key "count":

from pyspark.sql.functions import col
data.groupby("card_bank", "failed").count().filter(col("failed") == "true").collect()[0]["count"]
