
Assert a value of a specific cell in spark df in python

What is the easiest way of asserting specific cell values in PySpark DataFrames? Say I have the following DataFrame:

+---------+--------+
|firstname|lastname|
+---------+--------+
|James    |Smith   |
|Anna     |null    |
|Julia    |Williams|
|Maria    |Jones   |
|Jen      |Brown   |
|Mike     |Williams|
+---------+--------+

I want to assert the existence of the values null and "Jen" in their respective rows/columns in this DataFrame.

So I can use something like:

assert df['firstname'][4] == "Jen"
assert df['lastname'][1] is None

From what I found, using collect() is the way to go (it is roughly the equivalent of iloc in a pandas DataFrame):

assert df.collect()[4]['firstname'] == 'Jen'
assert df.collect()[1]['lastname'] is None
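
For completeness, here is a minimal runnable sketch of that approach (the SparkSession setup and app name are assumptions for illustration). Calling collect() once and reusing the resulting list avoids recomputing the DataFrame for each assertion:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cell-assertions").getOrCreate()

# Rebuild the example DataFrame from the question
data = [
    ("James", "Smith"),
    ("Anna", None),
    ("Julia", "Williams"),
    ("Maria", "Jones"),
    ("Jen", "Brown"),
    ("Mike", "Williams"),
]
df = spark.createDataFrame(data, ["firstname", "lastname"])

# collect() pulls every row to the driver as a list of Row objects,
# which can then be indexed by position and by field name
rows = df.collect()
assert rows[4]["firstname"] == "Jen"
assert rows[1]["lastname"] is None

# Row order is only guaranteed if the DataFrame has a defined ordering,
# so for robust tests it is safer to select the row by a key instead:
assert df.filter(df.firstname == "Anna").first()["lastname"] is None

Note that collect() brings the entire DataFrame to the driver, so positional indexing is only practical for small test data; the filter-based variant at the end also stays correct when row order is not deterministic.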
