How to compare two PySpark columns with Pytest, without using dataframes?
I have a situation where I need to compare columns before I create the dataframe for a test suite, something like this:
import pytest
import pyspark.sql.functions as F

def first_test():
    c1 = F.col("First Column").alias("1st Column")
    c2 = F.col("Second Column").alias("2nd Column")
    c3 = F.col("Second Column").alias("2nd Column")
    print(c1)
    print(c2)
    print(c3)
    assert c1 != c2
    assert c2 == c3
Once I run pytest with the -s and -vv options, I see the following:
Column<'`First Column` AS `1st Column`'>
Column<'`Second Column` AS `2nd Column`'>
Column<'`Second Column` AS `2nd Column`'>
self = Column<'(`First Column` AS `1st Column` = `Second Column` AS `2nd Column`)'>

    def __nonzero__(self) -> None:
        raise ValueError(
>           "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
            "'~' for 'not' when building DataFrame boolean expressions."
        )
E       ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
The same error occurs if I comment out the first assertion (c1 != c2) and keep only the second (c2 == c3).
How can I assert that two columns are the same in this simple scenario?
A PySpark Column overloads == and != to build a new Column expression instead of returning a bool, and assert then fails when it tries to convert that expression to a bool, so the columns cannot be compared directly. The best solution I found was to use str() to convert each column to a string and compare the two strings; I also compared the data type.