How to compare two PySpark columns with Pytest, without using dataframes?

I have a situation where I need to compare columns before I create the DataFrame for a test suite, something like this:

import pytest
import pyspark.sql.functions as F

def first_test():
    c1 = F.col("First Column").alias("1st Column")
    c2 = F.col("Second Column").alias("2nd Column")
    c3 = F.col("Second Column").alias("2nd Column")

    print(c1)
    print(c2)
    print(c3)

    assert c1 != c2
    assert c2 == c3

Once I run pytest with the -s and -vv options, I see the following:

Column<'`First Column` AS `1st Column`'>
Column<'`Second Column` AS `2nd Column`'>
Column<'`Second Column` AS `2nd Column`'>

self = Column<'(`First Column` AS `1st Column` = `Second Column` AS `2nd Column`)'>

    def __nonzero__(self) -> None:
        raise ValueError(
>           "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
            "'~' for 'not' when building DataFrame boolean expressions."
        )
E       ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

I get the same error if I comment out the first assertion (c1==c2) and keep only the second (c2==c3).

How can I assert that two columns are the same in this simple scenario?

I could not find a proper way to do this directly in Python: == on a Column builds a new Column expression instead of returning a bool, and assert then fails when it tries to convert that expression to a bool. The best solution I found was to use str() to convert each column to a string and compare the two strings. I also used this approach to compare the data type.
