I have a situation where I need to compare columns before I create the dataframe for a test suite, something like this:
import pytest
import pyspark.sql.functions as F

def first_test():
    c1 = F.col("First Column").alias("1st Column")
    c2 = F.col("Second Column").alias("2nd Column")
    c3 = F.col("Second Column").alias("2nd Column")
    print(c1)
    print(c2)
    print(c3)
    assert c1 != c2
    assert c2 == c3
When I run pytest with the -s and -vv options, I see the following:
Column<'`First Column` AS `1st Column`'>
Column<'`Second Column` AS `2nd Column`'>
Column<'`Second Column` AS `2nd Column`'>
self = Column<'(`First Column` AS `1st Column` = `Second Column` AS `2nd Column`)'>

    def __nonzero__(self) -> None:
        raise ValueError(
>           "Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
            "'~' for 'not' when building DataFrame boolean expressions."
        )
E       ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
I get the same error if I comment out the first assertion (c1 != c2) and keep only the second (c2 == c3).
How can I assert that two columns are the same in this simple scenario?
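PySpark's Column overloads == and != to build a new Column expression instead of returning a bool (the "self = Column<...>" line in the traceback shows the expression that was built), so assert ends up trying to convert a Column to bool and fails. I did not find a proper way to compare two Column objects directly. The best solution I found was to use str() to convert each column to a string and compare the strings; I also compared the data type.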
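The str() comparison can be sketched as below. This is only an illustration under stated assumptions: FakeColumn is a hypothetical stand-in, not a PySpark class, that mimics the two behaviours visible in the traceback (== builds a new expression, and converting it to bool raises ValueError), so the sketch runs without Spark. same_column is also a hypothetical helper name.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column, for illustration only."""

    def __init__(self, expr: str):
        self.expr = expr

    def __eq__(self, other):
        # Like Column, return a new expression object rather than a bool.
        return FakeColumn(f"({self.expr} = {other.expr})")

    def __ne__(self, other):
        # Same idea for !=: the result is an expression, not True/False.
        return FakeColumn(f"(NOT ({self.expr} = {other.expr}))")

    def __bool__(self):
        # Like Column, refuse to be used as a truth value (this is what assert hits).
        raise ValueError("Cannot convert column into bool")

    def __repr__(self):
        return f"Column<'{self.expr}'>"


def same_column(a, b) -> bool:
    """Compare two column-like objects by their string representation."""
    return str(a) == str(b)


c1 = FakeColumn("`First Column` AS `1st Column`")
c2 = FakeColumn("`Second Column` AS `2nd Column`")
c3 = FakeColumn("`Second Column` AS `2nd Column`")

# These assertions work because same_column returns a plain bool.
assert not same_column(c1, c2)
assert same_column(c2, c3)
```

With real PySpark columns the same helper would be called as same_column(c2, c3) in the test. Note that this compares the textual form of the expressions, so two columns that are logically equivalent but written differently would be reported as different.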