
Python Spark: error when changing DataFrame column data type to int

I want to cast the column type to int and get the first 3 rows:

    df.withColumn("rn", rowNumber().over(windowSpec).cast('int')).where("rn"<=3).drop("rn").show()

but I get this error:

    TypeError: unorderable types: str() <= int()

The error is here:

    .where("rn"<=3)

And here's how you can figure that out if you ever encounter a similar problem in the future. The following

    TypeError: unorderable types: str() <= int()

is a Python exception, and there is no Py4JError. That typically means you can rule out JVM issues and focus on the Python side. The only part of your code where you explicitly compare anything is:

    "rn" <= 3

If you want it to be treated as a SQL expression, pass the whole condition as a string:

    .where("rn <= 3")

If you want rn to be resolved as a column, use the col function:

    from pyspark.sql.functions import col

    .where(col("rn") <= 3)

Also, the rowNumber function has been removed in recent releases. You should use row_number instead for forward compatibility.
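Putting the pieces together, here is a minimal runnable sketch of the corrected query; the sample data, the grp/val column names, and the window definition are assumptions for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, row_number
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    # Toy data; in your case df and windowSpec already exist.
    df = spark.createDataFrame(
        [("a", 10), ("a", 20), ("a", 30), ("a", 40), ("b", 5), ("b", 15)],
        ["grp", "val"],
    )
    windowSpec = Window.partitionBy("grp").orderBy(col("val").desc())

    # row_number (not the removed rowNumber), cast to int as in the question,
    # and a column-based comparison instead of "rn" <= 3.
    (df.withColumn("rn", row_number().over(windowSpec).cast("int"))
       .where(col("rn") <= 3)   # equivalently: .where("rn <= 3")
       .drop("rn")
       .show())

This keeps at most the first 3 rows per partition, then drops the helper rn column before showing the result.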
