I want to cast the column type to int and get the first 3 rows
df.withColumn("rn", rowNumber().over(windowSpec).cast('int')).where("rn"<=3).drop("rn").show()
but I get this error:
TypeError: unorderable types: str() <= int()
The error is here:
.where("rn"<=3)
Here's how you can figure that out if you ever encounter a similar problem in the future. The traceback shows
TypeError: unorderable types: str() <= int()
which is a plain Python exception, and there is no Py4JError. This typically means you can rule out JVM issues and focus on core Python. The only place in your code where you explicitly compare values is:
"rn" <= 3
If you want it to be evaluated as a SQL expression, pass the whole condition as a string:
.where("rn <= 3")
If you want rn to be resolved as a column, use the col function:
from pyspark.sql.functions import col
.where(col("rn") <= 3)
Also, the rowNumber function has been removed in recent releases. You should use row_number instead for forward compatibility.