
Python Spark: error when changing DataFrame column data type to int

I want to cast the column type to int and get the first 3 rows:

    df.withColumn("rn", rowNumber().over(windowSpec).cast('int')).where("rn"<=3).drop("rn").show()

but I get this error:

    TypeError: unorderable types: str() <= int()

The error is here:

    .where("rn"<=3)

And here's how you can figure that out if you ever encounter a similar problem in the future. The following

    TypeError: unorderable types: str() <= int()

is a Python exception, and there is no Py4JError. That typically means you can rule out JVM issues and focus on the Python side. The only part of your code where you explicitly compare anything is:

    "rn" <= 3

If you want it to be treated as a SQL expression, pass the whole condition as a string:

    .where("rn <= 3")

If you want rn to be resolved as a column, use the col function:

    from pyspark.sql.functions import col

    .where(col("rn") <= 3)

Also, the rowNumber function has been removed in recent releases. You should use row_number instead for forward compatibility.
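Putting the pieces together, here is a minimal runnable sketch of the corrected query; the sample data, the grp/val column names, and the window definition are assumptions for illustration only:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, row_number
    from pyspark.sql.window import Window

    spark = SparkSession.builder.getOrCreate()

    # Toy data; in your case df and windowSpec already exist.
    df = spark.createDataFrame(
        [("a", 10), ("a", 20), ("a", 30), ("a", 40), ("b", 5), ("b", 15)],
        ["grp", "val"],
    )
    windowSpec = Window.partitionBy("grp").orderBy(col("val").desc())

    # row_number (not the removed rowNumber), cast to int as in the question,
    # and a column-based comparison instead of "rn" <= 3.
    (df.withColumn("rn", row_number().over(windowSpec).cast("int"))
       .where(col("rn") <= 3)   # equivalently: .where("rn <= 3")
       .drop("rn")
       .show())

This keeps at most the first 3 rows per partition, then drops the helper rn column before showing the result.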
