Substring in PySpark resulting in a "Column object is not callable" error
I'm working on PySpark and created a sample dataframe with some long and decimal type columns. Here I want to fetch a decimal type column's value to two decimal places without rounding. Below is the code I tried.
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

df = spark.createDataFrame([
    (324.456, "hi", "test"),
    (453.987, "hello", "python"),
    (768.66, "test", "java")
], ["col1", "col2", "col3"])

new = df.withColumn(
    "col4",
    F.substring((df.col1).cast(StringType()), 1, F.instr((df.col1).cast(StringType()), ".") + 2))
So here I'm converting the column to a string, finding the index of the decimal point, and adding two (because I need two decimal places without rounding). But I don't know what the mistake is here; I'm getting a "Column object is not callable" error. If I use only the F.instr() function, it works fine. Kindly help with another solution to fetch the value to two decimal places without rounding.
Expected output

col1    col2  col3   col4
324.456 hi    test   324.45
453.987 hello python 453.98
768.66  test  java   768.66
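The string-slicing idea behind the attempt above can be sketched in plain Python (outside Spark) to confirm the logic itself is sound; the function name truncate_str is just for illustration:

```python
# Plain-Python sketch of the intended logic: slice the string
# representation up to two characters past the decimal point.
def truncate_str(value: float, places: int = 2) -> str:
    s = str(value)
    dot = s.find(".")            # index of the decimal point
    return s[: dot + 1 + places]

print(truncate_str(324.456))  # 324.45
print(truncate_str(768.66))   # 768.66
```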
You can also use a regular expression with regexp_extract here:
df.withColumn('test',
    F.regexp_extract(F.col("col1").cast("string"), r'\d+[.]\d{2}', 0)).show()
Or, as @MohammadMurtazaHashmi suggested in the comments, no casting is required:
df.withColumn('test', F.regexp_extract(F.col("col1"), r'\d+[.]\d{2}', 0)).show()
+-------+-----+------+------+
| col1| col2| col3| test|
+-------+-----+------+------+
|324.456| hi| test|324.45|
|453.987|hello|python|453.98|
| 768.66| test| java|768.66|
+-------+-----+------+------+
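The pattern can be checked locally with Python's re module, which mirrors what regexp_extract does on each row (a plain-Python sketch, no Spark session needed):

```python
import re

# Matches one or more digits, a literal dot, then exactly two digits,
# so any extra decimal places are simply not captured.
pattern = re.compile(r"\d+[.]\d{2}")

for value in (324.456, 453.987, 768.66):
    match = pattern.search(str(value))
    print(match.group(0) if match else "")  # 324.45 / 453.98 / 768.66
```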
What you're looking for is a way of truncating decimals. I propose you use pyspark.sql.functions.pow and some clever casting to LongType for this. You multiply by 10^{decimal_places} and divide by the same value again, casting to long in between to get rid of the decimal places, such as:
df2.show()
+-------+-----+------+
| col1| col2| col3|
+-------+-----+------+
|324.456| hi| test|
|453.987|hello|python|
| 768.66| test| java|
+-------+-----+------+
from pyspark.sql import functions as f

decimal_places = 2
truncated_value_column = f.pow(f.lit(10), decimal_places).cast('long')
df2.withColumn(
"trunc",
((f.col("col1") * truncated_value_column)).cast("long") / truncated_value_column
).show()
+-------+-----+------+------+
| col1| col2| col3| trunc|
+-------+-----+------+------+
|324.456| hi| test|324.45|
|453.987|hello|python|453.98|
| 768.66| test| java|768.66|
+-------+-----+------+------+
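The arithmetic behind this trick can be sketched in plain Python with the same multiply / truncate / divide steps (int() truncates toward zero, like Spark's cast to long):

```python
def truncate(value: float, places: int = 2) -> float:
    # Multiply by 10^places, drop the fractional part (like casting
    # to long in Spark), then divide back down.
    factor = 10 ** places
    return int(value * factor) / factor

print(truncate(324.456))  # 324.45
print(truncate(453.987))  # 453.98
```

One caveat: because the intermediate product is a float, a value sitting exactly on a two-decimal boundary can occasionally land just below the exact product and truncate one unit low, which the regex and string approaches avoid.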
NB: If you then wish to cast back to string, I recommend you do so afterwards. Hope this helps!