
Substring in python resulting column object being not callable

I'm working in PySpark and created a sample DataFrame with some long and decimal type columns. I want to fetch the decimal type column's value to two decimal places without rounding. Below is the code I tried.

from pyspark.sql import functions as F
from pyspark.sql.types import StringType

df = spark.createDataFrame([
    (324.456, "hi", "test"),
    (453.987, "hello", "python"),
    (768.66, "test", "java")
], ["col1", "col2", "col3"])

new = df.withColumn(
    "col4",
    F.substring(df.col1.cast(StringType()), 1, F.instr(df.col1.cast(StringType()), ".") + 2)
)

So here I'm converting the column to a string, finding the index position of the decimal point, and adding two to it (because I need two decimal places without rounding). But I don't know what the mistake is here; I'm getting a Column object is not callable error. If I use only the F.instr() function, it works fine. Kindly help with another solution to fetch the value to two decimals without rounding.

Expected output
col1     col2   col3   col4
324.456  hi     test   324.45
453.987  hello  python 453.98
768.66   test   java   768.66
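
One thing to note is that in many PySpark versions F.substring expects its pos and len arguments as plain Python integers, so a Column-valued length cannot be passed there directly. A sketch of the same index-based idea expressed through a SQL expression with F.expr, where column-valued lengths are allowed (assuming the df and column names from the question above):

# Sketch only: Spark SQL's substring/instr accept expressions for all arguments,
# so the "position of '.' plus 2" length can stay column-valued here.
new = df.withColumn(
    "col4",
    F.expr("substring(cast(col1 as string), 1, instr(cast(col1 as string), '.') + 2)")
)
new.show()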

You can also use a regular expression with regexp_extract here:

df.withColumn(
    'test',
    F.regexp_extract(F.col("col1").cast("string"), r'\d+[.]\d{2}', 0)
).show()

Or, as @MohammadMurtazaHashmi suggested in the comments, no casting is required:

df.withColumn('test', F.regexp_extract(F.col("col1"), r'\d+[.]\d{2}', 0)).show()

+-------+-----+------+------+
|   col1| col2|  col3|  test|
+-------+-----+------+------+
|324.456|   hi|  test|324.45|
|453.987|hello|python|453.98|
| 768.66| test|  java|768.66|
+-------+-----+------+------+
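
Note that regexp_extract returns a string column. If you need the result as a number again, a cast afterwards should work; a sketch, assuming a decimal(10,2) precision is acceptable:

# Sketch only: cast the extracted "xxx.xx" string back to a decimal column.
df.withColumn(
    'test',
    F.regexp_extract(F.col("col1"), r'\d+[.]\d{2}', 0).cast("decimal(10,2)")
).show()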

What you're looking for is a way of truncating decimals. I propose you use pyspark.sql.functions.pow and some clever casting to LongType for this. This way, you multiply by 10^{decimal_places}, cast to long to get rid of the remaining decimals (floats) in between, and then divide by the same factor again, such as:

df2.show()
+-------+-----+------+
|   col1| col2|  col3|
+-------+-----+------+
|324.456|   hi|  test|
|453.987|hello|python|
| 768.66| test|  java|
+-------+-----+------+


from pyspark.sql import functions as f

decimal_places = 2
# 10 ** decimal_places as a long-typed column, used both to shift and unshift
truncated_value_column = f.pow(f.lit(10), decimal_places).cast('long')

df2.withColumn(
    "trunc",
    (f.col("col1") * truncated_value_column).cast("long") / truncated_value_column
).show()
+-------+-----+------+------+
|   col1| col2|  col3| trunc|
+-------+-----+------+------+
|324.456|   hi|  test|324.45|
|453.987|hello|python|453.98|
| 768.66| test|  java|768.66|
+-------+-----+------+------+

NB: If you then wish to cast back to string, I recommend you do so afterwards. Hope this helps!
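
For instance, a minimal sketch of that final cast, reusing truncated_value_column from above:

# Sketch only: truncate first, then cast the numeric result back to a string.
df2.withColumn(
    "trunc",
    ((f.col("col1") * truncated_value_column).cast("long") / truncated_value_column).cast("string")
).show()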
