How to reduce a timestamp column value in a PySpark DataFrame by 1 ms
I have a PySpark DataFrame with a timestamp column, and I want to reduce each timestamp by 1 ms. Is there a built-in Spark function for this?
For example, a value in the timestamp column: 2020-07-13 17:29:36
You can do this with the double type: cast the timestamp to double (epoch seconds), subtract 0.001, and cast the result back to timestamp.
import pyspark.sql.functions as f
df = spark.createDataFrame([(1, '2020-07-13 17:29:36')], ['id', 'time'])
df.withColumn('time', f.to_timestamp('time', 'yyyy-MM-dd HH:mm:ss')) \
  .withColumn('timediff', (f.col('time').cast('double') - f.lit(0.001)).cast('timestamp')) \
  .show(10, False)
+---+-------------------+-----------------------+
|id |time               |timediff               |
+---+-------------------+-----------------------+
|1  |2020-07-13 17:29:36|2020-07-13 17:29:35.999|
+---+-------------------+-----------------------+
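One thing to keep in mind with the double route: epoch seconds are stored in binary floating point, so 0.001 is not represented exactly, and whether that ever shows up in the result depends on how the value is converted back. The sketch below is a plain-Python analogue (standard library only, not Spark code) that puts the float route next to an exact integer-microsecond route. The helper names to_epoch_micros and from_epoch_micros are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(ts: datetime) -> int:
    """Exact integer microseconds since the Unix epoch (hypothetical helper)."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_epoch_micros(micros: int) -> datetime:
    """Inverse of to_epoch_micros (hypothetical helper)."""
    return EPOCH + timedelta(microseconds=micros)


ts = datetime(2020, 7, 13, 17, 29, 36, tzinfo=timezone.utc)

# Float route, analogous to cast('double') - 0.001 followed by a cast back.
float_result = datetime.fromtimestamp(ts.timestamp() - 0.001, tz=timezone.utc)

# Integer route: one millisecond is exactly 1000 microseconds, so no rounding.
int_result = from_epoch_micros(to_epoch_micros(ts) - 1000)

print(float_result.isoformat())
print(int_result.isoformat())
```

If exactness matters for your data, an interval-based subtraction avoids the floating-point question altogether.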
You can use pyspark.sql.functions.expr to subtract INTERVAL 1 milliseconds:
from pyspark.sql.functions import expr
df = spark.createDataFrame([('2020-07-13 17:29:36',)], ['time'])
df = df.withColumn('time2', expr("time - INTERVAL 1 milliseconds"))
df.show(truncate=False)
#+-------------------+-----------------------+
#|time               |time2                  |
#+-------------------+-----------------------+
#|2020-07-13 17:29:36|2020-07-13 17:29:35.999|
#+-------------------+-----------------------+
Even if time is a string in this format, Spark will implicitly convert it for you. In the schema printed below, time2 also comes back as a string:
df.printSchema()
#root
# |-- time: string (nullable = true)
# |-- time2: string (nullable = true)
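If you want to check the expected value outside Spark, for example in a unit test, the same string in, string out subtraction can be sketched in plain Python. This is only a reference calculation using the standard library. The function name minus_one_ms is hypothetical, and the input format is assumed to be yyyy-MM-dd HH:mm:ss as in the example above.

```python
from datetime import datetime, timedelta

# Assumed input format, matching the sample value in the question.
FMT = "%Y-%m-%d %H:%M:%S"


def minus_one_ms(text: str) -> str:
    """Parse a timestamp string, subtract 1 ms, and format it back (hypothetical helper)."""
    parsed = datetime.strptime(text, FMT)
    shifted = parsed - timedelta(milliseconds=1)
    # timespec="milliseconds" keeps exactly three fractional digits.
    return shifted.isoformat(sep=" ", timespec="milliseconds")


print(minus_one_ms("2020-07-13 17:29:36"))  # 2020-07-13 17:29:35.999
```

The subtraction also rolls over correctly at minute, day, and year boundaries, which makes it handy for building edge-case expectations.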
You can also cast the column to timestamp first and then subtract an INTERVAL built with expr:
import pyspark.sql.functions as F
df = spark.createDataFrame(
    [
        (1, '2020-07-13 17:29:36')
    ],
    [
        'id', 'time'
    ]
)
df.withColumn(
    'time',
    F.col('time').cast('timestamp')
).withColumn(
    'timediff',
    (
        F.col('time') - F.expr('INTERVAL 1 milliseconds')
    ).cast('timestamp')
).show(truncate=False)