
How to reduce a timestamp column value in a PySpark data-frame by 1 ms

I have a PySpark data-frame with a timestamp column, and I want to reduce the timestamp by 1 ms. Is there a built-in function available in Spark for handling such a scenario?

For example, a value in the timestamp column: 2020-07-13 17:29:36
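For reference, outside Spark the same operation on a plain Python `datetime` is a simple `timedelta` subtraction — a minimal sketch of what the question is asking for:

```python
from datetime import datetime, timedelta

# Parse the example value and subtract 1 millisecond.
ts = datetime.strptime('2020-07-13 17:29:36', '%Y-%m-%d %H:%M:%S')
reduced = ts - timedelta(milliseconds=1)

print(reduced)  # 2020-07-13 17:29:35.999000
```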

You can do this by casting the timestamp to double type.

import pyspark.sql.functions as f

df = spark.createDataFrame([(1, '2020-07-13 17:29:36')], ['id', 'time'])

# cast to epoch seconds (double), subtract 0.001 s, cast back to timestamp
df.withColumn('time', f.to_timestamp('time', 'yyyy-MM-dd HH:mm:ss')) \
  .withColumn('timediff', (f.col('time').cast('double') - f.lit(0.001)).cast('timestamp')) \
  .show(10, False)

+---+-------------------+-----------------------+
|id |time               |timediff               |
+---+-------------------+-----------------------+
|1  |2020-07-13 17:29:36|2020-07-13 17:29:35.999|
+---+-------------------+-----------------------+
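The double cast works because Spark represents a timestamp cast to `double` as epoch seconds, so subtracting 0.001 shifts it back by exactly 1 ms. The same round-trip can be sketched in plain Python with the standard `datetime` module (outside Spark) to see what the cast is doing:

```python
from datetime import datetime, timedelta

t = datetime(2020, 7, 13, 17, 29, 36)

# timestamp() gives epoch seconds as a float; subtracting 0.001 s mirrors
# Spark's cast('double') - 0.001 followed by cast('timestamp').
shifted = datetime.fromtimestamp(t.timestamp() - 0.001)

print(shifted)                                   # 2020-07-13 17:29:35.999000
print(shifted == t - timedelta(milliseconds=1))  # True
```

Floating-point epoch seconds have roughly sub-microsecond resolution for current dates, so a 1 ms shift survives the round-trip without visible precision loss.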

You can use pyspark.sql.functions.expr to subtract INTERVAL 1 milliseconds.

from pyspark.sql.functions import expr

df = spark.createDataFrame([('2020-07-13 17:29:36',)], ['time'])
df = df.withColumn('time2', expr("time - INTERVAL 1 milliseconds"))
df.show(truncate=False)
#+-------------------+-----------------------+
#|time               |time2                  |
#+-------------------+-----------------------+
#|2020-07-13 17:29:36|2020-07-13 17:29:35.999|
#+-------------------+-----------------------+

Even if time is a string in this format, Spark will make an implicit conversion for you.

df.printSchema()
#root
# |-- time: string (nullable = true)
# |-- time2: string (nullable = true)

You could also use INTERVAL with expr.

import pyspark.sql.functions as F

df = spark.createDataFrame(
    [
        (1, '2020-07-13 17:29:36')
    ], 
    [
        'id', 'time'
    ]
)

df.withColumn(
    'time', 
    F.col('time').cast('timestamp')
).withColumn(
    'timediff', 
    (
        F.col('time') -  F.expr('INTERVAL 1 milliseconds')
    ).cast('timestamp') 
).show(truncate=False)
