
How to create a Spark dataframe with timestamp?

How can I create this Spark dataframe with a timestamp data type in one step using Python? Here is how I do it in two steps. Using Spark 3.1.2.

from pyspark.sql.functions import lit, to_timestamp
from pyspark.sql.types import LongType, StructField, StructType, TimestampType

schema_sdf = StructType([
    StructField("ts", TimestampType(), True),
    StructField("myColumn", LongType(), True),
])

sdf = spark.createDataFrame(
    [(to_timestamp(lit("2022-06-29 12:01:19.000")), 0)],
    schema=schema_sdf,
)

PySpark does not automatically interpret timestamp values from strings. I mostly use the following syntax to create the df and then cast the column type to timestamp:

from pyspark.sql import functions as F

sdf = spark.createDataFrame([("2022-06-29 12:01:19.000", 0 )], ["ts", "myColumn"])
sdf = sdf.withColumn("ts", F.col("ts").cast("timestamp"))

sdf.printSchema()
# root
#  |-- ts: timestamp (nullable = true)
#  |-- myColumn: long (nullable = true)

The long type was inferred automatically, but for the timestamp we needed a cast.
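If a single createDataFrame call is the goal, one option is to pass Python datetime objects together with an explicit schema, since TimestampType accepts datetime values directly. The following is a minimal sketch, not the only way to do it: the names TS_FORMAT, parse_rows and build_timestamp_df are hypothetical helpers introduced for illustration, and the format string assumes input shaped like the string in the question.

```python
from datetime import datetime

# Assumed input format, matching the string used in the question.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_rows(raw_rows):
    """Convert (timestamp_string, value) pairs into (datetime, value) pairs."""
    return [(datetime.strptime(ts, TS_FORMAT), value) for ts, value in raw_rows]


def build_timestamp_df(spark, raw_rows):
    """Create the DataFrame with a timestamp column in one createDataFrame call."""
    # Imported here so parse_rows stays usable without a Spark installation.
    from pyspark.sql.types import LongType, StructField, StructType, TimestampType

    schema = StructType([
        StructField("ts", TimestampType(), True),
        StructField("myColumn", LongType(), True),
    ])
    # datetime values are accepted as-is for a TimestampType field.
    return spark.createDataFrame(parse_rows(raw_rows), schema=schema)
```

With an existing spark session, build_timestamp_df(spark, [("2022-06-29 12:01:19.000", 0)]).printSchema() should report ts as timestamp and myColumn as long. The datetime values here carry no time zone, so how they map to an instant depends on the time zone settings in effect.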

On the other hand, even without casting, you can use functions that need a timestamp as input, because Spark casts the string implicitly:

sdf = spark.createDataFrame([("2022-06-29 12:01:19.000", 0 )], ["ts", "myColumn"])
sdf.printSchema()
# root
#  |-- ts: string (nullable = true)
#  |-- myColumn: long (nullable = true)

sdf.selectExpr("extract(year from ts)").show()
# +---------------------+
# |extract(year FROM ts)|
# +---------------------+
# |                 2022|
# +---------------------+

