How to generate date values into a dataframe in PySpark?
Can someone tell me what is wrong with the code below? It does not print anything.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp, last_day, next_day, date_format, \
    date_add, year, month, dayofmonth, dayofyear, dayofweek, date_trunc, date_sub, to_date, \
    add_months, weekofyear, quarter, col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

ss = SparkSession.builder.appName('DateDim').master('local[1]').getOrCreate()

df = ss.createDataFrame([], StructType([]))
current_date()
df = df.select(
    current_date().alias("current_date"),
    next_day(current_date(), 'sunday').alias("next_day"),
    dayofweek(current_date()).alias("day_of_week"),
    dayofmonth(current_date()).alias("day_of_month"),
    dayofyear(current_date()).alias("day_of_year"),
    last_day(current_date()).alias("last_day"),
    year(current_date()).alias("year"),
    month(current_date()).alias("month"),
    weekofyear(current_date()).alias("week_of_year"),
    quarter(current_date()).alias("quarter")
).collect()
print(df)

for i in range(1, 1000):
    print(i)

for i in range(1, 1000):
    v_date = date_add(v_date, i)
    df.unionAll(df.select(
        v_date.alias("current_date"),
        next_day(v_date, 'sunday').alias("next_day"),
        dayofweek(v_date).alias("day_of_week"),
        dayofmonth(v_date).alias("day_of_month"),
        dayofyear(v_date).alias("day_of_year"),
        last_day(v_date).alias("last_day"),
        year(v_date).alias("year"),
        month(v_date).alias("month"),
        weekofyear(v_date).alias("week_of_year"),
        quarter(v_date).alias("quarter")
    ))
df.show()
You are getting zero rows because the initial df has no rows. None of the columns being created get any values, because there are no rows in df to compute them against.

It seems you are trying to create a dataframe of 1000 dates starting from the current day. There is an easy approach using the sequence function.
import pyspark.sql.functions as func
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# single dummy row to anchor the column expressions
data_sdf = spark.createDataFrame([(1,)], 'id string')

data_sdf. \
    withColumn('min_dt', func.current_date().cast('date')). \
    withColumn('max_dt', func.date_add('min_dt', 1000).cast('date')). \
    withColumn('all_dates', func.expr('sequence(min_dt, max_dt, interval 1 day)')). \
    withColumn('dates_exp', func.explode('all_dates')). \
    drop('id'). \
    show(10)
# +----------+----------+--------------------+----------+
# | min_dt| max_dt| all_dates| dates_exp|
# +----------+----------+--------------------+----------+
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-27|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-28|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-29|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-30|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-31|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-01|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-02|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-03|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-04|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-05|
# +----------+----------+--------------------+----------+
# only showing top 10 rows
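As a quick sanity check (not part of the original answer), the min_dt/max_dt span shown above can be reproduced with plain Python's datetime, since date_add moves the start date forward by exactly 1000 days:

```python
from datetime import date, timedelta

# the run date shown in the output above
min_dt = date(2022, 8, 27)

# date_add('min_dt', 1000) adds exactly 1000 days
max_dt = min_dt + timedelta(days=1000)

print(max_dt)  # 2025-05-23, matching the max_dt column above
```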
Select the dates_exp field for further use.
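For local validation without a Spark session, the sequence(min_dt, max_dt, interval 1 day) expansion followed by explode can be sketched in plain Python; this is an illustrative equivalent, not Spark code:

```python
from datetime import date, timedelta

min_dt = date(2022, 8, 27)               # stand-in for current_date()
max_dt = min_dt + timedelta(days=1000)   # stand-in for date_add('min_dt', 1000)

# sequence(min_dt, max_dt, interval 1 day) is inclusive on both ends,
# so it yields 1001 dates; explode then makes one row per element
all_dates = [min_dt + timedelta(days=i) for i in range((max_dt - min_dt).days + 1)]

print(len(all_dates))   # 1001
print(all_dates[0])     # 2022-08-27
print(all_dates[-1])    # 2025-05-23
```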
You want to use range() to generate the rows (with sequence you would generate an array, which you would then need to explode into rows). This is how you can use it:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, next_day, dayofweek, dayofmonth, dayofyear, last_day, \
    year, month, weekofyear, quarter, current_date

spark = SparkSession.builder.getOrCreate()

(
    spark
    .range(0, 1000)
    .alias("id")
    .select(
        (current_date() + col('id').cast("int")).alias("date")
    )
    .select(
        "date",
        next_day("date", 'sunday').alias("next_sunday"),
        dayofweek("date").alias("day_of_week"),
        dayofmonth("date").alias("day_of_month"),
        dayofyear("date").alias("day_of_year"),
        last_day("date").alias("last_day"),
        year("date").alias("year"),
        month("date").alias("month"),
        weekofyear("date").alias("week_of_year"),
        quarter("date").alias("quarter")
    )
).show()
It returns:
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
| date|next_sunday|day_of_week|day_of_month|day_of_year| last_day|year|month|week_of_year|quarter|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
|2022-09-22| 2022-09-25| 5| 22| 265|2022-09-30|2022| 9| 38| 3|
|2022-09-23| 2022-09-25| 6| 23| 266|2022-09-30|2022| 9| 38| 3|
|2022-09-24| 2022-09-25| 7| 24| 267|2022-09-30|2022| 9| 38| 3|
|2022-09-25| 2022-10-02| 1| 25| 268|2022-09-30|2022| 9| 38| 3|
|2022-09-26| 2022-10-02| 2| 26| 269|2022-09-30|2022| 9| 39| 3|
|2022-09-27| 2022-10-02| 3| 27| 270|2022-09-30|2022| 9| 39| 3|
|2022-09-28| 2022-10-02| 4| 28| 271|2022-09-30|2022| 9| 39| 3|
|2022-09-29| 2022-10-02| 5| 29| 272|2022-09-30|2022| 9| 39| 3|
|2022-09-30| 2022-10-02| 6| 30| 273|2022-09-30|2022| 9| 39| 3|
|2022-10-01| 2022-10-02| 7| 1| 274|2022-10-31|2022| 10| 39| 4|
|2022-10-02| 2022-10-09| 1| 2| 275|2022-10-31|2022| 10| 39| 4|
|2022-10-03| 2022-10-09| 2| 3| 276|2022-10-31|2022| 10| 40| 4|
|2022-10-04| 2022-10-09| 3| 4| 277|2022-10-31|2022| 10| 40| 4|
|2022-10-05| 2022-10-09| 4| 5| 278|2022-10-31|2022| 10| 40| 4|
|2022-10-06| 2022-10-09| 5| 6| 279|2022-10-31|2022| 10| 40| 4|
|2022-10-07| 2022-10-09| 6| 7| 280|2022-10-31|2022| 10| 40| 4|
|2022-10-08| 2022-10-09| 7| 8| 281|2022-10-31|2022| 10| 40| 4|
|2022-10-09| 2022-10-16| 1| 9| 282|2022-10-31|2022| 10| 40| 4|
|2022-10-10| 2022-10-16| 2| 10| 283|2022-10-31|2022| 10| 41| 4|
|2022-10-11| 2022-10-16| 3| 11| 284|2022-10-31|2022| 10| 41| 4|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
only showing top 20 rows
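The derived columns in this output can be cross-checked with plain Python's datetime (an illustrative sketch, not part of the answer). Note that Spark's dayofweek numbers days 1 = Sunday through 7 = Saturday, while Python's isoweekday() uses 1 = Monday through 7 = Sunday, so a conversion is needed:

```python
from datetime import date

d = date(2022, 9, 22)  # first row of the output above

day_of_week = d.isoweekday() % 7 + 1   # convert ISO (Mon=1) to Spark (Sun=1)
day_of_year = d.timetuple().tm_yday
week_of_year = d.isocalendar()[1]      # ISO week number, same convention as weekofyear
quarter = (d.month - 1) // 3 + 1

print(day_of_week, day_of_year, week_of_year, quarter)  # 5 265 38 3
```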