May I know what's wrong with below code? it does not print anything.
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, current_timestamp,last_day,next_day, date_format, date_add, year, month, dayofmonth, dayofyear, dayofweek, date_trunc, date_sub, to_date, add_months, weekofyear, quarter, col
from pyspark.sql.types import StructType,StructField,StringType, IntegerType
ss = SparkSession.builder.appName('DateDim').master('local[1]').getOrCreate()
df = ss.createDataFrame([],StructType([]))
current_date()
df = df.select(current_date().alias("current_date"),next_day(current_date(), 'sunday').alias("next_day"),dayofweek(current_date()).alias("day_of_week"),dayofmonth(current_date()).alias("day_of_month"),dayofyear(current_date()).alias("day_of_year"),last_day(current_date()).alias("last_day"),year(current_date()).alias("year"),month(current_date()).alias("month"), weekofyear(current_date()).alias("week_of_year"),quarter(current_date()).alias("quarter")).collect()
print(df)
for i in range(1, 1000):
print(i)
for i in range(1, 1000):
v_date = date_add(v_date, i)
df.unionAll(df.select(v_date.alias("current_date"),next_day(v_date,'sunday').alias("next_day"),dayofweek(v_date).alias("day_of_week"),dayofmonth(v_date).alias("day_of_month"),dayofyear(v_date).alias("day_of_year"),last_day(v_date).alias("last_day"),year(v_date).alias("year"),month(v_date).alias("month"), weekofyear(v_date).alias("week_of_year"),quarter(v_date).alias("quarter")))
df.show()
You're getting zero rows because there are no rows in the initial df
. Any columns being created will have no values as there are no rows in df
.
It seems you're trying to create a dataframe with 1000 dates starting from the current day. There's a simple approach using sequence
function.
data_sdf = spark.createDataFrame([(1,)], 'id string')
data_sdf. \
withColumn('min_dt', func.current_date().cast('date')). \
withColumn('max_dt', func.date_add('min_dt', 1000).cast('date')). \
withColumn('all_dates', func.expr('sequence(min_dt, max_dt, interval 1 day)')). \
withColumn('dates_exp', func.explode('all_dates')). \
drop('id'). \
show(10)
# +----------+----------+--------------------+----------+
# | min_dt| max_dt| all_dates| dates_exp|
# +----------+----------+--------------------+----------+
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-27|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-28|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-29|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-30|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-08-31|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-01|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-02|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-03|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-04|
# |2022-08-27|2025-05-23|[2022-08-27, 2022...|2022-09-05|
# +----------+----------+--------------------+----------+
# only showing top 10 rows
select
the dates_exp
field for further use.
You want to use the range()
function which generates rows (using sequence
you will generate an array which you then need to explode
into rows).
That's how you can use it:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, next_day, dayofweek, dayofmonth, dayofyear, last_day, year, month, \
weekofyear, quarter, current_date
spark = SparkSession.builder.getOrCreate()
(
spark
.range(0, 1000)
.alias("id")
.select(
(current_date() + col('id').cast("int")).alias("date")
)
.select(
"date",
next_day("date", 'sunday').alias("next_sunday"),
dayofweek("date").alias("day_of_week"),
dayofmonth("date").alias("day_of_month"),
dayofyear("date").alias("day_of_year"),
last_day("date").alias("last_day"),
year("date").alias("year"),
month("date").alias("month"),
weekofyear("date").alias("week_of_year"),
quarter("date").alias("quarter")
)
).show()
it returns
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
| date|next_sunday|day_of_week|day_of_month|day_of_year| last_day|year|month|week_of_year|quarter|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
|2022-09-22| 2022-09-25| 5| 22| 265|2022-09-30|2022| 9| 38| 3|
|2022-09-23| 2022-09-25| 6| 23| 266|2022-09-30|2022| 9| 38| 3|
|2022-09-24| 2022-09-25| 7| 24| 267|2022-09-30|2022| 9| 38| 3|
|2022-09-25| 2022-10-02| 1| 25| 268|2022-09-30|2022| 9| 38| 3|
|2022-09-26| 2022-10-02| 2| 26| 269|2022-09-30|2022| 9| 39| 3|
|2022-09-27| 2022-10-02| 3| 27| 270|2022-09-30|2022| 9| 39| 3|
|2022-09-28| 2022-10-02| 4| 28| 271|2022-09-30|2022| 9| 39| 3|
|2022-09-29| 2022-10-02| 5| 29| 272|2022-09-30|2022| 9| 39| 3|
|2022-09-30| 2022-10-02| 6| 30| 273|2022-09-30|2022| 9| 39| 3|
|2022-10-01| 2022-10-02| 7| 1| 274|2022-10-31|2022| 10| 39| 4|
|2022-10-02| 2022-10-09| 1| 2| 275|2022-10-31|2022| 10| 39| 4|
|2022-10-03| 2022-10-09| 2| 3| 276|2022-10-31|2022| 10| 40| 4|
|2022-10-04| 2022-10-09| 3| 4| 277|2022-10-31|2022| 10| 40| 4|
|2022-10-05| 2022-10-09| 4| 5| 278|2022-10-31|2022| 10| 40| 4|
|2022-10-06| 2022-10-09| 5| 6| 279|2022-10-31|2022| 10| 40| 4|
|2022-10-07| 2022-10-09| 6| 7| 280|2022-10-31|2022| 10| 40| 4|
|2022-10-08| 2022-10-09| 7| 8| 281|2022-10-31|2022| 10| 40| 4|
|2022-10-09| 2022-10-16| 1| 9| 282|2022-10-31|2022| 10| 40| 4|
|2022-10-10| 2022-10-16| 2| 10| 283|2022-10-31|2022| 10| 41| 4|
|2022-10-11| 2022-10-16| 3| 11| 284|2022-10-31|2022| 10| 41| 4|
+----------+-----------+-----------+------------+-----------+----------+----+-----+------------+-------+
only showing top 20 rows
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.