Convert week number in dataframe to start date of week (Monday)
I'm looking to convert daily data into weekly data. Here is the code I've used to achieve this:
daily_data['Week_Number'] = pd.to_datetime(daily_data['candle_date']).dt.week
daily_data['Year'] = pd.to_datetime(daily_data['candle_date']).dt.year
df2 = daily_data.groupby(['Year', 'Week_Number']).agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})
Currently, the dataframe output looks as below:
                          open          high           low         close        volume     market_cap
Year Week_Number
2020 31           11106.793367  12041.230145  10914.007709  11059.660924   86939673211   836299315108
     32           11059.658520  11903.881608  11011.841384  11653.660942  125051146775  1483987715241
     33           11665.874956  12047.515879  11199.052457  11906.236593  141819289223  1513036354035
     34           11915.898402  12382.422676  11435.685834  11671.520767  136888268138  1533135548697
     35           11668.211439  11806.669046  11183.114210  11704.963980  122232543594  1490089199926
     36           11713.540300  12044.196936   9951.201578  10277.329333  161912442921  1434502733759
I'd like the output to have a column week_date that shows the date of the Monday of the week as the start date. Ex: show 27-07-2020 in place of the 31st week of 2020, and so on. It's this final piece that I'm really stuck on. Could I please request some help to achieve this?
The entire function used to convert daily data to weekly data is below:
def convert_dailydata_to_weeklydata(daily_data):
    # Print function name
    SupportMethods.print_func_name()
    # Loop over the rows until a row with Monday as date is present
    row_counter_start = 0
    while True:
        if datetime.weekday(daily_data['candle_date'][row_counter_start]) == 0:
            break
        row_counter_start += 1
    # # Loop over the rows until a row with Sunday as date is present
    # row_counter_end = len(daily_data.index) - 1
    # while True:
    #     if datetime.weekday(daily_data['candle_date'][row_counter_end]) == 6:
    #         break
    #     row_counter_end -= 1
    # print(daily_data)
    # print(row_counter_end)
    # Copy all rows after the first Monday row of data is reached
    daily_data_temp = daily_data[row_counter_start:]
    # Getting week number
    daily_data_temp['Week_Number'] = pd.to_datetime(daily_data_temp['candle_date']).dt.week
    # Getting year. Week numbers repeat across years, so we need to create a unique index by using year and week number
    daily_data_temp['Year'] = pd.to_datetime(daily_data_temp['candle_date']).dt.year
    # Grouping based on required values
    df = daily_data_temp.groupby(['Year', 'Week_Number']).agg(
        {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})
    # Reset index
    df = df.reset_index()
    # Create week date (start of week)
    # The + "1" is for the day of the week. Weekday numbers run 0-6, with 0 being Sunday and 6 being Saturday.
    df['week_date'] = pd.to_datetime(df['Year'].astype(str) + df['Week_Number'].astype(str) + "1", format='%G%V%w')
    # Set indexes
    df = df.set_index(['Year', 'Week_Number'])
    # Re-order columns into a new dataframe
    weekly_data = df[["week_date", "open", "high", "low", "close", "volume", "market_cap"]]
    weekly_data = weekly_data.rename({'week_date': 'candle_date'}, axis=1)
    # Drop index columns
    weekly_data.reset_index(drop=True, inplace=True)
    # Return data by dropping current week's data
    # (check the last row's date, a scalar, rather than a whole column)
    if datetime.weekday(weekly_data['candle_date'].iloc[-1]) != 0:
        return weekly_data.head(-1)
    else:
        return weekly_data
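As a possible alternative to rebuilding the date from Year and Week_Number, the Monday of each week can be derived directly from the daily date and used as the grouping key. The sketch below is only an illustration under stated assumptions: the sample data is made up, and only the column names candle_date, open, close and volume are taken from the question.

```python
import pandas as pd

# Hypothetical daily data covering two full weeks (Mon 2020-07-27 to Sun 2020-08-09).
daily = pd.DataFrame({
    "candle_date": pd.date_range("2020-07-27", periods=14, freq="D"),
    "open": list(range(14)),
    "close": list(range(100, 114)),
    "volume": [1] * 14,
})

dates = pd.to_datetime(daily["candle_date"])
# dt.weekday is 0 for Monday, so subtracting that many days lands on the week's Monday.
daily["week_date"] = (dates - pd.to_timedelta(dates.dt.weekday, unit="D")).dt.normalize()

# Group on the Monday date itself; no Year/Week_Number round trip is needed.
weekly = (
    daily.groupby("week_date")
    .agg({"open": "first", "close": "last", "volume": "sum"})
    .reset_index()
)
print(weekly)
```

Grouping on the Monday date also sidesteps the year-boundary question, since a week that spans two calendar years still has exactly one Monday.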
Try the apply() and datetime.strptime() methods:
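import datetime
df = df.reset_index()
# %G, %V and %u are the ISO year, ISO week and ISO weekday (1 = Monday), matching the week numbers produced by dt.week
df['week_date'] = (df[['Year', 'Week_Number']].astype(str)
                   .apply(lambda x: datetime.datetime.strptime('-W'.join(x) + '-1', "%G-W%V-%u"), axis=1))
df = df.set_index(['Year', 'Week_Number'])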
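If Week_Number holds ISO week numbers (which is what dt.week produces), the standard library can also map a (year, week, weekday) triple straight to a date. The following is a minimal sketch, assuming Python 3.8 or later for date.fromisocalendar; the helper name monday_of_iso_week is hypothetical and not part of the original answer.

```python
from datetime import date, datetime

def monday_of_iso_week(iso_year: int, iso_week: int) -> date:
    """Return the Monday (ISO weekday 1) of the given ISO year and week."""
    return date.fromisocalendar(iso_year, iso_week, 1)

# The same result via strptime with the ISO directives %G (year), %V (week), %u (weekday).
parsed = datetime.strptime("2020-W31-1", "%G-W%V-%u").date()

print(monday_of_iso_week(2020, 31))  # 2020-07-27
print(parsed)                        # 2020-07-27
```

Such a helper could be passed to apply() on the Year and Week_Number columns in the same way as the lambda above.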
Try using pd.to_datetime on the 'Year' and 'Week_Number' columns with a format string for ISO year, ISO week of year, and day of week ('%G%V%w'):
df = df.reset_index()
df['week_date'] = pd.to_datetime(
    df['Year'].astype(str) + df['Week_Number'].astype(str) + "1",
    format='%G%V%w'
)
df = df.set_index(['Year', 'Week_Number'])
The + "1" is for the day of the week. Weekday numbers run 0-6, with 0 being Sunday and 6 being Saturday, so 1 is Monday. (Ref. Format Codes)
df:
                          open         close  week_date
Year Week_Number
2020 31           11106.793367  11059.660924 2020-07-27
     32           11059.658520  11653.660942 2020-08-03
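The question asks for the date to be shown as 27-07-2020, whereas a datetime column prints as 2020-07-27. A small sketch of one way to get that display form, assuming week_date is already a datetime column (the two sample values are copied from the output above):

```python
import pandas as pd

# Sample week start dates, taken from the output shown above.
week_date = pd.Series(pd.to_datetime(["2020-07-27", "2020-08-03"]))

# strftime returns strings, so keep the datetime column for any further date arithmetic.
formatted = week_date.dt.strftime("%d-%m-%Y")
print(formatted.tolist())  # ['27-07-2020', '03-08-2020']
```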
Try using dt.strftime with '%V':
pd.to_datetime(pd.Series(['27-07-2020'])).dt.strftime('%V')
df['week_date-Week'] = pd.to_datetime(df['Week_Number'].astype(str) + df['Year'].astype(str).add('-1'), format='%V%G-%u')
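Note that '%V' is the ISO 8601 week number, which pairs with the ISO year ('%G') rather than the calendar year; the two can differ in the first and last days of a year. A short sketch of that difference, assuming pandas 1.1 or later for dt.isocalendar() (the sample dates are chosen only for illustration):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-07-27", "2021-01-01"]))

# isocalendar() returns a frame with ISO 'year', 'week' and 'day' columns.
iso = dates.dt.isocalendar()
calendar_year = dates.dt.year

print(iso["year"].tolist())    # [2020, 2020]
print(iso["week"].tolist())    # [31, 53]
print(calendar_year.tolist())  # [2020, 2021]
```

Here 1 January 2021 falls in ISO week 53 of ISO year 2020, so grouping by calendar year plus ISO week would label that day as week 53 of 2021.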
PySpark SQL
When the data is very large, .apply takes a long time to process. I used the code below to get the first date of the month and the week start date (Monday).
from pyspark.sql.functions import trunc
df = df.withColumn('month_date', trunc('date', 'month'))
Output:
date month_date
2019-05-28 2019-05-01
The code below derives the week start date from the date column, with weeks starting on Monday:
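from pyspark.sql.functions import date_sub, next_day
# next_day returns the first Sunday strictly after its argument, so step back one day first to keep a Sunday in its own week
df = df.withColumn("week_end", next_day(date_sub("date", 1), "SUN")).withColumn("week_start_date", date_sub("week_end", 6))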
I used these on Databricks. apply took more than 2 hours on 200 billion rows of data, while this approach took only around 5 minutes.
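The week_end and week_start_date arithmetic can be checked on a few dates without a Spark session. The following plain-Python sketch mirrors the idea of "Sunday that closes the week, minus six days"; it assumes a Monday-to-Sunday week in which a Sunday belongs to the week it ends, and the function name week_bounds is hypothetical.

```python
from datetime import date, timedelta

def week_bounds(d: date) -> tuple:
    """Return (week_start, week_end) for the Monday-to-Sunday week containing d."""
    # weekday() is 0 for Monday and 6 for Sunday, so 6 - weekday() days remain until Sunday.
    week_end = d + timedelta(days=6 - d.weekday())
    week_start = week_end - timedelta(days=6)
    return week_start, week_end

print(week_bounds(date(2019, 5, 28)))  # Tuesday: week runs 2019-05-27 to 2019-06-02
print(week_bounds(date(2019, 6, 2)))   # Sunday: same week, 2019-05-27 to 2019-06-02
```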