
Convert week number in dataframe to start date of week (Monday)

I'm looking to convert daily data into weekly data. Here is the code I've used to achieve this:

    import pandas as pd
    # dt.week is deprecated; dt.isocalendar() gives the ISO week together with its matching ISO year
    iso = pd.to_datetime(daily_data['candle_date']).dt.isocalendar()
    daily_data['Week_Number'] = iso['week']
    daily_data['Year'] = iso['year']
    df2 = daily_data.groupby(['Year', 'Week_Number']).agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})
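Pairing the ISO week number (what dt.week returns) with the calendar year from dt.year goes wrong around New Year, because the first or last days of a calendar year can belong to an ISO week of the neighbouring year. A small stdlib check with two illustrative dates:

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week, ISO weekday)
for d in [date(2019, 12, 31), date(2021, 1, 1)]:
    iso_year, iso_week, _ = d.isocalendar()
    print(d, "calendar year:", d.year, "ISO year/week:", iso_year, iso_week)
```

Grouping by (dt.year, dt.week) would therefore put 2019-12-31 into the same (2019, 1) group as the first days of January 2019, and label the first days of 2021 as week 53 of 2021, a week that does not exist. Taking the year from dt.isocalendar()['year'] keeps the key consistent.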

Currently, the dataframe output looks like this:

                          open          high           low         close        volume     market_cap
Year Week_Number                                                                                     
2020 31           11106.793367  12041.230145  10914.007709  11059.660924   86939673211   836299315108
     32           11059.658520  11903.881608  11011.841384  11653.660942  125051146775  1483987715241
     33           11665.874956  12047.515879  11199.052457  11906.236593  141819289223  1513036354035
     34           11915.898402  12382.422676  11435.685834  11671.520767  136888268138  1533135548697
     35           11668.211439  11806.669046  11183.114210  11704.963980  122232543594  1490089199926
     36           11713.540300  12044.196936   9951.201578  10277.329333  161912442921  1434502733759

I'd like the output to have a week_date column that shows the date of the Monday of each week as the start date. For example, show 27-07-2020 instead of week 31 of 2020, and so on. This final step is where I'm really stuck. Could someone help me achieve this?
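For a single year/week pair, Python 3.8+ can compute the Monday directly with date.fromisocalendar. This sketch assumes Week_Number holds the ISO week (as dt.week and dt.isocalendar() produce); the values 2020 and 31 are taken from the example above:

```python
from datetime import date

# ISO weekday 1 = Monday, so this is the first day of ISO week 31 of 2020
week_start = date.fromisocalendar(2020, 31, 1)
print(week_start.strftime("%d-%m-%Y"))  # 27-07-2020
```

Applied to a whole column this runs row by row in Python, so the vectorized pd.to_datetime approaches are preferable for large frames.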

SOLUTION FOR THOSE WHO NEED IT

Below is the entire function I use to convert daily data to weekly data:

    import pandas as pd

    def convert_dailydata_to_weeklydata(daily_data):
        # Print function name (project-specific helper, not shown)
        # SupportMethods.print_func_name()

        dates = pd.to_datetime(daily_data['candle_date'])

        # Skip the partial week at the start: keep the rows from the first Monday onward
        # (assumes the rows are sorted by candle_date in ascending order)
        first_monday = dates[dates.dt.weekday == 0].min()
        daily_data_temp = daily_data[dates >= first_monday].copy()

        # ISO week number and ISO year. Week numbers repeat every year, so the year is needed for a
        # unique key, and it must be the ISO year: e.g. 2021-01-01 is in ISO week 53 of 2020.
        # (Series.dt.week is deprecated; dt.isocalendar() replaces it.)
        iso = pd.to_datetime(daily_data_temp['candle_date']).dt.isocalendar()
        daily_data_temp['Week_Number'] = iso['week']
        daily_data_temp['Year'] = iso['year']

        # Group the daily candles into weekly candles
        df = daily_data_temp.groupby(['Year', 'Week_Number']).agg(
            {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})

        # Reset index so that Year and Week_Number become columns
        df = df.reset_index()

        # Create week date (start of week). The trailing "1" is the weekday for %w, which numbers
        # days 0-6 with 0 being Sunday and 6 being Saturday, so "1" selects Monday.
        df['week_date'] = pd.to_datetime(df['Year'].astype(str) + df['Week_Number'].astype(str) + "1", format='%G%V%w')

        # Put the date first, rename it, and drop Year/Week_Number (result keeps a plain RangeIndex)
        weekly_data = df[["week_date", "open", "high", "low", "close", "volume", "market_cap"]]
        weekly_data = weekly_data.rename({'week_date': 'candle_date'}, axis=1)

        # Drop the current week if it is incomplete, i.e. the daily data does not end on a Sunday
        if dates.iloc[-1].weekday() != 6:
            return weekly_data.head(-1)
        return weekly_data
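As an alternative (a swapped-in technique, not the method used in the function above), pandas can bucket daily rows into Monday-to-Sunday weeks with resample, which skips the Year/Week_Number step entirely. A minimal sketch on made-up sample candles, assuming candle_date is already a datetime column:

```python
import pandas as pd

# Hypothetical sample data: 10 daily candles starting on Monday 2020-07-27
daily = pd.DataFrame({
    "candle_date": pd.date_range("2020-07-27", periods=10, freq="D"),
    "open": range(10, 20),
    "high": range(20, 30),
    "low": range(0, 10),
    "close": range(15, 25),
    "volume": [100] * 10,
})

# 'W-MON' bins are anchored on Mondays; closed="left" and label="left" make each
# bin [Monday, next Monday) and label it with that Monday
weekly = (
    daily.resample("W-MON", on="candle_date", closed="left", label="left")
    .agg({"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"})
    .reset_index()
)
print(weekly)
```

Unlike the ISO-week groupby, resample also emits rows for empty weeks inside the date range (NaN for first/max/min/last, 0 for sums) and keeps partial weeks at either end, so those may still need trimming.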

Try apply() with datetime.strptime():

import datetime

df = df.reset_index()

# ISO directives (%G year, %V week, %u weekday with 1 = Monday) match dt.week / dt.isocalendar()
df['week_date'] = (df[['Year', 'Week_Number']].astype(str)
                   .apply(lambda x: datetime.datetime.strptime('-W'.join(x) + '-1', "%G-W%V-%u"), axis=1))

df = df.set_index(['Year', 'Week_Number'])
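The choice between %W and %V matters here. %W counts Monday-started weeks from the first Monday of the calendar year (days before it are week 0), while dt.week and dt.isocalendar() use ISO 8601 weeks. For 2020 the two schemes are one week apart, as this stdlib check shows:

```python
from datetime import datetime

# %Y/%W: week 1 starts on the first Monday of 2020 (2020-01-06)
print(datetime.strptime("2020-W31-1", "%Y-W%W-%w").date())  # 2020-08-03
# %G/%V/%u: ISO 8601 week 31 of 2020
print(datetime.strptime("2020-W31-1", "%G-W%V-%u").date())  # 2020-07-27
```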

Try using pd.to_datetime on the 'Year' and 'Week_Number' columns with a format string for ISO year, ISO week of year and day of week ('%G%V%w'):

df = df.reset_index()
df['week_date'] = pd.to_datetime(
    df['Year'].astype(str) + df['Week_Number'].astype(str) + "1",
    format='%G%V%w'
)
df = df.set_index(['Year', 'Week_Number'])

The + "1" is for the day of the week: %w numbers the weekdays 0-6, with 0 being Sunday and 6 being Saturday, so "1" selects Monday. (Ref.: Format Codes)
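Parsing the same concatenated string with different trailing digits shows the %w numbering in action. With %G/%V, "0" (Sunday) gives the last day of that ISO week, six days after the Monday, not the day before it. A plain-Python sketch using week 31 of 2020:

```python
from datetime import datetime

# ISO year + ISO week + one weekday digit for %w, concatenated as in the answer
for digit in "1230":
    print(digit, datetime.strptime("202031" + digit, "%G%V%w").date())
```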

df:

                          open         close  week_date
Year Week_Number                                       
2020 31           11106.793367  11059.660924 2020-07-27
     32           11059.658520  11653.660942 2020-08-03

Try using dt.strftime with '%V':

import pandas as pd
pd.to_datetime(pd.Series(['27-07-2020']), format='%d-%m-%Y').dt.strftime('%V')  # ISO week: '31'
df['week_date'] = pd.to_datetime(df['Week_Number'].astype(str) + df['Year'].astype(str) + '-1', format='%V%G-%u')  # %u: 1 = Monday
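'%V' only lines up with the ISO year '%G', not '%Y'. A minimal sketch with illustrative dates, going from a date column to the ISO year/week key and back to the Monday of that week:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-07-29", "2020-08-02", "2021-01-01"]))

# ISO year + ISO week as a string, e.g. '202031'
iso_key = dates.dt.strftime("%G%V")
# Append ISO weekday 1 (%u: Monday = 1) and parse back to the Monday of that week
monday = pd.to_datetime(iso_key + "1", format="%G%V%u")
print(pd.DataFrame({"date": dates, "iso_key": iso_key, "monday": monday}))
```

Note that 2021-01-01 maps to 2020-12-28, the Monday of ISO week 53 of 2020.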

PySpark SQL

When the data is very large, .apply takes a long time to process. I used the code below to get the first date of the month, and the week start date with weeks starting on Monday.

from pyspark.sql.functions import trunc

df = df.withColumn('month_date', trunc('date', 'month'))

Output:

date           month_date 
2019-05-28     2019-05-01

The code below gets the week start date from the date column, keeping Monday as the start of the week:

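from pyspark.sql.functions import date_sub, next_day

# Step back one day first so that rows which are already Sundays stay in their own week
df = (df.withColumn("week_end", next_day(date_sub("date", 1), "SUN"))
        .withColumn("week_start_date", date_sub("week_end", 6)))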
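Spark's next_day returns the first date later than the input that falls on the requested day, so the week_start_date calculation depends on what happens for rows that are already Sundays. A plain-Python sketch (next_day_like is a hypothetical helper imitating that rule, not a Spark API) checks the Sunday edge case:

```python
from datetime import date, timedelta

def next_day_like(d, weekday):
    """Hypothetical mimic of Spark's next_day: the first date strictly AFTER d
    whose weekday() equals `weekday` (Monday = 0, Sunday = 6)."""
    days_ahead = (weekday - d.weekday() - 1) % 7 + 1  # always 1..7
    return d + timedelta(days=days_ahead)

SUNDAY = 6
d = date(2019, 6, 2)  # a Sunday
# next_day(date, 'SUN') - 6 days: the Sunday lands in the FOLLOWING week
print(next_day_like(d, SUNDAY) - timedelta(days=6))  # 2019-06-03
# next_day(date - 1 day, 'SUN') - 6 days: the Sunday stays in its own week
print(next_day_like(d - timedelta(days=1), SUNDAY) - timedelta(days=6))  # 2019-05-27
```

Shifting the input back one day before calling next_day keeps each Sunday in the Monday-to-Sunday week that it ends, while Monday to Saturday are unaffected.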

I ran these on Databricks: .apply took more than 2 hours on 200 billion rows of data, while this approach took only about 5 minutes.

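Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.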

Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM