简体   繁体   English

用 pandas dataframe 中的日期扩展值

[英]Expand values with dates in a pandas dataframe

I have a dataframe with name values and a date range (start/end).我有一个带有名称值和日期范围(开始/结束)的 dataframe。 I need to expand/replace the dates with the ones generated by the from/to index.我需要用 from/to 索引生成的日期扩展/替换日期。 How can I do this?我怎样才能做到这一点?

Name                    date_range
NameOne_%Y%m-%d         [-2,1]
NameTwo_%y%m%d          [-3,1]        

Desired result (Assuming that today's date is 2021-03-09 - 9 of march 2021):期望的结果(假设今天的日期是 2021-03-09 - 2021 年 3 月 9 日):

Name 
NameOne_202103-10 
NameOne_202103-09
NameOne_202103-08
NameOne_202103-07
NameTwo_210310
NameTwo_210309
NameTwo_210308
NameTwo_210307
NameTwo_210306

I've been trying iterating over the dataframe and then generating the dates, but I still can't make it work..我一直在尝试迭代 dataframe 然后生成日期,但我仍然无法使其工作..

for index, row in self.config_df.iterrows():
    print(row['source'], row['date_range'])
    days_sub=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[0].strip())
    days_add=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[1].strip())
    start_date = date.today() + timedelta(days=days_sub)
    end_date = date.today() + timedelta(days=days_add)
    
    date_range_df=pd.date_range(start=start_date, end=end_date)
    date_range_df["source"]=row['source']

Any help is appreciated.任何帮助表示赞赏。 Thanks!谢谢!

Convert your date_range from str to list with ast module:使用ast模块将您的date_rangestr转换为list

import ast
df = df.assign(date_range=df["date_range"].apply(ast.literal_eval)

Use date_range to create list of dates and explode to chain the list:使用date_range创建日期列表explode展开以链接列表:

today = pd.Timestamp.today().normalize()
offset = pd.tseries.offsets.Day  # shortcut

names = pd.Series([pd.date_range(today + offset(end),
                                 today + offset(start),
                                 freq="-1D").strftime(name)
            for name, (start, end) in df.values]).explode(ignore_index=True)
>>> names
0    NameOne_202103-10
1    NameOne_202103-09
2    NameOne_202103-08
3    NameOne_202103-07
4       NameTwo_210310
5       NameTwo_210309
6       NameTwo_210308
7       NameTwo_210307
8       NameTwo_210306
dtype: object

Alright.好吧。 From your question I understand you have a starting data frame like so:根据您的问题,我了解到您有一个像这样的起始数据框:

config_df = pd.DataFrame({
    'name': ['NameOne_%Y-%m-%d', 'NameTwo_%y%m%d'],
    'str_date_range': ['[-2,1]', '[-3,1]']})

Resulting in this:结果是:

               name str_date_range
0  NameOne_%Y-%m-%d         [-2,1]
1    NameTwo_%y%m%d         [-3,1]

To achieve your goal and avoid iterating rows - which should be avoided using pandas - you can use groupby().apply() like so:为了实现您的目标并避免迭代行 - 应该避免使用 pandas - 您可以像这样使用groupby().apply()

def expand(row):
    # Get the start_date and end_date from the row, by splitting 
    # the string and taking the first and last value respectively.
    # .min() is required because row is technically a pd.Series
    start_date = row.str_date_range.str.strip('[]').str.split(',').str[0].astype(int).min()
    end_date = row.str_date_range.str.strip('[]').str.split(',').str[1].astype(int).min()
    # Create a list range for from start_date to end_date. 
    # Note that range() does not include the end_date, therefor add 1
    day_range = range(start_date, end_date+1)
    # Create a Timedelta series from the day_range
    days_diff = pd.to_timedelta(pd.Series(day_range), unit='days')
    # Create an equally sized Series of today Timestamps
    todays = pd.Series(pd.Timestamp.today()).repeat(len(day_range)-1).reset_index(drop=True)
    df = todays.to_frame(name='date')
    # Add days_diff to date column
    df['date'] = df.date + days_diff
    df['name'] = row.name
    # Extract the date format from the name
    date_format = row.name.split('_')[1]
    # Add a column with the formatted date using the date_format string
    df['date_str'] = df.date.dt.strftime(date_format=date_format)
    df['name'] = df.name.str.split('_').str[0] + '_' + df.date_str
    # Optional: drop columns
    return df.drop(columns=['date'])

config_df.groupby('name').apply(expand).reset_index(drop=True)

returning:返回:

                 name    date_str
0  NameOne_2021-03-07  2021-03-07
1  NameOne_2021-03-08  2021-03-08
2  NameOne_2021-03-09  2021-03-09
3      NameTwo_210306      210306
4      NameTwo_210307      210307
5      NameTwo_210308      210308
6      NameTwo_210309      210309

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM