Extend dataframe using date_range for every start, end date

Consider the following data on employer-employee links, each with a start and an end date:

   employer  employee      start        end
0         0         0 2007-01-01 2007-12-31
1         1        86 2007-01-01 2007-12-31
2         1        63 2007-06-01 2007-12-31
3         1        93 2007-01-01 2007-12-31
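To make the snippets reproducible, the four rows above can be rebuilt like this. This is a sketch that assumes start and end should be real datetime columns (the question does not show how df was created).

```python
import pandas as pd

# Rebuild the example rows; start/end are parsed to datetimes so that
# pd.date_range can consume them directly.
df = pd.DataFrame(
    {
        "employer": [0, 1, 1, 1],
        "employee": [0, 86, 63, 93],
        "start": pd.to_datetime(["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]),
        "end": pd.to_datetime(["2007-12-31"] * 4),
    }
)
print(df)
```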

Now I want to "spread" the dates, i.e. create one observation for every month between start and end. I thought that

import pandas as pd

def extend(x):
    index = pd.date_range(start=x['start'], end=x['end'], freq='M')
    df = pd.DataFrame([x.values], index=index, columns=x.index)
    return df

long = df.apply(extend, axis=1)

would do the trick, but the result contains only the index labels (the column names) instead of the values:

>>> long.head()
Out[245]: 
   employer  employee  start  end
0  employer  employee  start  end
1  employer  employee  start  end

However, when I test it on the first row, it works:

>>> extend(df.iloc[0])
Out[246]: 
            employer  employee      start        end
2007-01-31         0         0 2007-01-01 2007-12-31
2007-02-28         0         0 2007-01-01 2007-12-31
2007-03-31         0         0 2007-01-01 2007-12-31
(...)

What am I doing wrong? Is there perhaps a better way? My end goal is to get the same output as the last one, but with the columns employer, employee, month, year.

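I think the problem is that apply expects one result per input row, so it cannot stack the multi-row DataFrames that extend returns.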
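Following that reasoning, another option (swapped in here, not part of the original answer) is to produce exactly one result per row, namely the list of months that row covers, and let DataFrame.explode turn every list element into its own row. A sketch assuming the sample df from the question; the column name month is illustrative, and pd.offsets.MonthEnd() is used instead of the 'M' alias so it does not depend on which alias the installed pandas accepts.

```python
import pandas as pd

# Sample data from the question (start/end parsed as datetimes).
df = pd.DataFrame(
    {
        "employer": [0, 1, 1, 1],
        "employee": [0, 86, 63, 93],
        "start": pd.to_datetime(["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]),
        "end": pd.to_datetime(["2007-12-31"] * 4),
    }
)

# One result per input row: the list of month-end dates that row covers.
months = [
    list(pd.date_range(start=s, end=e, freq=pd.offsets.MonthEnd()))
    for s, e in zip(df["start"], df["end"])
]

# explode() gives each list element its own row and repeats the other columns.
long = df.assign(month=months).explode("month", ignore_index=True)
# The exploded column holds Timestamps in an object column; convert it back.
long["month"] = pd.to_datetime(long["month"])
print(long.head())
```

The lists are built with a plain comprehension rather than apply, which sidesteps apply's rules about what shape the function may return.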

You can do this with iterrows and a list comprehension, without changing your code much:

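import pandas as pd

def extend(x):
    # one timestamp per month end between start and end
    # ('M' is month end; pandas >= 2.2 deprecates it in favor of 'ME')
    index = pd.date_range(start=x['start'], end=x['end'], freq='M')
    # a dict of scalars plus an explicit index repeats the row once per month,
    # instead of relying on a one-row list being broadcast to the longer index
    df = pd.DataFrame(x.to_dict(), index=index)
    return df

>>> new = pd.concat([extend(x) for _, x in df.iterrows()])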
>>> new

            employer  employee      start        end
2007-01-31         0         0 2007-01-01 2007-12-31
2007-02-28         0         0 2007-01-01 2007-12-31
2007-03-31         0         0 2007-01-01 2007-12-31
2007-04-30         0         0 2007-01-01 2007-12-31
2007-05-31         0         0 2007-01-01 2007-12-31
2007-06-30         0         0 2007-01-01 2007-12-31
2007-07-31         0         0 2007-01-01 2007-12-31
2007-08-31         0         0 2007-01-01 2007-12-31
2007-09-30         0         0 2007-01-01 2007-12-31
2007-10-31         0         0 2007-01-01 2007-12-31
2007-11-30         0         0 2007-01-01 2007-12-31
2007-12-31         0         0 2007-01-01 2007-12-31
2007-01-31         1        86 2007-01-01 2007-12-31
2007-02-28         1        86 2007-01-01 2007-12-31
2007-03-31         1        86 2007-01-01 2007-12-31
2007-04-30         1        86 2007-01-01 2007-12-31
2007-05-31         1        86 2007-01-01 2007-12-31
2007-06-30         1        86 2007-01-01 2007-12-31
2007-07-31         1        86 2007-01-01 2007-12-31
2007-08-31         1        86 2007-01-01 2007-12-31
2007-09-30         1        86 2007-01-01 2007-12-31
2007-10-31         1        86 2007-01-01 2007-12-31
2007-11-30         1        86 2007-01-01 2007-12-31
2007-12-31         1        86 2007-01-01 2007-12-31
2007-06-30         1        63 2007-06-01 2007-12-31
2007-07-31         1        63 2007-06-01 2007-12-31
2007-08-31         1        63 2007-06-01 2007-12-31
2007-09-30         1        63 2007-06-01 2007-12-31
2007-10-31         1        63 2007-06-01 2007-12-31
2007-11-30         1        63 2007-06-01 2007-12-31
2007-12-31         1        63 2007-06-01 2007-12-31
2007-01-31         1        93 2007-01-01 2007-12-31
2007-02-28         1        93 2007-01-01 2007-12-31
2007-03-31         1        93 2007-01-01 2007-12-31
2007-04-30         1        93 2007-01-01 2007-12-31
2007-05-31         1        93 2007-01-01 2007-12-31
2007-06-30         1        93 2007-01-01 2007-12-31
2007-07-31         1        93 2007-01-01 2007-12-31
2007-08-31         1        93 2007-01-01 2007-12-31
2007-09-30         1        93 2007-01-01 2007-12-31
2007-10-31         1        93 2007-01-01 2007-12-31
2007-11-30         1        93 2007-01-01 2007-12-31
2007-12-31         1        93 2007-01-01 2007-12-31
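To reach the layout asked for in the question (employer, employee, month, year), the concatenated result can be reshaped by moving its DatetimeIndex into a column and splitting it. A sketch assuming the sample df from the question; the intermediate column name date is illustrative, and the MonthEnd offset stands in for the 'M' alias.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "employer": [0, 1, 1, 1],
        "employee": [0, 86, 63, 93],
        "start": pd.to_datetime(["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]),
        "end": pd.to_datetime(["2007-12-31"] * 4),
    }
)

def extend(row):
    # Month-end dates covered by this row.
    index = pd.date_range(start=row["start"], end=row["end"], freq=pd.offsets.MonthEnd())
    # A dict of scalars plus an explicit index repeats the row once per month.
    return pd.DataFrame(row.to_dict(), index=index)

new = pd.concat([extend(row) for _, row in df.iterrows()])

# Name the DatetimeIndex, turn it into a column, split it into month and year,
# and keep only the columns the question asks for.
out = (
    new.rename_axis("date")
    .reset_index()
    .assign(month=lambda d: d["date"].dt.month, year=lambda d: d["date"].dt.year)
    [["employer", "employee", "month", "year"]]
)
print(out.head())
```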

You can also do this with groupby/apply, which is more flexible. Something like this:

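import pandas as pd

def extend(x):
    # each (employer, employee) pair has one row here; use its start/end
    x = x.iloc[0, :]
    # month-end dates ('M'; pandas >= 2.2 prefers 'ME')
    dates = pd.date_range(start=x['start'], end=x['end'], freq='M')
    return pd.DataFrame(dates, columns=['date'])

>>> long = df.groupby(['employer','employee'])[['start','end']].apply(extend)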
>>> long

                           date
employer employee
0        0        0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31
1        63       0  2007-06-30
                  1  2007-07-31
                  2  2007-08-31
                  3  2007-09-30
                  4  2007-10-31
                  5  2007-11-30
                  6  2007-12-31
         86       0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31
         93       0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31

Alternatively, you can iterate over the rows and concat the results.
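A variant of that row loop, sketched here as an assumption rather than taken from the thread, collects plain dicts with itertuples and builds a single DataFrame at the end instead of one small frame per row. It writes the employer, employee, month, year layout directly; it assumes the sample df from the question and uses the MonthEnd offset instead of the 'M' alias.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "employer": [0, 1, 1, 1],
        "employee": [0, 86, 63, 93],
        "start": pd.to_datetime(["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]),
        "end": pd.to_datetime(["2007-12-31"] * 4),
    }
)

# Collect one dict per (row, month) and build a single DataFrame at the end,
# rather than creating and concatenating one small DataFrame per row.
records = []
for row in df.itertuples(index=False):
    for d in pd.date_range(start=row.start, end=row.end, freq=pd.offsets.MonthEnd()):
        records.append(
            {"employer": row.employer, "employee": row.employee, "month": d.month, "year": d.year}
        )

out = pd.DataFrame(records)
print(out.head())
```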
