Expanding each row in a dataframe

Question

Consider this simple example:

import pandas as pd

data = pd.DataFrame({'mydate' : [pd.to_datetime('2016-06-06'),
                                 pd.to_datetime('2016-06-02')],
                     'value' : [1, 2]})

data.set_index('mydate', inplace = True)

data
Out[260]: 
            value
mydate           
2016-06-06      1
2016-06-02      2

I want to iterate over each row so that the dataframe gets "enlarged" by a couple of days (2 days before, 2 days after) around each index value (which is a date) for the current row.

For instance, for the first row, I want to tell Pandas to add 4 more rows, corresponding to the days 2016-06-04, 2016-06-05, 2016-06-07 and 2016-06-08. The value for these extra rows can just be whatever is in value for that row (in this case: 1). This logic is applied to every row, and the final dataframe is the concatenation of all these enlarged dataframes.

I have tried the following function in an apply(., axis = 1):

def expand_onerow(df, ndaysback = 2, nhdaysfwd = 2):

    new_index = pd.date_range(pd.to_datetime(df.name) - pd.Timedelta(days=ndaysback), 
                              pd.to_datetime(df.name) + pd.Timedelta(days=nhdaysfwd), 
                              freq='D')

    newdf = df.reindex(index=new_index, method='nearest')     #New df with expanded index
    return newdf

But unfortunately, running data.apply(lambda x: expand_onerow(x), axis = 1) gives:

  File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__

TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 2016-06-06 00:00:00')

Another approach I tried is the following: I first reset the index,

data.reset_index(inplace = True)
data
Out[339]: 
      mydate  value
0 2016-06-06      1
1 2016-06-02      2

Then I use a slight modification of my function

def expand_onerow_alt(df, ndaysback = 2, nhdaysfwd = 2):

    new_index = pd.date_range(pd.to_datetime(df.mydate) - pd.Timedelta(days=ndaysback), 
                              pd.to_datetime(df.mydate) + pd.Timedelta(days=nhdaysfwd), 
                              freq='D')
    newdf = pd.Series(df).reindex(index = new_index).T  #New df with expanded index
    return newdf

which gives

data.apply(lambda x: expand_onerow_alt(x), axis = 1)
Out[338]: 
   2016-05-31  2016-06-01  2016-06-02  2016-06-03  2016-06-04  2016-06-05  2016-06-06  2016-06-07  2016-06-08
0         nan         nan         nan         nan         nan         nan         nan         nan         nan
1         nan         nan         nan         nan         nan         nan         nan         nan         nan

Closer, but not there yet...

I do not understand what is wrong here. What am I missing? I am looking for the most Pandonic approach here.

Thanks!

Answer 1

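I modified your function a little bit. It reads the date from df.index[0], so it expects the original data with mydate as the index (before reset_index):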

def expand_onerow(df, ndaysback = 2, nhdaysfwd = 2):

    new_index = pd.date_range(pd.to_datetime(df.index[0]) - pd.Timedelta(days=ndaysback),
                              pd.to_datetime(df.index[0]) + pd.Timedelta(days=nhdaysfwd),
                              freq='D')

    newdf = df.reindex(index=new_index, method='nearest')     #New df with expanded index
    return newdf

pd.concat([expand_onerow(data.loc[[x],:], ndaysback = 2, nhdaysfwd = 2) for x ,_ in data.iterrows()])


Out[455]: 
            value
2016-05-31      2
2016-06-01      2
2016-06-02      2
2016-06-03      2
2016-06-04      2
2016-06-04      1
2016-06-05      1
2016-06-06      1
2016-06-07      1
2016-06-08      1
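
Why this works, as far as I can tell: apply(., axis = 1) passes each row as a Series whose index holds the column names, so reindexing it against dates has nothing to match, while data.loc[[x], :] keeps a one-row dataframe that is still indexed by date. A minimal sketch to check this, assuming data has mydate as its index (row and one_row are illustrative names, not from the question):

```python
import pandas as pd

data = pd.DataFrame({'mydate': pd.to_datetime(['2016-06-06', '2016-06-02']),
                     'value': [1, 2]}).set_index('mydate')

# A single row taken as a Series: the same kind of object apply(., axis = 1) passes.
# Its index is the column labels, and the date only survives as the Series name.
row = data.iloc[0]
print(type(row).__name__, list(row.index), row.name)

# Selecting with a list of labels keeps a dataframe that is still indexed by date,
# so reindexing it against a date range is meaningful.
one_row = data.loc[[data.index[0]], :]
print(type(one_row).__name__, one_row.index)
```

That mismatch between column labels (strings) and the new date index is consistent with the "Cannot compare type 'Timestamp' with type 'str'" error in the question.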

More info

Basically, that one line is equivalent to:

frames = []
for x, _ in data.iterrows():
    # x is the index label of the row; data.loc[[x], :] selects it as a one-row dataframe
    frames.append(expand_onerow(data.loc[[x], :], ndaysback = 2, nhdaysfwd = 2))

pd.concat(frames)  # concatenate the list into one dataframe at the end
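
A loop-free variant is also possible. The following is only a sketch, assuming data still has mydate as its index; expand_rows and ndaysfwd are names made up here for illustration and are not part of the answer above. It repeats every row once per day offset and then shifts the repeated dates, instead of reindexing row by row.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'mydate': pd.to_datetime(['2016-06-06', '2016-06-02']),
                     'value': [1, 2]}).set_index('mydate')

def expand_rows(df, ndaysback=2, ndaysfwd=2):
    """Repeat each row once per day in [-ndaysback, +ndaysfwd] around its date."""
    offsets = np.arange(-ndaysback, ndaysfwd + 1)            # e.g. [-2, -1, 0, 1, 2]
    # Row positions repeated block by block: 0,0,0,0,0,1,1,1,1,1,...
    positions = np.repeat(np.arange(len(df)), len(offsets))
    out = df.iloc[positions].copy()
    # Shift every repeated date by its own day offset
    shifts = pd.to_timedelta(np.tile(offsets, len(df)), unit='D')
    out.index = out.index + shifts
    return out

expanded = expand_rows(data)
print(expanded)
```

Rows come out grouped by original row, in the original row order, with dates ascending inside each group. Overlapping windows produce duplicate dates (2016-06-04 here), just as in the concatenated result.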
