Expanding each row in a dataframe
Consider this simple example:
import pandas as pd

data = pd.DataFrame({'mydate' : [pd.to_datetime('2016-06-06'),
                                 pd.to_datetime('2016-06-02')],
                     'value' : [1, 2]})
data.set_index('mydate', inplace = True)
data
Out[260]:
            value
mydate
2016-06-06      1
2016-06-02      2
I want to iterate over each row so that the dataframe gets "enlarged" by a couple of days (2 days before, 2 days after) around the index value (which is a date) of the current row.
For instance, for the first row I want to tell Pandas to add 4 more rows, corresponding to the days 2016-06-04, 2016-06-05, 2016-06-07 and 2016-06-08. The value for these extra rows can simply be whatever is in value for that row (in this case: 1). This logic is applied to every row, and the final dataframe is the concatenation of all these enlarged dataframes.
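To pin down the target shape for a single row, here is a small sketch (the variable names are illustrative, not from the question) that builds the inclusive five-day window around the first row's date and attaches that row's value to every date in it:

```python
import pandas as pd

center = pd.Timestamp('2016-06-06')   # index value of the first row
ndaysback, ndaysfwd = 2, 2            # illustrative names for the window size

# Inclusive daily range: 2 days before through 2 days after the row's date
window = pd.date_range(center - pd.Timedelta(days=ndaysback),
                       center + pd.Timedelta(days=ndaysfwd),
                       freq='D', name='mydate')

# Every date in the window carries the row's original value (1 here)
expanded_first_row = pd.DataFrame({'value': 1}, index=window)
```

Doing this for every row and concatenating the pieces gives the desired result.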
I have tried the following function in an apply(., axis = 1):
def expand_onerow(df, ndaysback = 2, nhdaysfwd = 2):
    new_index = pd.date_range(pd.to_datetime(df.name) - pd.Timedelta(days=ndaysback),
                              pd.to_datetime(df.name) + pd.Timedelta(days=nhdaysfwd),
                              freq='D')
    newdf = df.reindex(index=new_index, method='nearest') #New df with expanded index
    return newdf
But unfortunately, running data.apply(lambda x: expand_onerow(x), axis = 1) gives:
File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 2016-06-06 00:00:00')
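A likely explanation for this error: with apply(..., axis = 1), each row reaches the function as a Series whose .name is the row's date but whose own index holds the column labels (here just 'value'). Reindexing that Series onto a range of dates with method='nearest' therefore compares Timestamps with the string 'value'. The sketch below inspects such a row, using data.iloc[0] as a stand-in for what apply passes (an assumption of equivalence, not a trace of apply's internals); the exact error text may differ across pandas versions.

```python
import pandas as pd

data = pd.DataFrame({'mydate': pd.to_datetime(['2016-06-06', '2016-06-02']),
                     'value': [1, 2]}).set_index('mydate')

# A row Series like the one apply(axis=1) hands to the function
row = data.iloc[0]
print(row.name)         # the row's date label lives in .name
print(list(row.index))  # the Series' own index is the column labels

# Selecting with a list of labels instead keeps a one-row DataFrame
# whose index is still the date, so reindexing onto dates makes sense
one_row = data.loc[[row.name]]
print(one_row.index)
```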
Another approach I tried is the following. I first reset the index:
data.reset_index(inplace = True)
data
Out[339]:
      mydate  value
0 2016-06-06      1
1 2016-06-02      2
Then I use a slightly modified version of my function:
def expand_onerow_alt(df, ndaysback = 2, nhdaysfwd = 2):
    new_index = pd.date_range(pd.to_datetime(df.mydate) - pd.Timedelta(days=ndaysback),
                              pd.to_datetime(df.mydate) + pd.Timedelta(days=nhdaysfwd),
                              freq='D')
    newdf = pd.Series(df).reindex(index = new_index).T #New df with expanded index
    return newdf
which gives:
data.apply(lambda x: expand_onerow_alt(x), axis = 1)
Out[338]:
2016-05-31 2016-06-01 2016-06-02 2016-06-03 2016-06-04 2016-06-05 2016-06-06 2016-06-07 2016-06-08
0 nan nan nan nan nan nan nan nan nan
1 nan nan nan nan nan nan nan nan nan
Closer, but not there yet...
I do not understand what is wrong here. What am I missing? I am looking for the most Pandonic approach here.
Thanks!
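The NaNs plausibly have a related cause. After reset_index, each row Series is labelled 'mydate' and 'value', so pd.Series(df).reindex(index = new_index) finds none of the dates among its labels and fills every slot with NaN. And because the function returns a Series indexed by dates, apply spreads those dates across the columns, which gives the wide layout above (.T on a Series does not change it). A small check, rebuilding the reset frame directly:

```python
import pandas as pd

# Same shape as data after reset_index(): dates are an ordinary column
data = pd.DataFrame({'mydate': pd.to_datetime(['2016-06-06', '2016-06-02']),
                     'value': [1, 2]})

row = data.iloc[0]        # stand-in for the row apply(axis=1) passes
print(list(row.index))    # labels are 'mydate' and 'value', not dates

window = pd.date_range(row['mydate'] - pd.Timedelta(days=2),
                       row['mydate'] + pd.Timedelta(days=2), freq='D')

# No window date is a label of `row`, so every entry comes back missing
reindexed = row.reindex(window)
print(reindexed.isna().all())
```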
I modified your function a little bit:
def expand_onerow(df, ndaysback = 2, nhdaysfwd = 2):
    new_index = pd.date_range(pd.to_datetime(df.index[0]) - pd.Timedelta(days=ndaysback),
                              pd.to_datetime(df.index[0]) + pd.Timedelta(days=nhdaysfwd),
                              freq='D')
    newdf = df.reindex(index=new_index, method='nearest') #New df with expanded index
    return newdf
pd.concat([expand_onerow(data.loc[[x],:], ndaysback = 2, nhdaysfwd = 2) for x ,_ in data.iterrows()])
Out[455]:
            value
2016-05-31      2
2016-06-01      2
2016-06-02      2
2016-06-03      2
2016-06-04      2
2016-06-04      1
2016-06-05      1
2016-06-06      1
2016-06-07      1
2016-06-08      1
More info
Basically, that one line is equivalent to:
l = []
for x, _ in data.iterrows():
    # select each row by its index label x (the double brackets keep it a one-row DataFrame),
    # expand it, and append the result to the list
    l.append(expand_onerow(data.loc[[x], :], ndaysback = 2, nhdaysfwd = 2))
pd.concat(l)  # concatenate the list into one DataFrame at the end
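If looping over rows is not a requirement, a vectorized variant may be closer to the "Pandonic" approach asked for. This is an alternative technique, not the answer's method: it repeats every row once per day offset with np.repeat, then shifts the repeated dates by a tiled range of offsets. The helper name expand_rows and its parameters are hypothetical, and it assumes a DatetimeIndex as in data.

```python
import numpy as np
import pandas as pd

def expand_rows(df, ndaysback=2, ndaysfwd=2):
    """Hypothetical helper: repeat each row for every day in
    [date - ndaysback, date + ndaysfwd]; assumes df has a DatetimeIndex."""
    # Day offsets -ndaysback .. +ndaysfwd, e.g. -2, -1, 0, 1, 2 days
    offsets = pd.to_timedelta(np.arange(-ndaysback, ndaysfwd + 1), unit='D')
    n = len(offsets)
    # Repeat rows by position, so duplicate dates in df would not matter
    out = df.iloc[np.repeat(np.arange(len(df)), n)].copy()
    # Shift each repeated date by its offset; the offsets cycle once per original row
    shifted = out.index.to_numpy() + np.tile(offsets.to_numpy(), len(df))
    out.index = pd.DatetimeIndex(shifted, name=df.index.name)
    return out

data = pd.DataFrame({'mydate': pd.to_datetime(['2016-06-06', '2016-06-02']),
                     'value': [1, 2]}).set_index('mydate')

expanded = expand_rows(data)
```

Like the loop, this keeps overlapping days (2016-06-04 appears once for each row); call .sort_index() on the result if chronological order is wanted.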