I have a pandas DataFrame with a standard index representing seconds, and I want to add a column "seconds elapsed since last event", where the events are given in a list. Specifically, say
event = [2, 5]
and
df = pd.DataFrame(np.zeros((7, 1)))
| | 0 |
|---:|----:|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
Then I want to obtain
| | 0 | x |
|---:|----:|-----:|
| 0 | 0 | <NA> |
| 1 | 0 | <NA> |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
I tried
df["x"] = pd.Series(range(5)).shift(2)
| | 0 | x |
|---:|----:|----:|
| 0 | 0 | nan |
| 1 | 0 | nan |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 0 | nan |
| 6 | 0 | nan |
So apparently, to make it work, I need to write df["x"] = pd.Series(range(5+2)).shift(2).
More importantly, when I then do df["x"] = pd.Series(range(2+5)).shift(5), I obtain
| | 0 | x |
|---:|----:|----:|
| 0 | 0 | nan |
| 1 | 0 | nan |
| 2 | 0 | nan |
| 3 | 0 | nan |
| 4 | 0 | nan |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
That is, the previous values have been overwritten. Is there a way to assign new values without overwriting existing values with NaN? Then I could do something like
for i in event:
    df["x"] = pd.Series(range(len(df))).shift(i)
Or is there a more efficient way?
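For the record, here is my naive code. It works, but it looks inefficient and poorly designed:
def add_elapsed(df, event):  # function name is illustrative; the snippet only showed the body
    c = 1000000  # large placeholder for rows before the first event
    df["x"] = c
    if event:
        idx = 0  # position of the next event to look for
        for i in df.itertuples():
            print(i)  # debug output
            if idx < len(event) and i.Index == event[idx]:
                c = 0  # reset the counter at each event
                idx += 1
            df.loc[i.Index, "x"] = c
            c += 1
    return df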
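On the narrower question of assigning without overwriting existing values with NaN: the trailing NaN values appear because column assignment aligns on the index, and `pd.Series(range(5)).shift(2)` only has labels 0 to 4. A minimal sketch of the loop idea, assuming `event` is sorted in ascending order and using `Series.combine_first` to keep earlier values wherever the new shifted counter is missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]  # assumed sorted in ascending order

df["x"] = np.nan
for i in event:
    # Counter 0, 1, 2, ... starting at row i; rows before i are NaN.
    shifted = pd.Series(range(len(df)), index=df.index).shift(i)
    # Take the new counter where it is defined, otherwise keep the old value.
    df["x"] = shifted.combine_first(df["x"])
```

This builds one full-length Series per event, so the cost grows with `len(event) * len(df)`; it is meant to illustrate the assignment behaviour rather than to be the fastest option.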
Let's try this:
df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]
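df.loc[event, 0] = 1  # mark the event rows
df = df.replace(0, np.nan)  # every other row becomes NaN
grp = df[0].cumsum().ffill()  # group key: NaN before the first event, then 1, 2, ...
df['x'] = df.groupby(grp).cumcount().mask(grp.isna())  # count within each group; NaN before the first event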
df
Output:
| | 0 | x |
|---:|----:|----:|
| 0 | nan | nan |
| 1 | nan | nan |
| 2 | 1 | 0 |
| 3 | nan | 1 |
| 4 | nan | 2 |
| 5 | 1 | 0 |
| 6 | nan | 1 |
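Note that the snippet above replaces the zeros in column 0 with NaN, which differs from the desired output in the question. A minimal sketch of the same cumsum/ffill idea, assuming it is acceptable to keep the event markers in a separate Series (the names `marker` and `key` are illustrative), with the intermediate values spelled out:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((7, 1)))
event = [2, 5]

# Mark event rows in a separate Series so column 0 is left untouched.
marker = pd.Series(np.nan, index=df.index)
marker.loc[event] = 1

# cumsum skips NaN, so only event rows get a running number:
#   NaN NaN 1 NaN NaN 2 NaN
# ffill then carries each number forward to the following rows:
#   NaN NaN 1 1 1 2 2
grp = marker.cumsum().ffill()

# Use 0 as the key for rows before the first event so every row has a group,
# then blank those rows out again after counting.
key = grp.fillna(0)
df["x"] = df.groupby(key).cumcount().mask(grp.isna())
```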
IIUC, you can use the cumulative sum of the event mask as the key for a groupby:
s = df.index.isin(event).cumsum()
# or equivalently
# s = df.loc[event, 0].reindex(df.index).notna().cumsum()
df['x'] = np.where(s > 0, df.groupby(s).cumcount(), np.nan)
Output:
| | 0 | x |
|---:|----:|----:|
| 0 | 0 | nan |
| 1 | 0 | nan |
| 2 | 0 | 0 |
| 3 | 0 | 1 |
| 4 | 0 | 2 |
| 5 | 0 | 0 |
| 6 | 0 | 1 |
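The result above is a float column with NaN, whereas the desired output in the question shows integers with `<NA>`. A minimal sketch of the same idea wrapped in a helper, assuming pandas' nullable `Int64` dtype is acceptable (the function name `seconds_since_last_event` is hypothetical). It counts rows, which equals seconds only if the index really is one row per second:

```python
import numpy as np
import pandas as pd

def seconds_since_last_event(df, event):
    """Rows elapsed since the most recent event; <NA> before the first event."""
    # Running number of events seen so far: 0 before the first event.
    s = pd.Series(df.index.isin(event).cumsum(), index=df.index)
    # Position of each row within its event group.
    counts = df.groupby(s).cumcount()
    # Blank out rows before the first event and switch to nullable integers.
    return counts.where(s > 0).astype("Int64")

df = pd.DataFrame(np.zeros((7, 1)))
df["x"] = seconds_since_last_event(df, [2, 5])
```

With an empty event list, every row falls into group 0 and the whole column comes back as `<NA>`.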