[英]Find list of indices before specific value in column for each unique id
I have dataframe like this:我有这样的 dataframe :
id | date | status
________________________
... ... ...
1 |2020-01-01 | reserve
1 |2020-01-02 | sold
2 |2020-01-01 | free
3 |2020-01-03 | reserve
3 |2020-01-25 | signed
3 |2020-01-30 | sold
... ... ...
10 |2020-01-02 | signed
10 |2020-02-15 | sold
... ... ....
I want to find indices of all rows with status sold
and then for all rows back 29 days of these (rows with status sold
) assign 1
and 0
in else case.我想找到所有状态为
sold
的行的索引,然后为所有这些行返回 29 天(状态为sold
的行)在其他情况下分配1
和0
。
The desired dataframe is this想要的 dataframe 是这个
id | date | status | label
_________________________________
... ... ... ...
1 |2020-01-01 | reserve | 1
1 |2020-01-02 | sold | 1
2 |2019-12-02 | free | 0 # no sold status for 2
3 |2020-01-03 | reserve | 1
3 |2020-01-25 | signed | 1
3 |2020-01-30 | sold | 1
... ... ... ...
10 |2020-01-02 | signed | 0
10 |2020-02-15 | sold | 1 # more than 29 days from 2020-02-15
... ... .... ...
I attemped to use apply()
, but I found out I can't call function like that我尝试使用
apply()
,但我发现我不能像那样调用 function
def make_labels(df):
def get_indices(df):
return list(df[df['date'] >= df.iloc[-1]['date'] - timedelta(days=29)].index)
df.sort_values(['id', 'date'], inplace=True)
zero_labels = pd.Series(0, index = df.index, name='sold_labels')
one_lables = df.groupby('id')['status'].apply(lambda s: get_indices if s.iloc[-1] == 'sold').sum()
zero_labels.loc[one_lables] = 1
return zero_labels
df['label'] = make_labels(df)
dataframe constructor of the input: dataframe 构造函数的输入:
d = {'id': [1, 1, 2, 3, 3, 3, 10, 10],
'date': ['2020-01-01', '2020-01-02', '2020-01-01', '2020-01-03', '2020-01-25', '2020-01-30', '2020-01-02', '2020-02-15'],
'status': ['reserve', 'sold', 'free', 'reserve', 'signed', 'sold', 'signed', 'sold']
}
df = pd.DataFrame(data=d)
You can use groupby.transform
to get the sold date per group, then compare the difference to 29 days:您可以使用
groupby.transform
获取每组的销售日期,然后将差异与 29 天进行比较:
df['date'] = pd.to_datetime(df['date'])
ref = (df['date']
.where(df['status'].eq('sold'))
.groupby(df['id'])
.transform('first')
)
df['label'] = (df['date'].rsub(ref)
.le('29days')
.astype(int)
)
Output: Output:
id date status label
0 1 2020-01-01 reserve 1
1 1 2020-01-02 sold 1
2 2 2020-01-01 free 0
3 3 2020-01-03 reserve 1
4 3 2020-01-25 signed 1
5 3 2020-01-30 sold 1
6 10 2020-01-02 signed 0
7 10 2020-02-15 sold 1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.