How to filter rows in pandas dataframe where a column has a value equal to some value of a list
I have a dataframe with two columns: one for ID_number and one for week_number. It can look like this:
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16], 'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})
# ID_number week_number
#0 13 1
#1 13 2
#2 14 1
#3 14 2
#4 14 3
#5 15 1
#6 15 4
#7 16 5
For each distinct ID, I want to check whether it has any row where the week value is 2 or 3, and then label the data accordingly. If an ID has neither week 2 nor week 3, I label it with a 1. Otherwise, I label it with a 0.
For now, I have come up with a rather inelegant solution that works, but I am sure there must be a better way:
import numpy as np

def check_courier_week(df, field, weeks):
    weeks_not_provided = weeks
    new_df = df
    new_df['label'] = np.zeros(len(df))
    for c in np.unique(df[field]):
        tmp = df[df[field] == c]
        # Distinct True/False values: is each week of this ID in the list?
        in_weeks = np.unique(tmp.week_number.isin(weeks_not_provided))
        # Label 1 only when every row of this ID is outside the list
        if len(in_weeks) == 1 and not in_weeks[0]:
            new_df.loc[df[field] == c, 'label'] = 1
        else:
            new_df.loc[df[field] == c, 'label'] = 0
    return new_df
Any ideas on how this could be improved? I guess there might be a solution using groupby, but I cannot think of how to implement it.
The resulting labels should be:
# ID_number week_number label
#0 13 1 0.0
#1 13 2 0.0
#2 14 1 0.0
#3 14 2 0.0
#4 14 3 0.0
#5 15 1 1.0
#6 15 4 1.0
#7 16 5 1.0
Thanks!
Using groupby with transform and any:
(~(df1['week_number'].isin([2,3])).groupby(df1['ID_number']).transform('any')).astype(int)
Out[39]:
0 0
1 0
2 0
3 0
4 0
5 1
6 1
7 1
Name: week_number, dtype: int32
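For readers who want the one-liner above unpacked, here is a minimal sketch that splits it into steps and writes the result into a label column. The intermediate names in_weeks and id_has_week are my own and do not come from the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

# True for rows whose week is 2 or 3
in_weeks = df1['week_number'].isin([2, 3])

# For every row, ask: does this row's ID have at least one such week?
# transform('any') broadcasts the per-ID answer back to each row.
id_has_week = in_weeks.groupby(df1['ID_number']).transform('any')

# Label 1 where the ID never appears in week 2 or 3, otherwise 0
df1['label'] = (~id_has_week).astype(int)
print(df1)
```

Because transform returns a Series aligned with the original index, the result can be assigned straight back to the dataframe without a merge.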
Using isin and np.where without grouping:
# IDs that have at least one row in week 2 or 3
unique = df1.loc[df1['week_number'].isin([2,3]), 'ID_number'].unique()
df1['label'] = np.where(df1['ID_number'].isin(unique), 0, 1)
Or:
df1['label'] = (~df1['ID_number'].isin(unique)).astype(int)
print(df1)
ID_number week_number label
0 13 1 0
1 13 2 0
2 14 1 0
3 14 2 0
4 14 3 0
5 15 1 1
6 15 4 1
7 16 5 1
While not efficient, you can utilize set operations via set.isdisjoint:
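def checker(x):
    # True when the group's weeks contain neither 2 nor 3
    return set(x).isdisjoint({2, 3})

# astype(int) keeps the flag as 0/1 regardless of the dtype transform returns
df1['flag'] = df1.groupby('ID_number')['week_number'].transform(checker).astype(int)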
print(df1)
ID_number week_number flag
0 13 1 0
1 13 2 0
2 14 1 0
3 14 2 0
4 14 3 0
5 15 1 1
6 15 4 1
7 16 5 1
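If the list of weeks needs to vary, one possible variation is to compute a single boolean per ID with agg and map it back onto the rows. This is only a sketch: the names weeks and no_overlap are hypothetical, and it is not claimed to be faster than the isin-based answers.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

weeks = {2, 3}  # hypothetical parameter: the weeks to look for

# One value per ID: True when the ID's weeks share nothing with `weeks`
no_overlap = (df1.groupby('ID_number')['week_number']
                 .agg(lambda s: set(s).isdisjoint(weeks)))

# Map the per-ID result back to every row and store it as 0/1
df1['flag'] = df1['ID_number'].map(no_overlap).astype(int)
print(df1)
```

The set is built once per ID here, and the per-ID Series can be inspected on its own before it is mapped back.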
To answer how you could use groupby: you could group by ID_number and then find a label for each group, e.g.:
df1['label'] = np.zeros(len(df1))
grouped_table = df1.groupby('ID_number')
groups = list(set(df1['ID_number']))
for group in groups:
    # Distinct weeks seen for this ID
    test_list = list(set(grouped_table.get_group(group)['week_number']))
    # The expected output labels an ID 0 when it has week 2 or week 3
    if (2 in test_list) or (3 in test_list):
        df1.loc[df1['ID_number'] == group, 'label'] = 0
    else:
        df1.loc[df1['ID_number'] == group, 'label'] = 1
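The question's wording ("week 2 AND 3") can also be read as requiring both weeks for an ID. The expected output does not follow that reading (ID 13 has only week 2 but is labelled 0), but if the stricter rule is what you need, a sketch could look like this. The names required, has_all and label_strict are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

required = {2, 3}  # hypothetical: every one of these weeks must be present

# One value per ID: True when the ID has all required weeks
has_all = (df1.groupby('ID_number')['week_number']
              .agg(lambda s: required.issubset(set(s))))

# Label 0 when the ID has both weeks, 1 otherwise
row_has_all = df1['ID_number'].map(has_all).astype(bool)
df1['label_strict'] = (~row_has_all).astype(int)
print(df1)
```

Under this reading only ID 14 gets a 0, which differs from the labels requested in the question.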