
How to filter rows in pandas dataframe where a column has a value equal to some value of a list

I have a dataframe with two columns: one for ID_number and one for week_number. It can look like this:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID_number':[13, 13, 14, 14, 14, 15, 15,16], 'week_number':[1, 2, 1, 2, 3, 1, 4, 5]})

#   ID_number   week_number
#0  13  1
#1  13  2
#2  14  1
#3  14  2
#4  14  3
#5  15  1
#6  15  4
#7  16  5

For each distinct ID, I want to check whether any of its rows has a week value of 2 or 3, and then make a label for the data. If an ID has neither week 2 nor week 3, I label it with a 1. Otherwise, I label it with a 0.

For now, I have come up with a rather inelegant solution that works, but I am sure there must be a better way:

def check_courier_week(df, field, weeks):
    weeks_not_provided = weeks
    new_df = df.copy()  # work on a copy so the caller's frame is not modified
    new_df['label'] = np.zeros(len(df))
    for c in np.unique(df[field]):
        tmp = df[df[field] == c]
        # distinct True/False values: is each row's week one of the given weeks?
        in_weeks = np.unique(tmp.week_number.isin(weeks_not_provided))
        if len(in_weeks) == 1 and not in_weeks[0]:
            # none of this ID's rows fall in the given weeks
            new_df.loc[df[field] == c, 'label'] = 1
        else:
            new_df.loc[df[field] == c, 'label'] = 0
    return new_df

Any ideas on how this could be improved? I guess there might be a solution using groupby, but I cannot think how to implement it.

The resulting label should be:

#   ID_number   week_number     label
#0  13  1   0.0
#1  13  2   0.0
#2  14  1   0.0
#3  14  2   0.0
#4  14  3   0.0
#5  15  1   1.0
#6  15  4   1.0
#7  16  5   1.0

Thanks!

Using groupby with transform('any'):

(~(df1['week_number'].isin([2,3])).groupby(df1['ID_number']).transform('any')).astype(int)
Out[39]: 
0    0
1    0
2    0
3    0
4    0
5    1
6    1
7    1
Name: week_number, dtype: int32
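
The one-liner above may be easier to follow when split into steps. This is a minimal sketch assuming the same df1 as in the question; the intermediate names has_week and id_has_week are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

# Step 1: row-level mask, True where the week is 2 or 3
has_week = df1['week_number'].isin([2, 3])

# Step 2: broadcast "any True among this ID's rows" back to every row of that ID
id_has_week = has_week.groupby(df1['ID_number']).transform('any')

# Step 3: invert and cast, so IDs without week 2 or 3 get label 1
df1['label'] = (~id_has_week).astype(int)
print(df1)
```

Assigning the result to a column, as in the last step, keeps the label next to the data instead of leaving it as a standalone Series.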

Using isin and np.where without grouping:

unique = df1.loc[df1['week_number'].isin([2,3]), 'ID_number'].unique()
df1['label'] = np.where(df1['ID_number'].isin(unique), 0, 1)

Or:

df1['label'] = (~df1['ID_number'].isin(unique)).astype(int)

print(df1)
   ID_number  week_number  label
0         13            1      0
1         13            2      0
2         14            1      0
3         14            2      0
4         14            3      0
5         15            1      1
6         15            4      1
7         16            5      1
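
If the list of weeks or the column names vary, the same idea could be wrapped in a small helper. The sketch below is only one way to do it: the function name label_ids_missing_weeks and its id_col and week_col parameters are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def label_ids_missing_weeks(df, weeks, id_col='ID_number', week_col='week_number'):
    """Return a 0/1 array: 1 where the row's ID has none of the given weeks."""
    # IDs that have at least one row in the given weeks
    ids_with_week = df.loc[df[week_col].isin(weeks), id_col].unique()
    # 0 for rows whose ID is in that set, 1 for all other rows
    return np.where(df[id_col].isin(ids_with_week), 0, 1)


df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

df1['label'] = label_ids_missing_weeks(df1, [2, 3])
print(df1)
```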

While not efficient, you can utilize set operations via set.isdisjoint:

def checker(x):
    return set(x).isdisjoint({2, 3})

# cast to int so the flag is 0/1 rather than False/True
df1['flag'] = df1.groupby('ID_number')['week_number'].transform(checker).astype(int)

print(df1)

   ID_number  week_number  flag
0         13            1     0
1         13            2     0
2         14            1     0
3         14            2     0
4         14            3     0
5         15            1     1
6         15            4     1
7         16            5     1
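
A variation on the same set-based check is to compute one result per ID and then map it back onto the rows, which makes the per-ID outcome explicit. This is a sketch under the same assumptions as above; the name per_id is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

# One value per ID: truthy when the ID's weeks share nothing with {2, 3}
per_id = df1.groupby('ID_number')['week_number'].agg(
    lambda s: set(s).isdisjoint({2, 3})
)

# Look up each row's ID in the per-ID result and cast to 0/1
df1['flag'] = df1['ID_number'].map(per_id).astype(int)
print(df1)
```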

To answer how you could use groupby: you could group by ID_number and then derive the label for each group, e.g.:

df1['label'] = np.zeros(len(df1))
grouped_table = df1.groupby('ID_number')
groups = list(set(df1['ID_number']))
for group in groups:
    # weeks present for this ID
    test_list = list(set(grouped_table.get_group(group)['week_number']))
    # label 0 when the ID has week 2 or week 3, matching the expected output
    if (2 in test_list) or (3 in test_list):
        df1.loc[df1['ID_number'] == group, 'label'] = 0
    else:
        df1.loc[df1['ID_number'] == group, 'label'] = 1
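
The question's wording ("week 2 AND 3") could also be read strictly, meaning an ID gets label 0 only when it has both weeks. That reading does not reproduce the expected output shown in the question (ID 13 would get label 1), but if it is what you need, a sketch could look like this. The names required, n_found, has_all and label_strict are illustrative assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({'ID_number': [13, 13, 14, 14, 14, 15, 15, 16],
                    'week_number': [1, 2, 1, 2, 3, 1, 4, 5]})

required = {2, 3}

# Keep only rows in the required weeks, then count distinct weeks found per ID
mask = df1['week_number'].isin(required)
n_found = df1.loc[mask].groupby('ID_number')['week_number'].nunique()

# IDs with no matching rows are absent from n_found, so fill them with 0
has_all = df1['ID_number'].map(n_found).fillna(0) == len(required)

# 1 when the ID is missing at least one required week, 0 when it has all of them
df1['label_strict'] = (~has_all).astype(int)
print(df1)
```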
