简体   繁体   中英

Filter rows from dataframe with matching pairs of strings [str.contains() AND Operation - Python, Pandas ]

I need to filter rows from a dataframe that include matching pairs of strings. For example if the below instance when filtered only data for IDs 1 and 2 would be kept as 3 does not have a corresponding '3 Month' for the '0 Month' entry:

df = pd.DataFrame({'ID':[1,2,3,1,2,1], 'Period':['0 Month','0 Month','0 Month','3 Month','3 Month','6 Month']})

The OR operation can easily be used to filter for 2 strings, as below, but that does not drop the ID without the requisite pair.

df = df[(df["Period"].str.contains("0 Month")) | (df["Period"].str.contains("3 Month"))] 
df

Therefore I'm attempting to use the AND operator to address this need but that is returning an empty dataframe:

df = df[(df["Period"].str.contains("0 Month")) & (df["Period"].str.contains("3 Month"))] 
df

You can groupby "ID" and the condition and transform nunique method to count the number of unique "Period"s and filter the rows with more than 1 unique "Period" values:

out = df[df.groupby(['ID', (df["Period"].str.contains("0 Month") | df["Period"].str.contains("3 Month"))])['Period'].transform('nunique') > 1]

Note that, instead of | you can use isin :

out = df[df.groupby(['ID', df["Period"].isin(['0 Month', '3 Month'])])['Period'].transform('nunique') > 1]

or combine the strings to match inside str.contains :

out = df[df.groupby(['ID', df["Period"].str.contains('0|3')])['Period'].transform('nunique') > 1]

Output:

   ID   Period
0   1  0 Month
1   2  0 Month
3   1  3 Month
4   2  3 Month

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM