简体   繁体   中英

How to delete ANY row containing specific string in pandas?

I know that there are many ways to delete rows containing a specific value in a column in python, but I'm wondering if there is a more efficient way to do this by checking all columns in a dataset at once and deleting all rows that contain a specific value WITHOUT turning it into NaN and dropping all of them. To clarify, I don't want to lose all columns with strings/NaN I just want to lose rows that have a specific value.

For example, I'm looking to delete all rows with participants that contain an answer "refused" in any column. So if my table looked like this:

Subject Race Gender Weight
1 black female 123
2 white refused 145
3 white male 165
4 asian male refused
5 refused male 128
6 white male nan
7 asian male refused
8 black male nan

I would want to implement a statement that would filter it to keep only subjects that didn't have any responses with a string containing "refused":

Subject Race Gender Weight
1 black female 123
3 white male 165
6 white male nan
8 black male nan

Does anyone know how to filter this way across an entire dataset?

You can use isin with any .

df = df[~df.isin(['refused']).any(axis=1)]
df = df[(df.Gender != 'refused') & (df.Race != 'refused').... ]

or alternatively

filter = reduce(lambda column1, column2: (df[column1] != 'refused') & (df[column2] != 'refused'), df.columns)
df = df[filter]

Another method with apply-lambda:

df = df.loc[~df.apply(lambda row : any('refused' in str(cell) for cell in row) ,axis=1)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM