How can I remove all rows in a dataframe if a row contains '9999-Don't Know' in any column?
I have found solutions that delete rows based on the type of value (string, numeric, etc.) across the entire dataframe, that delete rows based on values in a specific column, or that delete rows from a dataframe with only a few columns by listing the column names.
This is the closest thing I found, but that solution doesn't work for me because I cannot list every column name: there are too many (76+ columns).
Below is a sample dataset:
import pandas as pd
df = pd.DataFrame({'RespondentId': ['1ghi3g','335hduu','4vlsiu4','5nnvkkt','634deds','7kjng'], 'Satisfaction - Timing': ['9-Excellent','9-Excellent','9999-Don\'t Know','8-Very Good','1-Very Unsatisfied','9999-Don\'t Know'], 'Response Speed - Time': ['9999-Don\'t Know','9999-Don\'t Know','9-Excellent','9-Excellent','9-Excellent','9-Excellent']})
After removing the 4 rows that contain '9999-Don't Know', the output should look like this, so that I can write a new Excel file with the cleaned-up data:
pd.DataFrame({'RespondentId': ['5nnvkkt','634deds'], 'Satisfaction - Timing': ['8-Very Good','1-Very Unsatisfied'], 'Response Speed - Time': ['9-Excellent','9-Excellent']})
Use a boolean mask: compare every cell with the value and drop the rows where any column matches.
In [677]: df[~(df == "9999-Don't Know").any(axis=1)]
Out[677]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
Or, equivalently, keep the rows where every column differs from the value:
In [683]: df[(df != "9999-Don't Know").all(axis=1)]
Out[683]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
The same, using the method eq in place of the == operator:
In [686]: df[~df.eq("9999-Don't Know").any(axis=1)]
Out[686]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
Or, using the method ne in place of the != operator:
In [687]: df[df.ne("9999-Don't Know").all(axis=1)]
Out[687]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
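To see why these one-liners work without naming any column, the first form can be broken into its intermediate steps. This is only a sketch: the data is the sample from the question, and the variable names (is_dont_know, row_has_dont_know, cleaned) are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "RespondentId": ["1ghi3g", "335hduu", "4vlsiu4", "5nnvkkt", "634deds", "7kjng"],
    "Satisfaction - Timing": ["9-Excellent", "9-Excellent", "9999-Don't Know",
                              "8-Very Good", "1-Very Unsatisfied", "9999-Don't Know"],
    "Response Speed - Time": ["9999-Don't Know", "9999-Don't Know", "9-Excellent",
                              "9-Excellent", "9-Excellent", "9-Excellent"],
})

# Step 1: element-wise comparison gives a boolean DataFrame of the same shape
is_dont_know = df == "9999-Don't Know"

# Step 2: collapse across columns; True where at least one column matched
row_has_dont_know = is_dont_know.any(axis=1)

# Step 3: invert the mask and keep only the rows without a match
cleaned = df[~row_has_dont_know]
print(cleaned)
```

Because the comparison runs over the whole frame, the same code should apply unchanged to a frame with 76+ columns.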
With mixed column types, cast the frame to object before comparing, using df.astype(object) as suggested in @PiR's comment:
In [695]: df[df.astype(object).ne("9999-Don't Know").all(axis=1)]
Out[695]:
  RespondentId Satisfaction - Timing Response Speed - Time
3      5nnvkkt           8-Very Good           9-Excellent
4      634deds    1-Very Unsatisfied           9-Excellent
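The sample data has only string columns, so here is a small sketch of the mixed-type case. The Age column and its values are invented for illustration, and SENTINEL, keep and cleaned are hypothetical names. Comparing a numeric column with a string may raise in some pandas versions, so the cast to object is used here as a safeguard for the comparison only.

```python
import pandas as pd

SENTINEL = "9999-Don't Know"

# Hypothetical frame mixing a numeric column with string columns
df = pd.DataFrame({
    "RespondentId": ["1ghi3g", "5nnvkkt", "634deds"],
    "Age": [34, 51, 27],  # made-up numeric column, not in the original data
    "Satisfaction - Timing": [SENTINEL, "8-Very Good", "1-Very Unsatisfied"],
})

# Cast to object only for the comparison, then require every column to differ
keep = df.astype(object).ne(SENTINEL).all(axis=1)

# Filter the original frame so "Age" keeps its integer dtype,
# and renumber the rows from 0
cleaned = df[keep].reset_index(drop=True)
print(cleaned)
```

From there, something like cleaned.to_excel("cleaned.xlsx", index=False) should write the result to a new Excel file, assuming an Excel engine such as openpyxl is installed; the file name is a placeholder.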