I have some values in the risk
column that are neither, Small
, Medium
or High
. I want to delete the rows with the value not being Small
, Medium
and High
. I tried the following:
df = df[(df.risk == "Small") | (df.risk == "Medium") | (df.risk == "High")]
But this returns an empty DataFrame. How can I filter them correctly?
I think you want:
df = df[(df.risk.isin(["Small","Medium","High"]))]
Example:
In [5]:
import pandas as pd
df = pd.DataFrame({'risk':['Small','High','Medium','Negligible', 'Very High']})
df
Out[5]:
risk
0 Small
1 High
2 Medium
3 Negligible
4 Very High
[5 rows x 1 columns]
In [6]:
df[df.risk.isin(['Small','Medium','High'])]
Out[6]:
risk
0 Small
1 High
2 Medium
[3 rows x 1 columns]
You could also use query
:
df.query('risk in ["Small","Medium","High"]')
You can refer to variables in the environment by prefixing them with @
. For example:
lst = ["Small","Medium","High"]
df.query("risk in @lst")
If the column name is multiple words, eg "risk factor"
, you can refer to it by surrounding it with backticks ` `
:
df.query('`risk factor` in @lst')
query
method comes in handy if you need to chain multiple conditions. For example, the outcome of the following filter:
df[df['risk factor'].isin(lst) & (df['value']**2 > 2) & (df['value']**2 < 5)]
can be derived using the following expression:
df.query('`risk factor` in @lst and 2 < value**2 < 5')
Another nice and readable approach is the following:
small_risk = df["risk"] == "Small"
medium_risk = df["risk"] == "Medium"
high_risk = df["risk"] == "High"
Then you can use it like this:
df[small_risk | medium_risk | high_risk]
or
df[small_risk & medium_risk]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.