[英]selecting rows based on multiple column values in pandas dataframe
I have a pandas
DataFrame
df
:我有一个pandas
DataFrame
df
:
import pandas as pd
data = {"Name": ["AAAA", "BBBB"],
"C1": [25, 12],
"C2": [2, 1],
"C3": [1, 10]}
df = pd.DataFrame(data)
df.set_index("Name")
which looks like this when printed (for reference):打印时看起来像这样(供参考):
C1 C2 C3
Name
AAAA 25 2 1
BBBB 12 1 10
I would like to choose rows for which C1
, C2
and C3
have values between 0
and 20
.我想选择C1
、 C2
和C3
具有介于0
和20
之间的值的行。
Can you suggest an elegant way to select those rows?你能建议一种优雅的方式来选择这些行吗?
我认为下面应该这样做,但它的优雅有待商榷。
new_df = old_df[((old_df['C1'] > 0) & (old_df['C1'] < 20)) & ((old_df['C2'] > 0) & (old_df['C2'] < 20)) & ((old_df['C3'] > 0) & (old_df['C3'] < 20))]
Shorter version:较短的版本:
In [65]:
df[(df>=0)&(df<=20)].dropna()
Out[65]:
Name C1 C2 C3
1 BBBB 12 1 10
我喜欢用 df.query() 做这些事情
df.query('C1>=0 and C1<=20 and C2>=0 and C2<=20 and C3>=0 and C3<=20')
df.query(
"0 < C1 < 20 and 0 < C2 < 20 and 0 < C3 < 20"
)
or或者
df.query("0 < @df < 20").dropna()
Using @foo
in df.query
refers to the variable foo
in the environment.在df.query
使用@foo
指的是环境中的变量foo
。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.