pandas dataframe filtering multiple columns and rows
Given a dataframe in the following format:
TEST_ID | ATOMIC_NUMBER | COMPOSITION_PERCENT | POSITION
1 | 28 | 49.84 | 0
1 | 22 | 50.01 | 0
1 | 47 | 0.06 | 1
2 | 22 | 49.84 | 0
2 | 47 | 50.01 | 1
3 | 28 | 49.84 | 0
3 | 22 | 50.01 | 0
3 | 47 | 0.06 | 0
I want to select only the tests that have ATOMIC_NUMBER 22 and 28 at POSITION 0, no more and no less. So I want a filter that returns:
TEST_ID | ATOMIC_NUMBER | COMPOSITION_PERCENT | POSITION
1 | 28 | 49.84 | 0
1 | 22 | 50.01 | 0
1 | 47 | 0.06 | 1
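To make the example reproducible, here is a minimal sketch that rebuilds the sample data (the variable name `df` is an assumption) and spells out the "no more, no less" criterion. For each test, the sorted atomic numbers at POSITION 0 must equal `[22, 28]` exactly, so test 3 is excluded even though it contains both 22 and 28.

```python
import pandas as pd

# Sample data from the question (variable name `df` is assumed).
df = pd.DataFrame(
    [(1, 28, 49.84, 0), (1, 22, 50.01, 0), (1, 47, 0.06, 1),
     (2, 22, 49.84, 0), (2, 47, 50.01, 1),
     (3, 28, 49.84, 0), (3, 22, 50.01, 0), (3, 47, 0.06, 0)],
    columns=['TEST_ID', 'ATOMIC_NUMBER', 'COMPOSITION_PERCENT', 'POSITION'],
)

# Sorted atomic numbers found at POSITION 0, keyed by test.
pos0 = {
    tid: sorted(group.tolist())
    for tid, group in df[df['POSITION'] == 0].groupby('TEST_ID')['ATOMIC_NUMBER']
}

# "No more, no less": the sorted list must equal [22, 28] exactly.
# Test 2 has too few numbers at POSITION 0; test 3 has an extra one (47).
matching_tests = [tid for tid, nums in pos0.items() if nums == [22, 28]]
print(pos0)
print(matching_tests)
```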
Edit: I am trying to translate this logic from SQL to Python. Here is the SQL code:
select * from compositions
where compositions.test_id in (
    select a.test_id from (
        -- tests with exactly two POSITION 0 rows, one of which is atomic number 22
        select test_id from compositions
        where test_id in (
            select test_id
            from (select * from compositions where position = 0) p0
            group by test_id
            having count(test_id) = 2)
        and atomic_number = 22
        and position = 0) a
    join (
        -- same condition, for atomic number 28
        select test_id from compositions
        where test_id in (
            select test_id
            from (select * from compositions where position = 0) p0
            group by test_id
            having count(test_id) = 2)
        and atomic_number = 28
        and position = 0) b
    on a.test_id = b.test_id)
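A rough step-by-step translation of that query into pandas might look like the sketch below. `filter_like_sql` is a hypothetical helper name; each step is commented with the SQL clause it stands in for, and the 22 and 28 checks are made only among POSITION 0 rows, which is the stated requirement. The sample `df` is rebuilt from the table above.

```python
import pandas as pd


def filter_like_sql(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: step-by-step pandas version of the SQL query."""
    pos0 = df[df['POSITION'] == 0]
    # GROUP BY test_id HAVING count(test_id) = 2, over POSITION 0 rows
    counts = pos0.groupby('TEST_ID').size()
    two_rows = set(counts[counts == 2].index)
    # Subqueries a and b: tests with atomic number 22 / 28 at POSITION 0
    has_22 = set(pos0.loc[pos0['ATOMIC_NUMBER'] == 22, 'TEST_ID'])
    has_28 = set(pos0.loc[pos0['ATOMIC_NUMBER'] == 28, 'TEST_ID'])
    # JOIN a and b ON test_id keeps IDs present in both: a set intersection
    keep = two_rows & has_22 & has_28
    # SELECT * FROM compositions WHERE test_id IN (...)
    return df[df['TEST_ID'].isin(keep)]


df = pd.DataFrame(
    [(1, 28, 49.84, 0), (1, 22, 50.01, 0), (1, 47, 0.06, 1),
     (2, 22, 49.84, 0), (2, 47, 50.01, 1),
     (3, 28, 49.84, 0), (3, 22, 50.01, 0), (3, 47, 0.06, 0)],
    columns=['TEST_ID', 'ATOMIC_NUMBER', 'COMPOSITION_PERCENT', 'POSITION'],
)
print(filter_like_sql(df))
```

Because a test must have exactly two POSITION 0 rows and both 22 and 28 must appear among them, every surviving test has exactly one of each at POSITION 0.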
You can build a boolean Series that flags the matching test IDs, then use those IDs to index df.
import pandas as pd  # df is the DataFrame from the question
# True for each TEST_ID whose POSITION 0 atomic numbers are exactly [22, 28]
s = df[df['POSITION'] == 0].groupby('TEST_ID')['ATOMIC_NUMBER'].apply(lambda x: sorted(x) == [22, 28])
test_id = s[s].index.tolist()
df[df['TEST_ID'].isin(test_id)]
   TEST_ID  ATOMIC_NUMBER  COMPOSITION_PERCENT  POSITION
0        1             28                49.84         0
1        1             22                50.01         0
2        1             47                 0.06         1
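Calling a Python lambda once per group may be slow on a large frame, so a vectorized variant is one option. The sketch below (the helper name `filter_exact_pair` is hypothetical, and no timings are claimed) turns the conditions into 0/1 indicator columns and uses `groupby(...).transform('sum')` so that each test's counts are broadcast back to all of its rows. The resulting mask then selects rows directly, without the separate `isin` step.

```python
import pandas as pd


def filter_exact_pair(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep every row of tests whose POSITION 0 rows
    are exactly one atomic number 22 and one atomic number 28."""
    at_pos0 = df['POSITION'] == 0
    indicators = pd.DataFrame({
        'n_pos0': at_pos0,
        'n_22': at_pos0 & (df['ATOMIC_NUMBER'] == 22),
        'n_28': at_pos0 & (df['ATOMIC_NUMBER'] == 28),
    }).astype(int)
    # transform('sum') returns per-test totals aligned to the original rows
    counts = indicators.groupby(df['TEST_ID']).transform('sum')
    mask = (counts['n_pos0'] == 2) & (counts['n_22'] == 1) & (counts['n_28'] == 1)
    return df[mask]


df = pd.DataFrame(
    [(1, 28, 49.84, 0), (1, 22, 50.01, 0), (1, 47, 0.06, 1),
     (2, 22, 49.84, 0), (2, 47, 50.01, 1),
     (3, 28, 49.84, 0), (3, 22, 50.01, 0), (3, 47, 0.06, 0)],
    columns=['TEST_ID', 'ATOMIC_NUMBER', 'COMPOSITION_PERCENT', 'POSITION'],
)
print(filter_exact_pair(df))
```

Requiring two POSITION 0 rows with one 22 and one 28 is the same "no more, no less" condition as comparing the sorted list to `[22, 28]`.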