Pandas: How to get all values for a column, where another column's value is a specific value
I have a dataframe with sample_id and mutation columns. Each sample carries several mutations:
sample_id mutation
sample1 mutation_A
sample1 mutation_B
sample1 mutation_D
sample2 mutation_C
sample2 mutation_D
sample3 mutation_A
sample3 mutation_B
sample3 mutation_C
I'd like to get the rows where, say, mutation_C is present. However, I want all the rows for that sample. This:
df.loc[df['mutation'] == 'mutation_C']
returns:
sample_id mutation
sample2 mutation_C
How can I get the rest of the mutation data for sample2, so that I get:
sample_id mutation
sample2 mutation_C
sample2 mutation_D
I've been trying to use groupby, but can't figure out how to get all the results.
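The groupby attempt mentioned above can be made to work with GroupBy.filter, which keeps every row of each group for which the function returns True. A minimal sketch, assuming the example frame is rebuilt by hand from the table above:

```python
import pandas as pd

# Example data from the question, rebuilt by hand
df = pd.DataFrame({
    "sample_id": ["sample1", "sample1", "sample1",
                  "sample2", "sample2",
                  "sample3", "sample3", "sample3"],
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})

# Keep every row of each sample_id group that contains mutation_C at least once;
# filter() returns the surviving rows in their original order
result = df.groupby("sample_id").filter(
    lambda g: (g["mutation"] == "mutation_C").any()
)
print(result)
```

filter() calls the Python function once per group. For many groups, a vectorized mask does the same job: (df["mutation"] == "mutation_C").groupby(df["sample_id"]).transform("any") broadcasts the per-sample answer back to every row, and df[mask] selects them.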
First select the sample_id of every row that has mutation_C, then filter the dataframe again with isin:
a = df.loc[df['mutation'] == 'mutation_C', 'sample_id']
print (a)
3 sample2
7 sample3
Name: sample_id, dtype: object
df = df[df['sample_id'].isin(a)]
print (df)
sample_id mutation
3 sample2 mutation_C
4 sample2 mutation_D
5 sample3 mutation_A
6 sample3 mutation_B
7 sample3 mutation_C
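The two filtering steps above can be wrapped into a small reusable helper. This is a sketch; the function name rows_for_samples_with is hypothetical, and the column names are assumed to be sample_id and mutation as in the question:

```python
import pandas as pd


def rows_for_samples_with(df: pd.DataFrame, mutation: str) -> pd.DataFrame:
    """Hypothetical helper: return all rows of every sample that has `mutation` at least once."""
    # Step 1: sample_ids of the rows carrying the mutation
    ids = df.loc[df["mutation"] == mutation, "sample_id"]
    # Step 2: keep every row whose sample_id is among them
    return df[df["sample_id"].isin(ids)]


df = pd.DataFrame({
    "sample_id": ["sample1", "sample1", "sample1",
                  "sample2", "sample2",
                  "sample3", "sample3", "sample3"],
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})

print(rows_for_samples_with(df, "mutation_C"))  # rows of sample2 and sample3
print(rows_for_samples_with(df, "mutation_D"))  # rows of sample1 and sample2
```

If no row carries the mutation, ids is empty, isin matches nothing, and an empty dataframe comes back instead of an error.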
Assuming you have other data as well, a neater approach is to set the index according to how you query it. (I added a dummy column with df['value'] = 1.)
>>> a = df.set_index(['mutation', 'sample_id'])
>>> a.sort_index()
value
mutation sample_id
mutation_A sample1 1
sample3 1
mutation_B sample1 1
sample3 1
mutation_C sample2 1
sample3 1
mutation_D sample1 1
sample2 1
>>> a.loc['mutation_C']
value
sample_id
sample2 1
sample3 1
If you actually need the sample_ids as a list, you can do:
>>> a.loc['mutation_C'].index.tolist()
['sample2', 'sample3']
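A self-contained version of the MultiIndex approach, also feeding the resulting ids back into the original frame to get every row of those samples. A sketch, assuming the same example data plus the dummy value column. Note that sort_index() returns a new frame rather than sorting in place, so the result is assigned:

```python
import pandas as pd

df = pd.DataFrame({
    "sample_id": ["sample1", "sample1", "sample1",
                  "sample2", "sample2",
                  "sample3", "sample3", "sample3"],
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})
df["value"] = 1  # dummy column, as in the answer

# Index by (mutation, sample_id); sort_index() returns a new, sorted frame
a = df.set_index(["mutation", "sample_id"]).sort_index()

# Selecting on the first level leaves sample_id as the index
samples_with_c = a.loc["mutation_C"].index.tolist()
print(samples_with_c)  # ['sample2', 'sample3']

# Use the ids to pull every row of those samples from the original frame
full_rows = df[df["sample_id"].isin(samples_with_c)]
print(full_rows)
```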
Not what you asked, but another useful view:
>>> df.pivot_table(values='value', index='sample_id', columns='mutation')
mutation mutation_A mutation_B mutation_C mutation_D
sample_id
sample1 1.0 1.0 NaN 1.0
sample2 NaN NaN 1.0 1.0
sample3 1.0 1.0 1.0 NaN
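The wide view can also answer the original question directly. This sketch swaps in pd.crosstab for pivot_table: it builds the same sample × mutation table from counts (0 where a mutation is absent, rather than NaN) without needing the dummy value column. A boolean column test then picks the samples:

```python
import pandas as pd

df = pd.DataFrame({
    "sample_id": ["sample1", "sample1", "sample1",
                  "sample2", "sample2",
                  "sample3", "sample3", "sample3"],
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})

# Count table: one row per sample_id, one column per mutation (0 where absent)
wide = pd.crosstab(df["sample_id"], df["mutation"])

# Samples that carry mutation_C
with_c = wide[wide["mutation_C"] > 0].index.tolist()
print(with_c)  # ['sample2', 'sample3']

# Multi-mutation queries are easy too, e.g. samples with both A and B
with_a_and_b = wide[(wide[["mutation_A", "mutation_B"]] > 0).all(axis=1)].index.tolist()
print(with_a_and_b)  # ['sample1', 'sample3']
```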