
Pandas: How to get all values for a column, where another column's value is a specific value

I have a dataframe which contains sample_id and mutation columns. Each sample contains several mutations:

sample_id    mutation
sample1      mutation_A
sample1      mutation_B
sample1      mutation_D

sample2      mutation_C
sample2      mutation_D

sample3      mutation_A
sample3      mutation_B
sample3      mutation_C

I want to be able to obtain the values where, say, mutation_C exists. However, I want to get all the results for that sample. This query:

df.loc[df['mutation'] == 'mutation_C']

returns:

sample_id    mutation
sample2      mutation_C

How do I get the rest of the sample2 mutation data, like so:

sample_id    mutation
sample2      mutation_C
sample2      mutation_D

I have been trying to use groupby but can't figure out how to obtain all the results.

First select the sample_id values of the rows that have the mutation, then filter the original dataframe by those IDs with isin:

a = df.loc[df['mutation'] == 'mutation_C', 'sample_id']
print(a)

3    sample2
7    sample3
Name: sample_id, dtype: object

df = df[df['sample_id'].isin(a)]
print(df)
  sample_id    mutation
3   sample2  mutation_C
4   sample2  mutation_D
5   sample3  mutation_A
6   sample3  mutation_B
7   sample3  mutation_C
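
For reference, the two steps above can be put together as a self-contained sketch. The dataframe is rebuilt from the question's table, and the helper name samples_with is hypothetical, not part of pandas. Since the question mentions groupby, a groupby(...).filter(...) variant is included as one possible alternative; it should select the same rows, though isin is usually the simpler choice.

```python
import pandas as pd

# Rebuild the example data from the question.
df = pd.DataFrame({
    "sample_id": ["sample1"] * 3 + ["sample2"] * 2 + ["sample3"] * 3,
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})


def samples_with(frame, mutation):
    """Return all rows of every sample that carries `mutation`.

    Hypothetical helper name, used only for illustration.
    """
    # Step 1: sample_ids of the rows where the mutation occurs.
    ids = frame.loc[frame["mutation"] == mutation, "sample_id"]
    # Step 2: keep every row whose sample_id is among those ids.
    return frame[frame["sample_id"].isin(ids)]


result = samples_with(df, "mutation_C")
print(result)

# Alternative: keep whole groups in which any row matches.
via_groupby = df.groupby("sample_id").filter(
    lambda g: (g["mutation"] == "mutation_C").any()
)
print(via_groupby)
```

Unlike the snippet above, this version does not overwrite df, so the original data stays available for further queries.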

Assuming you have other data, a neater idea would be to set the index to the columns you want to look up by. (I've added a dummy column with df['value'] = 1.)

>>> a = df.set_index(['mutation', 'sample_id'])
>>> a.sort_index()
                      value
mutation   sample_id       
mutation_A sample1        1
           sample3        1
mutation_B sample1        1
           sample3        1
mutation_C sample2        1
           sample3        1
mutation_D sample1        1
           sample2        1
>>> a.loc['mutation_C']
               value
sample_id       
sample2        1
sample3        1

If you really need the sample_ids as a list then you could do:

>>> a.loc['mutation_C'].index.tolist()
['sample2', 'sample3']
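
A runnable version of the index-based lookup might look like the sketch below. It assumes the question's data plus the dummy value column described above; the variable names indexed, sample_ids and rows are illustrative only. The last step feeds the list back into isin to recover the full rows the question asked for.

```python
import pandas as pd

# Example data from the question, plus the dummy 'value' column.
df = pd.DataFrame({
    "sample_id": ["sample1"] * 3 + ["sample2"] * 2 + ["sample3"] * 3,
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})
df["value"] = 1

# Index by (mutation, sample_id) and sort so lookups on the first level are ordered.
indexed = df.set_index(["mutation", "sample_id"]).sort_index()

# Selecting on the first level leaves sample_id as the remaining index.
sample_ids = indexed.loc["mutation_C"].index.tolist()
print(sample_ids)

# Use the ids to pull every row for those samples from the original frame.
rows = df[df["sample_id"].isin(sample_ids)]
print(rows)
```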

Not what you asked, but perhaps another useful view:

>>> df.pivot_table(values='value', index='sample_id', columns='mutation')
mutation   mutation_A  mutation_B  mutation_C  mutation_D
sample_id                                                
sample1           1.0         1.0         NaN         1.0
sample2           NaN         NaN         1.0         1.0
sample3           1.0         1.0         1.0         NaN
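
If no value column exists, a similar sample-by-mutation view can probably be built with pd.crosstab, which counts occurrences instead of aggregating a dummy column. This is a sketch of a swapped-in technique, not the answer's own method; the names table and carriers are illustrative.

```python
import pandas as pd

# Example data from the question (no dummy column needed here).
df = pd.DataFrame({
    "sample_id": ["sample1"] * 3 + ["sample2"] * 2 + ["sample3"] * 3,
    "mutation": ["mutation_A", "mutation_B", "mutation_D",
                 "mutation_C", "mutation_D",
                 "mutation_A", "mutation_B", "mutation_C"],
})

# Rows are samples, columns are mutations, cells are occurrence counts.
table = pd.crosstab(df["sample_id"], df["mutation"])
print(table)

# Samples in which mutation_C occurs at least once.
carriers = table.index[table["mutation_C"] > 0].tolist()
print(carriers)
```

Here absent combinations show up as 0 rather than NaN, which makes the table easy to use as a presence mask.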
