Select rows from a Pandas DataFrame with the same value in one column but different values in another column

Say I have the pandas DataFrame below:

   A      B     C   D
1  foo    one   0   0
2  foo    one   2   4
3  foo    two   4   8
4  cat    one   8   4
5  bar    four  6  12
6  bar    three 7  14
7  bar    four  7  14

I would like to select all the rows that have equal values in A but differing values in B. So I would like the output of my code to be:

   A      B    C   D
1  foo    one  0   0
3  foo    two  4   8
5  bar  three  7  14
6  bar    four 7  14

What's the most efficient way to do this? I have approximately 11,000 rows with a lot of variation in the column values, but this situation comes up a lot. In my dataset, if elements in column A are equal, then the corresponding column B values should also be equal. Due to mislabeling this is not the case, and I would like to fix it; doing this one by one would be impractical.

You can try groupby() + filter() + drop_duplicates():

>>> df.groupby('A').filter(lambda g: len(g) > 1).drop_duplicates(subset=['A', 'B'], keep="first")
     A      B  C   D
0  foo    one  0   0
2  foo    two  4   8
4  bar   four  6  12
5  bar  three  7  14

Or, if you only want to drop duplicates over the subset of columns A and B, you can use the following, but the result will also include the row with cat.

>>> df.drop_duplicates(subset=['A', 'B'], keep="first")
     A      B  C   D
0  foo    one  0   0
2  foo    two  4   8
3  cat    one  8   4
4  bar   four  6  12
5  bar  three  7  14

Use groupby + filter + head:

result = df.groupby('A').filter(lambda g: len(g) > 1).groupby(['A', 'B']).head(1)
print(result)

Output

     A      B  C   D
0  foo    one  0   0
2  foo    two  4   8
4  bar   four  6  12
5  bar  three  7  14

The first groupby and filter remove the rows whose A value is not duplicated (i.e. cat). The second groupby creates groups with the same A and B, and head(1) takes the first row of each of those groups.
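One caveat with the len(g) > 1 test: it keeps any A group that has more than one row, even when every row in that group carries the same B. The sketch below is an alternative, not part of the answers above; it counts the distinct B values per A group with transform('nunique'). The two dog rows are hypothetical additions to the question's sample data, included only to show the difference.

```python
import pandas as pd

# The question's data plus two hypothetical 'dog' rows that repeat A
# but agree on B, so they should NOT be reported as conflicts.
df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'cat', 'bar', 'bar', 'bar', 'dog', 'dog'],
    'B': ['one', 'one', 'two', 'one', 'four', 'three', 'four', 'six', 'six'],
    'C': [0, 2, 4, 8, 6, 7, 7, 1, 1],
    'D': [0, 4, 8, 4, 12, 14, 14, 2, 2],
}, index=range(1, 10))

# Number of distinct B labels inside each A group, broadcast to every row.
distinct_b = df.groupby('A')['B'].transform('nunique')

# Keep only groups with more than one distinct B, then one row per (A, B).
conflicts = df[distinct_b > 1].drop_duplicates(subset=['A', 'B'], keep='first')
print(conflicts)

# For comparison: the size-based filter also returns a 'dog' row.
by_size = (
    df.groupby('A')
      .filter(lambda g: len(g) > 1)
      .drop_duplicates(subset=['A', 'B'], keep='first')
)
print(by_size)
```

On the question's original seven rows both versions select the same rows, so the difference only matters if some A values repeat with a consistent B.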

The current answers are correct and may be more sophisticated too. If you have complex criteria, the filter function will be very useful. If you are like me and want to keep things simple, I feel the following is a more beginner-friendly way:

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'A': ['foo', 'foo', 'foo', 'cat', 'bar', 'bar', 'bar'],
...     'B': ['one', 'one', 'two', 'one', 'four', 'three', 'four'],
...     'C': [0,2,4,8,6,7,7],
...     'D': [0,4,8,4,12,14,14]
... }, index=[1,2,3,4,5,6,7])

>>> df = df.drop_duplicates(['A', 'B'], keep='last')
    A       B       C   D
2   foo     one     2   4
3   foo     two     4   8
4   cat     one     8   4
6   bar     three   7   14
7   bar     four    7   14


>>> df = df[df.duplicated(['A'], keep=False)]
    A       B       C   D
2   foo     one     2   4
3   foo     two     4   8
6   bar     three   7   14
7   bar     four    7   14

keep='last' is optional here. With the default keep='first', the first row of each (A, B) pair is retained instead of the last.
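To illustrate the effect of keep, here is a minimal sketch that wraps the two steps above in a helper and runs it with both settings on the question's data. The name select_conflicts is hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'foo', 'foo', 'cat', 'bar', 'bar', 'bar'],
    'B': ['one', 'one', 'two', 'one', 'four', 'three', 'four'],
    'C': [0, 2, 4, 8, 6, 7, 7],
    'D': [0, 4, 8, 4, 12, 14, 14],
}, index=[1, 2, 3, 4, 5, 6, 7])


def select_conflicts(frame, keep='first'):
    """Hypothetical helper: one row per (A, B) pair, restricted to A values
    that still occur more than once after de-duplication."""
    deduped = frame.drop_duplicates(['A', 'B'], keep=keep)
    return deduped[deduped.duplicated(['A'], keep=False)]


first = select_conflicts(df, keep='first')
last = select_conflicts(df, keep='last')

print(first.index.tolist())  # [1, 3, 5, 6]
print(last.index.tolist())   # [2, 3, 6, 7]
```

Both calls report the same (A, B) pairs; keep only decides which of the duplicated rows represents each pair, and therefore which C and D values appear in the result.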
