Find data mismatches between rows in pandas using a combination of columns

Question

What's the best way to get all cell values based on a combination of column values?

Sample DataFrame one:

  Stock                         Name  Price
0    AMD       Advanced Micro Devices    100
1     GE     General Electric Company    200
2    BAC  Bank of America Corporation    300
3   AAPL                   Apple Inc.    500
4   MSFT        Microsoft Corporation   1000
5  GOOGL                Alphabet Inc.   2000

Sample DataFrame two:

   Stock                           Name  Price
0    AMD         Advanced Micro Devices    100
1     GE       General Electric Company    200
2    BAC  Branch of America Corporation    300
3   AAPL                     Apple Inc.    500
4   MSFT          Microsoft Corporation   1000
5  GOOGL                  Alphabet Inc.   2000
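To try the answers below, the two sample frames can be rebuilt in code. This is a sketch typed in from the printed tables above; the names df1 and df2 are the ones the answers use, and the only difference between the frames is the Name in row 2.

```python
import pandas as pd

# Sample data typed in from the question's first table
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})

# The second table is identical except for the BAC name
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

print(df1)
print(df2)
```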

For example, I want to use Stock and Name as key columns and then compare the datasets. The goal is to print the mismatched entries between the two datasets, with the Stock + Name columns used as a composite key.

I'm using pandas / Python 3.7.

Sample output:

BAC Bank of America Corporation 300 --- BAC Branch of America Corporation 300
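Not one of the posted answers: a sketch that uses the (Stock, Name) pair directly as a composite key, via an outer merge with indicator=True. Every pair that exists in only one frame is reported, tagged left_only (df1) or right_only (df2). The sample data is retyped from the question; the variable names keys, merged and mismatches are illustrative.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

keys = ["Stock", "Name"]  # the composite key

# Outer-join on the composite key; indicator=True adds a "_merge" column that is
# "both" when the (Stock, Name) pair is in both frames, else "left_only"/"right_only"
merged = df1.merge(df2, on=keys, how="outer", indicator=True)

# Keep only the pairs that did not match, plus which side they came from
mismatches = merged.loc[merged["_merge"] != "both", keys + ["_merge"]]
print(mismatches)
```

Because Price is not part of the key, it comes back as Price_x and Price_y; it is left out of the printed result here to keep the output focused on the key columns.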

Answer 1

Perhaps an inner join on Stock using merge + query?

df1.merge(df2, on='Stock').query('Name_x != Name_y')

  Stock                       Name_x  Price_x                         Name_y  Price_y
2   BAC  Bank of America Corporation      300  Branch of America Corporation      300
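A self-contained version of the merge + query approach above, extended to print each mismatched pair in the question's sample-output format. The formatting loop and the name lines are additions for illustration. Merging on Stock alone assumes each ticker appears at most once per frame; with duplicates, the inner join would pair every copy with every other copy.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

# Pair rows by ticker (inner join); overlapping columns get _x (df1) / _y (df2)
pairs = df1.merge(df2, on="Stock").query("Name_x != Name_y")

# Render each pair as "left --- right", like the question's sample output
lines = [
    f"{r.Stock} {r.Name_x} {r.Price_x} --- {r.Stock} {r.Name_y} {r.Price_y}"
    for r in pairs.itertuples(index=False)
]
print("\n".join(lines))
```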

Or, a slightly different solution: use map to get the stock symbols whose names differ:

m = df1.Stock.map(df2.set_index('Stock').Name).ne(df1.Name)
symbols = df1.loc[m, 'Stock']

print(symbols)
2    BAC
Name: Stock, dtype: object

Then access the matching rows of each DataFrame by stock symbol:

df1[df1.Stock.isin(symbols)]
  Stock                         Name  Price
2   BAC  Bank of America Corporation    300

df2[df2.Stock.isin(symbols)]
  Stock                           Name  Price
2   BAC  Branch of America Corporation    300
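The map steps above, put together as one runnable sketch. Two assumptions worth stating: map looks names up through df2 indexed by Stock, so Stock is expected to be unique in df2; and a ticker missing from df2 maps to NaN, which ne() treats as "not equal", so such tickers are flagged too.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

# For each df1 ticker, fetch the name df2 has for that ticker
name_in_df2 = df1["Stock"].map(df2.set_index("Stock")["Name"])

# Flag tickers whose names disagree (or that are absent from df2)
mask = name_in_df2.ne(df1["Name"])
symbols = df1.loc[mask, "Stock"]

# Show both sides of each mismatch
print(df1[df1["Stock"].isin(symbols)])
print(df2[df2["Stock"].isin(symbols)])
```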

Answer 2

If they are in two DataFrames, joining them side by side without a condition is pretty straightforward with pd.concat. Once they are joined, here's one way to find the mismatches:

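import pandas as pd

# Stand-in for two frames already joined side by side: the *_x and *_y
# columns hold the values from each original frame
df1 = pd.DataFrame({
    "Ticker_y": list("qwerty"),
    "Name_y": list("asdfgh"),
    "Ticker_x": list("qw3r7y"),
    "Name_x": list("as6f8h")
})

# Keep a row if the ticker OR the name differs between the two sides
# (with & instead of |, a row is kept only when both differ)
mismatch = df1[(df1["Ticker_y"] != df1["Ticker_x"]) | (df1["Name_y"] != df1["Name_x"])]

The last line just says "the df only where at least one of these conditions is met."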
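The code above starts from frames that are already joined. A sketch of the joining step itself with pd.concat(axis=1), using the sample data from the question: keys= gives each side its own top-level column label so the two Name columns don't collide. concat aligns rows by index label, so this assumes both frames list the same stocks in the same row order; the labels "left" and "right" are arbitrary.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

keys = ["Stock", "Name"]  # the composite key

# Side by side: columns become ("left", "Stock"), ..., ("right", "Price")
both = pd.concat([df1, df2], axis=1, keys=["left", "right"])

# A row mismatches if any key column differs between the two sides
mask = (both["left"][keys] != both["right"][keys]).any(axis=1)
print(both[mask])
```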

Answer 3

We can use isin with the sequence of values to test: it returns True for each element of the Series that is contained in those values.

First DataFrame:

>>> df1
   Stock                         Name  Price
0    AMD       Advanced Micro Devices    100
1     GE     General Electric Company    200
2    BAC  Bank of America Corporation    300
3   AAPL                   Apple Inc.    500
4   MSFT        Microsoft Corporation   1000
5  GOOGL                Alphabet Inc.   2000

Second DataFrame:

>>> df2
   Stock                           Name  Price
0    AMD         Advanced Micro Devices    100
1     GE       General Electric Company    200
2    BAC  Branch of America Corporation    300
3   AAPL                     Apple Inc.    500
4   MSFT          Microsoft Corporation   1000
5  GOOGL                  Alphabet Inc.   2000

Here you go:

>>> df2[~df2.Name.isin(df1.Name.values)]
  Stock                           Name  Price
2   BAC  Branch of America Corporation    300

Or:

>>> df1[~df1.Name.isin(df2.Name.values)]
  Stock                         Name  Price
2   BAC  Bank of America Corporation    300
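The isin calls above test Name on its own, so a name that appears under a different ticker in the other frame would not be flagged. A sketch of the same isin idea on the (Stock, Name) pair instead, assuming the question's sample data: each frame's key columns are turned into a MultiIndex so that isin compares whole tuples.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[2, "Name"] = "Branch of America Corporation"

keys = ["Stock", "Name"]  # the composite key

# One (Stock, Name) tuple per row, so isin tests the pair, not each column
key1 = pd.MultiIndex.from_frame(df1[keys])
key2 = pd.MultiIndex.from_frame(df2[keys])

# Rows whose key pair is absent from the other frame
only_in_df1 = df1[~key1.isin(key2)]
only_in_df2 = df2[~key2.isin(key1)]
print(only_in_df1)
print(only_in_df2)
```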

3 answers

Answer 1 (score 1), 2018-11-20 21:45:43

Answer 2 (score 0), 2018-11-20 21:52:33

Answer 3 (score 0), 2018-11-21 04:23:54
