
Use column combinations to find data mismatch in rows pandas

What's the best way to get all cell values based on a combination of column values?

Sample dataframe One:

  Stock                         Name  Price
0    AMD       Advanced Micro Devices    100
1     GE     General Electric Company    200
2    BAC  Bank of America Corporation    300
3   AAPL                   Apple Inc.    500
4   MSFT        Microsoft Corporation   1000
5  GOOGL                Alphabet Inc.   2000

Sample dataframe Two:

   Stock                           Name  Price
0    AMD         Advanced Micro Devices    100
1     GE       General Electric Company    200
2    BAC  Branch of America Corporation    300
3   AAPL                     Apple Inc.    500
4   MSFT          Microsoft Corporation   1000
5  GOOGL                  Alphabet Inc.   2000

For example: I want to use (Stock and Name) as key columns and then compare the datasets. The goal is to print the mismatched entries between the two datasets, with the Stock + Name columns used as a combination key.

I'm using pandas with Python 3.7.

Sample Output:

BAC Bank of America Corporation 300 --- BAC Branch of America Corporation 300

Perhaps an inner join using merge + query? merge performs an inner join by default, so joining on Stock and then comparing the two Name columns keeps only the rows whose names differ:

df1.merge(df2, on='Stock').query('Name_x != Name_y')

  Stock                       Name_x  Price_x                         Name_y  Price_y
2   BAC  Bank of America Corporation      300  Branch of America Corporation      300
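The merge above joins on Stock alone and compares Name afterwards. If the goal is to treat (Stock, Name) as one combined key, a possible variant is an outer merge on both columns with `indicator=True`, which marks the rows that appear in only one frame. This is a minimal sketch, assuming the two sample frames from the question; the `_df1`/`_df2` suffixes and the variable names are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (df2 differs only in the BAC name).
df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[df2["Stock"] == "BAC", "Name"] = "Branch of America Corporation"

keys = ["Stock", "Name"]  # the combination key

# Outer merge on both key columns; the _merge column records where each key pair was found.
merged = df1.merge(df2, on=keys, how="outer", indicator=True,
                   suffixes=("_df1", "_df2"))

# Key pairs present in only one of the two frames are the mismatches.
mismatches = merged[merged["_merge"] != "both"]
print(mismatches)
```

Each mismatched key pair shows up as its own row, tagged `left_only` or `right_only`, so entries that are missing entirely from one frame are reported as well.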

Or, a slightly different solution with map, which you can use to get the stock symbols whose names differ:

m = df1.Stock.map(df2.set_index('Stock').Name).ne(df1.Name)
symbols = df1.loc[m, 'Stock']

print(symbols)
2    BAC
Name: Stock, dtype: object

And then access each DataFrame row by stock symbol:

df1[df1.Stock.isin(symbols)]
  Stock                         Name  Price
2   BAC  Bank of America Corporation    300

df2[df2.Stock.isin(symbols)]
  Stock                           Name  Price
2   BAC  Branch of America Corporation    300
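To get from these two lookups to the one-line format shown in the question's sample output, the matching rows can be paired up by symbol. The following is a rough sketch rather than part of the original answer; it assumes Stock values are unique within each frame (which the map approach also needs), and `left`, `right`, and `lines` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[df2["Stock"] == "BAC", "Name"] = "Branch of America Corporation"

# Symbols whose Name differs between the two frames (same logic as above).
m = df1.Stock.map(df2.set_index("Stock").Name).ne(df1.Name)
symbols = df1.loc[m, "Stock"]

# Index both frames by symbol so each mismatch can be looked up directly.
left = df1[df1.Stock.isin(symbols)].set_index("Stock")
right = df2[df2.Stock.isin(symbols)].set_index("Stock")

lines = []
for stock in symbols:
    a, b = left.loc[stock], right.loc[stock]
    lines.append(f"{stock} {a['Name']} {a['Price']} --- {stock} {b['Name']} {b['Price']}")

print("\n".join(lines))
```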

If they are in two dataframes, joining them without any condition is pretty straightforward with pd.concat. Once they are joined, here's one way to get the mismatch:

import pandas as pd

df1 = pd.DataFrame({
    "Ticker_y": list("qwerty"),
    "Name_y": list("asdfgh"),
    "Ticker_x": list("qw3r7y"),
    "Name_x": list("as6f8h")
})

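# A row is a mismatch when either part of the combined key differs,
# so the two comparisons are joined with | (or) rather than & (and).
mismatch = df1[(df1["Ticker_y"] != df1["Ticker_x"]) | (df1["Name_y"] != df1["Name_x"])]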

The last line just says "give me the rows of df1 where this condition holds."
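The snippet above starts from an already-joined frame and does not show the concat step itself. One way it might look with the question's data is sketched below. It assumes the two frames list the same stocks in the same row order, because `pd.concat(..., axis=1)` aligns on the index, not on Stock; the `_x`/`_y` suffixes and the name `wide` are illustrative. The comparisons are combined with `|` so that a difference in either key column counts as a mismatch.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[df2["Stock"] == "BAC", "Name"] = "Branch of America Corporation"

# Put the frames side by side; suffixes keep the column names distinct.
wide = pd.concat([df1.add_suffix("_x"), df2.add_suffix("_y")], axis=1)

# Keep rows where the (Stock, Name) pair differs between the two halves.
mismatch = wide[(wide["Stock_x"] != wide["Stock_y"]) |
                (wide["Name_x"] != wide["Name_y"])]
print(mismatch)
```

If the row order is not guaranteed to match, a merge on the key columns is probably the safer choice.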

We can use isin, which tests whether each element of a Series is contained in a given sequence of values. Negating the result with ~ leaves the rows whose Name does not appear in the other DataFrame.

First DataFrame

>>> df1
   Stock                         Name  Price
0    AMD       Advanced Micro Devices    100
1     GE     General Electric Company    200
2    BAC  Bank of America Corporation    300
3   AAPL                   Apple Inc.    500
4   MSFT        Microsoft Corporation   1000
5  GOOGL                Alphabet Inc.   2000

Second DataFrame

>>> df2
   Stock                           Name  Price
0    AMD         Advanced Micro Devices    100
1     GE       General Electric Company    200
2    BAC  Branch of America Corporation    300
3   AAPL                     Apple Inc.    500
4   MSFT          Microsoft Corporation   1000
5  GOOGL                  Alphabet Inc.   2000

Here you go:

>>> df2[~df2.Name.isin(df1.Name.values)]
  Stock                           Name  Price
2   BAC  Branch of America Corporation    300

OR

>>> df1[~df1.Name.isin(df2.Name.values)]
  Stock                         Name  Price
2   BAC  Bank of America Corporation    300
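The isin calls above test the Name column on its own. To test the (Stock, Name) combination instead, one option is to build a MultiIndex from the key columns of each frame and call isin on that. This is a sketch under the assumption that `pd.MultiIndex.from_frame` is available (pandas 0.24 or later); `key_cols`, `only_in_df1`, and `only_in_df2` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Stock": ["AMD", "GE", "BAC", "AAPL", "MSFT", "GOOGL"],
    "Name": ["Advanced Micro Devices", "General Electric Company",
             "Bank of America Corporation", "Apple Inc.",
             "Microsoft Corporation", "Alphabet Inc."],
    "Price": [100, 200, 300, 500, 1000, 2000],
})
df2 = df1.copy()
df2.loc[df2["Stock"] == "BAC", "Name"] = "Branch of America Corporation"

key_cols = ["Stock", "Name"]  # the combination key

# One (Stock, Name) tuple per row of each frame.
idx1 = pd.MultiIndex.from_frame(df1[key_cols])
idx2 = pd.MultiIndex.from_frame(df2[key_cols])

# Rows whose key pair has no counterpart in the other frame.
only_in_df1 = df1[~idx1.isin(idx2)]
only_in_df2 = df2[~idx2.isin(idx1)]

print(only_in_df1)
print(only_in_df2)
```

Unlike the single-column check, this does not treat a row as matching just because the same Name happens to appear under a different Stock.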
