Comparing columns in two Pandas DataFrames of unequal size for a conditional check

Question

I have two pandas DataFrames of unequal sizes. For example:

Df1
id     value
a      2
b      3
c      22
d      5 

Df2 
id     value
c      22
a      2

Now I want to extract from DF1 those rows which have the same id as in DF2. My first approach is to run two for loops, with something like:

x = []
for i in range(len(DF2)):
    for j in range(len(DF1)):
        # Compare every id in DF2 against every id in DF1
        if DF2['id'][i] == DF1['id'][j]:
            x.append(DF1.iloc[j])

This works, but with one file of 400,000 lines and another of 5,000, I need an efficient Pythonic, pandas-based way.

Answer 1

You can concat the dataframes, then check which ids are duplicated, then drop_duplicates and keep just the first occurrence:

m = pd.concat((df1,df2))
m[m.duplicated('id',keep=False)].drop_duplicates()

  id  value
0  a      2
2  c     22
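
For reference, here is a self-contained sketch of this approach using the sample data from the question. It assumes each id appears at most once within each frame; if df1 alone contained a repeated id, that id would be flagged as duplicated even when it is absent from df2. The names `shared` and `result` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c', 'd'], 'value': [2, 3, 22, 5]})
df2 = pd.DataFrame({'id': ['c', 'a'], 'value': [22, 2]})

# Stack both frames; each keeps its own index labels.
m = pd.concat((df1, df2))

# keep=False flags every row whose id occurs more than once in the stack.
shared = m[m.duplicated('id', keep=False)]

# Drop rows that are identical in all columns, keeping the first occurrence.
result = shared.drop_duplicates()
print(result)
```

Because the final drop_duplicates compares whole rows, an id whose value differs between the two frames would show up twice in the result.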

Answer 2

You can try this:

df = df1[df1.set_index(['id']).index.isin(df2.set_index(['id']).index)]
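
The set_index round trip is not strictly needed, since Series.isin can be applied to the id column directly. A minimal sketch with the question's sample data (the names `mask` and `matches` are only illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c', 'd'], 'value': [2, 3, 22, 5]})
df2 = pd.DataFrame({'id': ['c', 'a'], 'value': [22, 2]})

# Boolean mask: True where df1's id also occurs anywhere in df2's id column.
mask = df1['id'].isin(df2['id'])

# Select the matching rows; df1's row order and index labels are preserved.
matches = df1[mask]
print(matches)
```

This replaces the nested loops with a single vectorized membership test, which should scale much better to the 400,000 by 5,000 case described in the question.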

Answer 3

import pandas as pd

data1={'id':['a','b','c','d'],
       'value':[2,3,22,5]}

data2={'id':['c','a'],
       'value':[22,2]}

df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
finaldf=pd.concat([df1,df2],ignore_index=True)

Output after concat:

    id  value
0   a   2
1   b   3
2   c   22
3   d   5
4   c   22
5   a   2

Final output:

finaldf.drop_duplicates()

    id  value
0   a   2
1   b   3
2   c   22
3   d   5
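
Note that the output above still contains b and d, because drop_duplicates only removes the repeated copies of a and c. If the goal is just the rows of df1 whose id also occurs in df2, one alternative (not part of the answer above) is an inner merge on id. A sketch, assuming only df1's columns are wanted in the result; `keys` and `common` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c', 'd'], 'value': [2, 3, 22, 5]})
df2 = pd.DataFrame({'id': ['c', 'a'], 'value': [22, 2]})

# Keep only df2's key column, de-duplicated so no df1 row gets repeated.
keys = df2[['id']].drop_duplicates()

# An inner merge keeps only the df1 rows whose id has a match in keys.
common = df1.merge(keys, on='id', how='inner')
print(common)
```

Unlike boolean indexing, merge returns a fresh RangeIndex, so the original row labels of df1 are not carried over.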

3 solutions

Solution 1  1  2020-02-11 10:56:12
Solution 2  1  2020-02-11 11:08:52
Solution 3  1  accepted  2020-02-11 12:30:14
