
Adding missing rows from another table based on 2 columns

I have a subset of a dataframe like below:

ID  var1 var2 var3
111  A    1    1
222  A    1    1
333  A    1    1
444  A    2    1
555  A    2    1
666  A    2    1

I want to add the missing rows from the dataframe below, but only rows whose (var1, var2) combination also occurs in the subset:

ID  var1 var2 var3
111  A    1    1
222  A    1    1
333  A    1    1
777  A    1    0
888  A    1    0
444  A    2    1
555  A    2    1
666  A    2    1
999  A    2    0
123  B    3    1
456  B    4    0
789  C    5    1

So the output should be:

ID  var1 var2 var3
111  A    1    1
222  A    1    1
333  A    1    1
777  A    1    0
888  A    1    0
444  A    2    1
555  A    2    1
666  A    2    1
999  A    2    0

Thanks!

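Use merge. By default it joins on the columns the two frames share (here var1 and var2) and performs an inner join, so only the rows of df2 whose pair occurs in df1 are kept: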

In [164]: df2.merge(df1[['var1', 'var2']].drop_duplicates())
Out[164]:
    ID var1  var2  var3
0  111    A     1     1
1  222    A     1     1
2  333    A     1     1
3  777    A     1     0
4  888    A     1     0
5  444    A     2     1
6  555    A     2     1
7  666    A     2     1
8  999    A     2     0
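For reference, here is a self-contained version of the same idea. It assumes, as in the answer above, that df1 is the subset and df2 is the full table, and it spells out the arguments merge would otherwise infer. The drop_duplicates() call matters: without it, each df2 row would be repeated once for every df1 row sharing its (var1, var2) pair.

```python
import pandas as pd

# Subset from the question (df1) and the full table (df2).
df1 = pd.DataFrame(
    {
        "ID": [111, 222, 333, 444, 555, 666],
        "var1": ["A"] * 6,
        "var2": [1, 1, 1, 2, 2, 2],
        "var3": [1] * 6,
    }
)
df2 = pd.DataFrame(
    {
        "ID": [111, 222, 333, 777, 888, 444, 555, 666, 999, 123, 456, 789],
        "var1": ["A"] * 9 + ["B", "B", "C"],
        "var2": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5],
        "var3": [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1],
    }
)

# Unique (var1, var2) pairs present in the subset.
keys = df1[["var1", "var2"]].drop_duplicates()

# Inner join on the key columns: keeps only df2 rows whose pair is in keys.
result = df2.merge(keys, on=["var1", "var2"], how="inner")
print(result)
```

Note that merge returns a fresh 0..n-1 index rather than the original row labels of df2.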

Although Zero has already answered, you could also use the pandas library and its DataFrame class. It's easy to use and understand: using just indexes you can sort, iterate, aggregate, concatenate, visualize, and work with sparse data.

Combining it with NumPy's ndarray makes the data even easier to manipulate. TutorialsPoint has a good tutorial on the basic functionality of the two together: Python Pandas - Basic Functionality.
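As a rough illustration of some of the operations listed above, the snippet below builds the question's subset from a NumPy array, then sorts, aggregates and concatenates it, and converts the result back to an ndarray. The names subset and extra, and the two extra rows, are hypothetical and only chosen for illustration.

```python
import numpy as np
import pandas as pd

# The subset from the question, built from a NumPy array (numeric columns only).
subset = pd.DataFrame(
    np.array([[111, 1, 1], [222, 1, 1], [333, 1, 1],
              [444, 2, 1], [555, 2, 1], [666, 2, 1]]),
    columns=["ID", "var2", "var3"],
)
subset.insert(1, "var1", "A")  # add the constant var1 column in second position

# Sort: highest ID first.
by_id = subset.sort_values("ID", ascending=False)

# Aggregate: number of IDs per (var1, var2) pair.
counts = subset.groupby(["var1", "var2"]).size()

# Concatenate: append two hypothetical extra rows with the same columns.
extra = pd.DataFrame({"ID": [777, 999], "var1": ["A", "A"], "var2": [1, 2], "var3": [0, 0]})
combined = pd.concat([subset, extra], ignore_index=True)

# Back to a plain NumPy ndarray for numeric work.
values = combined[["ID", "var2", "var3"]].to_numpy()
print(by_id, counts, combined, values, sep="\n\n")
```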

Example

'''
ID  var1 var2 var3
111  A    1    1
222  A    1    1
333  A    1    1
444  A    2    1
555  A    2    1
666  A    2    1
'''
import pandas as pd

data = [
        [111, 'A', 1, 1],
        [222, 'A', 1, 1],
        [333, 'A', 1, 1],
        [444, 'A', 2, 1],
        [555, 'A', 2, 1],
        [666, 'A', 2, 1]
       ]

df = pd.DataFrame(data, columns=['ID', 'var1', 'var2', 'var3'])

print(df)

Output

    ID var1  var2  var3
0  111    A     1     1
1  222    A     1     1
2  333    A     1     1
3  444    A     2     1
4  555    A     2     1
5  666    A     2     1
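The example above only builds the subset. A minimal sketch of finishing the task without merge, assuming the full table is built the same way and called full (a hypothetical name): turn the (var1, var2) columns of each frame into a MultiIndex and keep the rows of full whose pair appears in the subset. Unlike merge, this boolean-mask approach keeps the original row labels of full.

```python
import pandas as pd

cols = ["ID", "var1", "var2", "var3"]
subset = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1],
     [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1]],
    columns=cols,
)
full = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1], [777, "A", 1, 0],
     [888, "A", 1, 0], [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1],
     [999, "A", 2, 0], [123, "B", 3, 1], [456, "B", 4, 0], [789, "C", 5, 1]],
    columns=cols,
)

keys = ["var1", "var2"]
# A MultiIndex of (var1, var2) pairs for each frame.
full_pairs = full.set_index(keys).index
subset_pairs = subset.set_index(keys).index

# True where the row's pair also occurs in the subset.
mask = full_pairs.isin(subset_pairs)
result = full[mask]
print(result)
```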
