Adding missing rows from another table based on 2 columns
I have a subset of a dataframe like the one below:
ID var1 var2 var3
111 A 1 1
222 A 1 1
333 A 1 1
444 A 2 1
555 A 2 1
666 A 2 1
and I want to add the missing rows from the dataframe below, but only for those IDs whose var1 and var2 values also appear in the subset:
ID var1 var2 var3
111 A 1 1
222 A 1 1
333 A 1 1
777 A 1 0
888 A 1 0
444 A 2 1
555 A 2 1
666 A 2 1
999 A 2 0
123 B 3 1
456 B 4 0
789 C 5 1
So the output should be:
ID var1 var2 var3
111 A 1 1
222 A 1 1
333 A 1 1
777 A 1 0
888 A 1 0
444 A 2 1
555 A 2 1
666 A 2 1
999 A 2 0
Thanks!
Use merge on the distinct (var1, var2) pairs of the subset (here df1 is the subset and df2 is the full dataframe):
In [164]: df2.merge(df1[['var1', 'var2']].drop_duplicates())
Out[164]:
ID var1 var2 var3
0 111 A 1 1
1 222 A 1 1
2 333 A 1 1
3 777 A 1 0
4 888 A 1 0
5 444 A 2 1
6 555 A 2 1
7 666 A 2 1
8 999 A 2 0
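For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same merge. The names df1 and df2 come from the snippet above; the variable names cols, keys and result, and the explicit on= and how= arguments, are my own additions for readability and are not part of the original answer.

```python
import pandas as pd

cols = ["ID", "var1", "var2", "var3"]

# df1: the subset from the question
df1 = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1],
     [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1]],
    columns=cols,
)

# df2: the full dataframe that holds the missing rows
df2 = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1],
     [777, "A", 1, 0], [888, "A", 1, 0],
     [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1],
     [999, "A", 2, 0],
     [123, "B", 3, 1], [456, "B", 4, 0], [789, "C", 5, 1]],
    columns=cols,
)

# One row per distinct (var1, var2) pair in the subset;
# without drop_duplicates the join would repeat rows of df2.
keys = df1[["var1", "var2"]].drop_duplicates()

# Inner join: keep only the rows of df2 whose (var1, var2) pair is in keys.
result = df2.merge(keys, on=["var1", "var2"], how="inner")
print(result)
```

Calling merge with no arguments, as in the answer, joins on all shared column names with an inner join, so spelling out on= and how= here should not change the result; it only makes the intent visible.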
Although Zero already answered, you could also use the Pandas library and its DataFrame object. It's very easy to use and understand; using just indexes you can sort, iterate over, aggregate, concatenate, and visualize data, and work with sparse data. Combining it with NumPy's ndarray makes the data even easier to manipulate. TutorialsPoint has a tutorial covering the basic functionality of the two together: Python Pandas - Basic Functionality.
'''
ID var1 var2 var3
111 A 1 1
222 A 1 1
333 A 1 1
444 A 2 1
555 A 2 1
666 A 2 1
'''
import pandas as pd
data = [
[111, 'A', 1, 1],
[222, 'A', 1, 1],
[333, 'A', 1, 1],
[444, 'A', 2, 1],
[555, 'A', 2, 1],
[666, 'A', 2, 1]
]
df = pd.DataFrame(data, columns=['ID', 'var1', 'var2', 'var3'])
print(df)
ID var1 var2 var3
0 111 A 1 1
1 222 A 1 1
2 333 A 1 1
3 444 A 2 1
4 555 A 2 1
5 666 A 2 1
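The snippet above only builds the subset. One possible way to carry it through to the requested output, sketched here as an assumption rather than as what this answer intended, is to build the full dataframe the same way and keep the rows whose (var1, var2) pair occurs in the subset. The names subset, full, pairs, mask and output are hypothetical.

```python
import pandas as pd

cols = ["ID", "var1", "var2", "var3"]

# The subset from the question (same data as df above)
subset = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1],
     [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1]],
    columns=cols,
)

# The full dataframe from the question
full = pd.DataFrame(
    [[111, "A", 1, 1], [222, "A", 1, 1], [333, "A", 1, 1],
     [777, "A", 1, 0], [888, "A", 1, 0],
     [444, "A", 2, 1], [555, "A", 2, 1], [666, "A", 2, 1],
     [999, "A", 2, 0],
     [123, "B", 3, 1], [456, "B", 4, 0], [789, "C", 5, 1]],
    columns=cols,
)

# Distinct (var1, var2) pairs present in the subset
pairs = set(zip(subset["var1"], subset["var2"]))

# Boolean mask aligned with full's index: True where the row's pair is in the subset
mask = pd.Series(
    [pair in pairs for pair in zip(full["var1"], full["var2"])],
    index=full.index,
)

# Keep the matching rows and renumber them from 0
output = full[mask].reset_index(drop=True)
print(output)
```

This avoids a join and keeps the rows in the order they have in the full dataframe; for large tables the merge shown in the other answer is probably the more idiomatic choice.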