A left join using pandas is populating the data twice for the same row
I am trying to do some data analysis with pandas, using merge to do a vlookup-style lookup. The two datasets are as follows:
data1 =
acc_name tier content group gcode acc_ID
abc 3 55 b 111 R-DDD
def 4 45 c 222 X-TTT
xyz 4 60 a 333 S-UUU
abc 4 4 b 112 R-DDD
xyz 4 6 a 331 X-TTT
def 4 10 c 221 S-UUU
data2=
Accountn type status
xyz internal Active
def external Active
abc internal Inactive
The code I am using is:
import pandas as pd
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
data1.rename(columns={'acc_name':'Accountn'},inplace = True)
final = pd.merge(data1,data2[['Accountn','status']],on=['Accountn'],how='left')
The output I get is:
final =
Accountn tier content group gcode acc_ID status
abc 3 55 b 111 R-DDD Inactive
abc 3 55 b 111 R-DDD Inactive
abc 3 55 b 111 R-DDD Inactive
def 4 45 c 222 X-TTT Active
def 4 45 c 222 X-TTT Active
def 4 45 c 222 X-TTT Active
xyz 4 60 a 333 S-UUU Active
xyz 4 60 a 333 S-UUU Active
xyz 4 60 a 333 S-UUU Active
abc 4 4 b 112 R-DDD Inactive
abc 4 4 b 112 R-DDD Inactive
abc 4 4 b 112 R-DDD Inactive
xyz 4 6 a 331 X-TTT Active
xyz 4 6 a 331 X-TTT Active
xyz 4 6 a 331 X-TTT Active
def 4 10 c 221 S-UUU Active
def 4 10 c 221 S-UUU Active
def 4 10 c 221 S-UUU Active
The output I want is:
Accountn tier content group gcode acc_ID status
abc 3 55 b 111 R-DDD Inactive
def 4 45 c 222 X-TTT Active
xyz 4 60 a 333 S-UUU Active
abc 4 4 b 112 R-DDD Inactive
xyz 4 6 a 331 X-TTT Active
def 4 10 c 221 S-UUU Active
I can't figure out what is wrong with my code.
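The repeated rows suggest that the real data2 contains each Accountn more than once: a left join returns one output row for every matching row on the right side. Drop the duplicates from data2 on the join column, here Accountn, before merging, so that the output has no duplicated rows: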
final = pd.merge(data1,data2[['Accountn','status']].drop_duplicates('Accountn'),
on=['Accountn'],
how='left')
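
To see why this works, here is a small self-contained sketch. It assumes, hypothetically, that the real data2 lists every account three times, which the sample shown in the question does not; the frames are built inline rather than read from CSV, data1 is cut down to two columns, and the names `dup` and `lookup` are only illustrative.

```python
import pandas as pd

# A reduced data1: only the join key and one other column are kept.
data1 = pd.DataFrame({
    "Accountn": ["abc", "def", "xyz", "abc", "xyz", "def"],
    "gcode": [111, 222, 333, 112, 331, 221],
})

# Hypothetical data2 in which every account appears three times.
data2 = pd.DataFrame({
    "Accountn": ["xyz", "def", "abc"] * 3,
    "status": ["Active", "Active", "Inactive"] * 3,
})

# Each left row matches three right rows, so the result has 6 * 3 rows.
dup = pd.merge(data1, data2[["Accountn", "status"]], on="Accountn", how="left")

# Keep one row per key on the right side, then merge again.
lookup = data2[["Accountn", "status"]].drop_duplicates("Accountn")
final = pd.merge(data1, lookup, on="Accountn", how="left",
                 validate="many_to_one")

print(len(dup), len(final))
```

The optional `validate="many_to_one"` argument makes pandas raise a MergeError if the right side still has duplicate keys, so the problem shows up at merge time instead of as silently multiplied rows. Note that drop_duplicates keeps the first row per key, so if the duplicates in data2 disagree on status, you need to decide which one should win.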