
A left join using pandas is populating the data twice for the same row

I am trying to do some data analysis with pandas, using merge as a vlookup. The two datasets are as follows:

data1 =
acc_name    tier   content   group  gcode     acc_ID
abc          3       55        b     111      R-DDD
def          4       45        c     222      X-TTT
xyz          4       60        a     333      S-UUU
abc          4       4         b     112      R-DDD
xyz          4       6         a     331      X-TTT
def          4       10        c     221      S-UUU
data2=
Accountn   type        status
xyz        internal     Active         
def        external     Active
abc        internal     Inactive

The code I am using is:

import pandas as pd

data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
# Rename the key column so both frames share the join column name
data1.rename(columns={'acc_name': 'Accountn'}, inplace=True)
final = pd.merge(data1, data2[['Accountn', 'status']], on=['Accountn'], how='left')

The output I get is:

final =
Accountn    tier   content   group  gcode     acc_ID    status
abc          3       55        b     111      R-DDD     Inactive
abc          3       55        b     111      R-DDD     Inactive
abc          3       55        b     111      R-DDD     Inactive
def          4       45        c     222      X-TTT     Active
def          4       45        c     222      X-TTT     Active
def          4       45        c     222      X-TTT     Active
xyz          4       60        a     333      S-UUU     Active
xyz          4       60        a     333      S-UUU     Active
xyz          4       60        a     333      S-UUU     Active
abc          4       4         b     112      R-DDD     Inactive
abc          4       4         b     112      R-DDD     Inactive
abc          4       4         b     112      R-DDD     Inactive
xyz          4       6         a     331      X-TTT     Active
xyz          4       6         a     331      X-TTT     Active
xyz          4       6         a     331      X-TTT     Active
def          4       10        c     221      S-UUU     Active
def          4       10        c     221      S-UUU     Active
def          4       10        c     221      S-UUU     Active

The output I want is:

Accountn    tier   content   group  gcode     acc_ID    status
abc          3       55        b     111      R-DDD     Inactive
def          4       45        c     222      X-TTT     Active
xyz          4       60        a     333      S-UUU     Active
abc          4       4         b     112      R-DDD     Inactive
xyz          4       6         a     331      X-TTT     Active
def          4       10        c     221      S-UUU     Active

I don't know what is wrong with my code.

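A left join returns one output row for every matching row in the right-hand frame, so the repeated rows suggest that Accountn is not unique in the real data2 file. Drop the duplicates from data2 on the join column (here Accountn) before merging, so that the output contains no duplicates: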

final = pd.merge(data1,data2[['Accountn','status']].drop_duplicates('Accountn'),
                 on=['Accountn'],
                 how='left')
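
To see why this works, here is a minimal sketch that reproduces the problem and the fix. The frames below are hypothetical: the question does not show the duplicated rows in data2, so the tripled lookup table is an assumption made only to trigger the same symptom. The validate argument of pd.merge is an optional extra check, not part of the original answer; it raises a MergeError if the right-hand keys are not unique.

```python
import pandas as pd

# Hypothetical data. The repeated rows in data2 are an assumption about
# the real CSV; they are not shown in the question.
data1 = pd.DataFrame({
    "Accountn": ["abc", "def", "xyz", "abc"],
    "tier": [3, 4, 4, 4],
    "gcode": [111, 222, 333, 112],
})
data2 = pd.DataFrame({
    "Accountn": ["xyz", "def", "abc"] * 3,
    "status": ["Active", "Active", "Inactive"] * 3,
})

# How many times does each key occur in the lookup table?
key_counts = data2["Accountn"].value_counts()

# Plain left join: each data1 row is repeated once per matching data2 row.
naive = pd.merge(data1, data2[["Accountn", "status"]], on="Accountn", how="left")

# De-duplicate the lookup table on the key first, then join.
# validate="many_to_one" makes pandas fail loudly if duplicates remain.
lookup = data2[["Accountn", "status"]].drop_duplicates("Accountn")
final = pd.merge(data1, lookup, on="Accountn", how="left", validate="many_to_one")

print(len(data1), len(naive), len(final))
```

Note that drop_duplicates keeps the first row it sees for each Accountn. If the duplicated accounts can carry different status values, it is worth deciding which row should win before merging.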

