Pandas merge doesn't work, looks like concat

Question

I've been working with two dataframes (info_clients and metadata_clients) whose associated key columns are user_id and id_wp, respectively. I loaded info_clients into a SQL table to get the associated PK, then merged the two dataframes on user_id (left side) and id_wp (right side).

info_clients: (72232, 1)

metadata_clients: (72232, 2)

        id  id_wp
0  1158426      0
1  1158427      1
2  1158428      4
3  1158429      5
4  1158430  39784

I used this:

merge = pd.merge( info_clients, metadata_clients, left_on=['user_id'], 
                            right_on=['id_wp'], how='left')

But it doesn't work as I expected. I got this result:

  user_id  cliente_fk  id_wp
0       0     1158426      0
1       1     1158427      1
2       4     1158428      4
3       5     1158429      5
4   39784     1158430  39784
Datamerge shape: (126680, 3)

When I saved the info_clients data into the SQL table, I verified it and there are 72232 clients saved. I don't have nulls or NaN values; I cleaned the data and checked its dtypes, and both keys are int64.

Answer 1

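You have duplicated key values. A left merge can only return more rows than the left dataframe (126680 versus 72232 here) when a key matches more than one row on the right side.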

No, I don't have duplicates, I removed them in a previous step, using:
data.drop_duplicates(keep='first')

I don't know whether data is your first dataframe (info_clients) or your second (metadata_clients), but if you drop duplicates without passing a subset of columns, only rows that are identical across every column are removed, so repeated key values can survive. You should try:

data = data.drop_duplicates('user_id', keep='first')

# OR

data = data.drop_duplicates('id_wp', keep='first')
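
To make the difference concrete, here is a minimal sketch on made-up data. The `name` column and all values are assumptions for illustration only, not taken from your dataframes:

```python
import pandas as pd

# Hypothetical toy data: two rows share user_id 1 but differ in another column.
data = pd.DataFrame({"user_id": [0, 1, 1, 4], "name": ["a", "b", "c", "d"]})

# No subset: only fully identical rows are dropped, so nothing is removed here.
full_row = data.drop_duplicates(keep="first")

# Subset on the key column: exactly one row per user_id survives.
by_key = data.drop_duplicates(subset="user_id", keep="first")

print(len(full_row), len(by_key))
```

Note that keep='first' silently discards the other rows with the same key, so it is worth checking which rows you are losing before relying on it.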

You should also debug with value_counts; any key with a count above 1 is duplicated:

data.value_counts('user_id')

# OR

data.value_counts('id_wp')
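
As a rough sketch of how a repeated key inflates the row count, and how to catch it, the example below uses small invented frames. The column names follow the question; the values are hypothetical. The `validate` argument of `pd.merge` raises a `MergeError` when the keys are not unique:

```python
import pandas as pd

# Hypothetical stand-ins for info_clients (left) and metadata_clients (right).
left = pd.DataFrame({"user_id": [0, 1, 4]})
right = pd.DataFrame({"id": [10, 11, 12, 13], "id_wp": [0, 1, 1, 4]})

# id_wp 1 appears twice on the right, so user_id 1 yields two output rows.
merged = pd.merge(left, right, left_on="user_id", right_on="id_wp", how="left")
print(merged.shape)

# List the keys that occur more than once on the right side.
counts = right["id_wp"].value_counts()
dupes = counts[counts > 1]
print(dupes)

# validate="one_to_one" makes pandas fail loudly instead of adding rows.
raised = False
try:
    pd.merge(left, right, left_on="user_id", right_on="id_wp",
             how="left", validate="one_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("merge keys are not unique:", exc)
```

If the check raises on your real data, the keys listed by value_counts are the ones to inspect before merging.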
