Merge pandas doesn't work, it looks like concat
I've been working with two DataFrames, info_clients and metadata_clients, whose associated keys are the user_id and id_wp columns, respectively. I loaded info_clients into a SQL table to get the associated PK, then merged the two DataFrames on user_id (left side) and id_wp (right side).
info_clients: (72232, 1)
user_id
0 0
1 1
2 4
3 5
4 39784
metadata_clients: (72232, 2)
id id_wp
0 1158426 0
1 1158427 1
2 1158428 4
3 1158429 5
4 1158430 39784
I used this:
merge = pd.merge( info_clients, metadata_clients, left_on=['user_id'],
right_on=['id_wp'], how='left')
But it doesn't work as I expected. I got this result:
user_id cliente_fk id_wp
0 0 1158426 0
1 1 1158427 1
2 4 1158428 4
3 5 1158429 5
4 39784 1158430 39784
Datamerge shape: (126680, 3)
When I saved the info_clients data into the SQL table, I verified the data and there are 72232 clients saved. I don't have null or NaN values; I cleaned the data and checked the dtypes, and both keys are int64.
You are in a situation where you have duplicates.
No, I don't have duplicates. I removed them in a previous step, using:
data.drop_duplicates(keep='first')
I don't know if data is your first DataFrame (info_clients) or your second (metadata_clients), but if you drop duplicates without setting a subset of columns, only rows that are identical across every column are removed, so the key column can still contain repeated values. You should try:
# Keep only the first row for each key value (info_clients)
data = data.drop_duplicates('user_id', keep='first')
# OR (metadata_clients)
data = data.drop_duplicates('id_wp', keep='first')
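As a rough illustration of why a whole-row drop_duplicates can leave duplicate keys behind, here is a minimal sketch. The two small frames and their values are made up for the example and are not the asker's real data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data: every "id" is unique, but id_wp == 1 appears twice.
info_clients = pd.DataFrame({"user_id": [0, 1, 4]})
metadata_clients = pd.DataFrame({"id": [10, 11, 12, 13], "id_wp": [0, 1, 1, 4]})

# Whole-row de-duplication removes nothing here, because no two rows are
# identical across both columns.
whole_row = metadata_clients.drop_duplicates(keep="first")

# De-duplicating on the key column keeps one row per id_wp value.
by_key = metadata_clients.drop_duplicates("id_wp", keep="first")

# A left merge repeats a left row once per matching right row, so the
# repeated key makes the result longer than info_clients.
inflated = pd.merge(info_clients, whole_row,
                    left_on="user_id", right_on="id_wp", how="left")
clean = pd.merge(info_clients, by_key,
                 left_on="user_id", right_on="id_wp", how="left")

print(len(info_clients), len(inflated), len(clean))  # 3 4 3
```

Note that keep='first' silently discards the other rows for a repeated key, so it may be worth inspecting those rows before dropping them.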
You should try to debug with value_counts:
# Count how many times each key value occurs
data.value_counts('user_id')
# OR
data.value_counts('id_wp')
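A possible way to take this debugging one step further is sketched below, again on made-up toy frames rather than the real data. It lists the keys that occur more than once, shows the rows involved, and uses the validate argument of pd.merge so that the merge raises instead of quietly producing extra rows.

```python
import pandas as pd

# Hypothetical toy data: id_wp == 1 appears twice.
info_clients = pd.DataFrame({"user_id": [0, 1, 4]})
metadata_clients = pd.DataFrame({"id": [10, 11, 12, 13], "id_wp": [0, 1, 1, 4]})

# Keys that occur more than once in the right-hand key column.
counts = metadata_clients["id_wp"].value_counts()
repeated_keys = counts[counts > 1]
print(repeated_keys)

# All rows that share a repeated key (keep=False marks every occurrence).
dupe_rows = metadata_clients[metadata_clients["id_wp"].duplicated(keep=False)]
print(dupe_rows)

# validate="one_to_one" asks pandas to check that the merge keys are unique
# on both sides; it raises MergeError when they are not.
try:
    pd.merge(info_clients, metadata_clients,
             left_on="user_id", right_on="id_wp",
             how="left", validate="one_to_one")
    merge_ok = True
except pd.errors.MergeError as exc:
    merge_ok = False
    print("merge keys are not unique:", exc)
```

If the merge is expected to keep exactly one row per client, leaving validate in place gives an early failure whenever duplicate keys reappear.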