Pandas Python: updating a value in dataframe A if it is found in dataframe B
I have two dataframes, Users and Devices. The Users dataframe is a list of all interactions by user_id and timestamp. However, if someone uses our app as a guest, their user_id gets set to a device_id. If these guests eventually become members, we map their user_id to their device_id in the Devices dataframe.
So for Users we have:
user_id timestamp
user13123 2019-02-17
user224234 2019-02-17
user32134234 2019-02-17
00029AD9-X5X5-999N-807F-73F0EAE4A98B 2019-02-17
Here the final row is a guest user, with the device ID stored as user_id.
Then for Devices:
device_id user_id
00029AD9-X5X5-999N-807F-73F0EAE4A98B user3423
37029BD9-D5D5-435D-837F-73F0EAE4A98B user34423
...
This is a simple mapping between device_ids and known user_ids.
So what I want to do is check whether Users.user_id matches a Devices.device_id, and if so set Users.user_id to Devices.user_id. Basically, I want to update any old guest interactions to use the user_id if we have this information in Devices.
I messed around with it for a while and it kept getting more convoluted, and it feels like something that could be solved pretty cleanly in pandas. Any help is much appreciated.
Thanks!
In [32]: users
Out[32]:
user_id timestamp
0 user13123 2019-02-17
1 user224234 2019-02-17
2 user32134234 2019-02-17
3 00029AD9-D5D5-435D-807F-73F0EAE4A98B 2019-02-17
In []: devices
Out[]:
device_id user_id
0 00029AD9-D5D5-435D-807F-73F0EAE4A98B user3423
1 37029BD9-D5D5-435D-837F-73F0EAE4A98B user34423
All users for which user_id matches a device_id:
In []: filtr = users.user_id.isin(devices.device_id)
In []: filtr
Out[]:
0 False
1 False
2 False
3 True
Name: user_id, dtype: bool
The user_id of every filtered user is replaced with the matching device's user_id, directly in the users dataframe:
In []: users.loc[filtr, "user_id"] = users[filtr].user_id.map(devices.set_index("device_id").user_id)
In []: users
Out[]:
user_id timestamp
0 user13123 2019-02-17
1 user224234 2019-02-17
2 user32134234 2019-02-17
3 user3423 2019-02-17
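For anyone who wants to run the two steps above end to end, here is a minimal self-contained sketch. It uses the sample rows from the question; the function name resolve_guest_ids is only an illustrative choice, and working on a copy rather than in place is an assumption about what is wanted.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})


def resolve_guest_ids(users, devices):
    """Return a copy of users with guest device ids replaced by known user ids.

    Hypothetical helper; assumes each device_id appears once in devices.
    """
    out = users.copy()
    # Series indexed by device_id whose values are the known user_ids.
    lookup = devices.set_index("device_id")["user_id"]
    # Boolean mask: rows whose user_id is actually a known device_id.
    is_guest = out["user_id"].isin(lookup.index)
    # Overwrite only the masked rows with the mapped user_id.
    out.loc[is_guest, "user_id"] = out.loc[is_guest, "user_id"].map(lookup)
    return out


resolved = resolve_guest_ids(users, devices)
print(resolved)
```

The original users frame is left untouched here, which makes it easy to compare before and after.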
Using np.where
Just another variation:
import numpy as np
# pd.np is deprecated and gone from recent pandas releases, so use numpy directly.
users.loc[:, 'user_id'] = np.where(users.user_id.isin(devices.device_id),
                                   users.user_id.map(devices.set_index('device_id').user_id),
                                   users.user_id)
These solutions expect that only one user_id exists for each device_id.
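If that one-to-one assumption might not hold in your data, it may be worth checking before building the mapping. The sketch below uses made-up device ids (dev-A, dev-B) and picks "keep the last row per device_id" as the tie-breaking policy, which is purely an assumption; the right policy depends on how Devices is maintained.

```python
import pandas as pd

# Hypothetical devices table in which dev-A is mapped to two different users.
devices = pd.DataFrame({
    "device_id": ["dev-A", "dev-A", "dev-B"],
    "user_id": ["user1", "user2", "user3"],
})

# All rows whose device_id occurs more than once, for inspection.
dupes = devices[devices["device_id"].duplicated(keep=False)]
print(dupes)

# One possible policy (an assumption): keep only the last row per device_id.
deduped = devices.drop_duplicates(subset="device_id", keep="last")

# The lookup now has a unique index, so it is safe to pass to Series.map.
lookup = deduped.set_index("device_id")["user_id"]
```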
Left-merge users with devices, then call fillna on the user_id column that comes from devices (it is named user_id_y after the merge). Finally, assign the result back to the users.user_id column:
In [59]: users['user_id'] = users.merge(devices, how='left', left_on='user_id', right_on='device_id')['user_id_y'].fillna(users.user_id)
In [60]: users
Out[60]:
timestamp user_id
0 2019-02-17 user13123
1 2019-02-17 user224234
2 2019-02-17 user32134234
3 2019-02-17 user3423
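One thing to watch with the merge version: as far as I can tell, the merged frame gets a fresh 0..n-1 index, so assigning it back lines up only when users also has the default index. A compact variant of the same idea that keeps whatever index users already has is map followed by fillna. This is a sketch under the same one-user-per-device assumption, with a deliberately non-default index to show the alignment.

```python
import pandas as pd

users = pd.DataFrame(
    {
        "user_id": ["user13123", "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
        "timestamp": ["2019-02-17", "2019-02-17"],
    },
    index=[10, 20],  # deliberately not 0..n-1
)
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "user_id": ["user3423"],
})

lookup = devices.set_index("device_id")["user_id"]

# map() yields NaN where user_id is not a known device_id;
# fillna() then restores the original user_id for those rows.
users["user_id"] = users["user_id"].map(lookup).fillna(users["user_id"])
print(users)
```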
This loop runs through all the entries, checking whether a user_id matches a device_id, and if so it updates the Users dataframe with the correct ID.
for i in range(len(Users.index)):
    for p in range(len(Devices.index)):
        if Users.loc[i, "user_id"] == Devices.loc[p, "device_id"]:
            # Fixed part of the code, check old version.
            Users.loc[i, "user_id"] = Devices.loc[p, "user_id"]
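The nested loop compares every user row with every device row, which can get slow on large frames. If a plain-Python approach is preferred, a dictionary lookup does the same job in a single pass. This is only a sketch: the name device_to_user is made up for illustration, and it again assumes one user_id per device_id (with duplicates, the last one wins).

```python
import pandas as pd

Users = pd.DataFrame({
    "user_id": ["user13123", "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17", "2019-02-17"],
})
Devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "user_id": ["user3423"],
})

# Build the device_id -> user_id lookup once.
device_to_user = dict(zip(Devices["device_id"], Devices["user_id"]))

# Replace each user_id that is a known device_id; leave the others unchanged.
Users["user_id"] = [device_to_user.get(uid, uid) for uid in Users["user_id"]]
print(Users)
```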
This solution finds the list of matching IDs, then loops through it once and updates the user_ids, using each matching ID as the index into devices.
import pandas as pd

devices = pd.DataFrame({'device_id':{0:'00029AD9-X5X5-999N-807F-73F0EAE4A98B',1:'37029BD9-D5D5-435D-837F-73F0EAE4A98B'},'user_id':{0:'user3423',1:'user34423'}})
users = pd.DataFrame({'user_id':{0:'user13123',1:'user224234',2:'user32134234',3:'00029AD9-X5X5-999N-807F-73F0EAE4A98B'},'timestamp':{0:'2019-02-17',1:'2019-02-17',2:'2019-02-17',3:'2019-02-17'}})
matching_ids = list(set(users.user_id).intersection(set(devices.device_id)))
for matched_id in matching_ids:  # renamed from id to avoid shadowing the built-in
    users.loc[users.user_id == matched_id, 'user_id'] = devices.set_index('device_id').at[matched_id, 'user_id']