I have two dataframes, Users and Devices. The users dataframe is a list of all interactions by user_id & timestamp. However, if someone uses our app as a guest, their user_id gets set to a device_id. If these guests eventually become members, we map their user_id to their device_id in the devices dataframe.
So for Users we have:
user_id timestamp
user13123 2019-02-17
user224234 2019-02-17
user32134234 2019-02-17
00029AD9-X5X5-999N-807F-73F0EAE4A98B 2019-02-17
where the final row is a guest user, whose device_id is stored as the user_id.
Then for Devices
device_id user_id
00029AD9-X5X5-999N-807F-73F0EAE4A98B user3423
37029BD9-D5D5-435D-837F-73F0EAE4A98B user34423
...
which is a simple mapping between device_ids and known user_ids.
So what I want to do is check if Users.user_id matches to a Devices.device_id, and if so set Users.user_id to Devices.user_id. Basically, I want to update any old guest interactions to use the user_id if we have this information in Devices.
I messed around with it for a while, but my attempts kept getting more convoluted, and this feels like something that could be solved pretty cleanly in pandas. Any help is much appreciated.
Thanks!
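For anyone who wants to run the answers below, here is the sample data from the question as code, followed by one compact way to do the swap: Series.replace with a device_id-to-user_id dictionary. This is a sketch, not what any of the answers below uses, and it assumes each device_id appears only once in devices (with duplicate keys, dict(zip(...)) silently keeps the last pairing).

```python
import pandas as pd

# Sample data from the question (the guest row carries a device_id in user_id)
users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# Series.replace with a dict swaps exact matches and leaves every other value alone
device_to_user = dict(zip(devices["device_id"], devices["user_id"]))
users["user_id"] = users["user_id"].replace(device_to_user)
print(users)
```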
In [32]: users
Out[32]:
user_id timestamp
0 user13123 2019-02-17
1 user224234 2019-02-17
2 user32134234 2019-02-17
3 00029AD9-D5D5-435D-807F-73F0EAE4A98B 2019-02-17
In []: devices
Out[]:
device_id user_id
0 00029AD9-D5D5-435D-807F-73F0EAE4A98B user3423
1 37029BD9-D5D5-435D-837F-73F0EAE4A98B user34423
All users whose user_id matches a device_id:
In []: filtr = users.user_id.isin(devices.device_id)
In []: filtr
Out[]:
0 False
1 False
2 False
3 True
Name: user_id, dtype: bool
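A self-contained version of the mask step, using the question's sample IDs (the session above shows a slightly different device ID, but the logic is identical). Note that isin compares values exactly, so IDs differing only in case or surrounding whitespace would not be flagged.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# True wherever users.user_id is actually a known device_id (a guest row)
filtr = users["user_id"].isin(devices["device_id"])
print(filtr.tolist())  # [False, False, False, True]
print(int(filtr.sum()), "guest row(s) can be remapped")
```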
Each filtered user's user_id is replaced with the matching device's user_id, directly in the users dataframe.
In []: users.loc[filtr, "user_id"] = users[filtr].user_id.map(devices.set_index("device_id").user_id)
In []: users
Out[]:
user_id timestamp
0 user13123 2019-02-17
1 user224234 2019-02-17
2 user32134234 2019-02-17
3 user3423 2019-02-17
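Why the filter is there: mapping every user_id through the device lookup returns NaN for rows that are not device IDs. The sketch below shows that, then uses fillna as an alternative to the mask (keep the original value wherever the lookup missed). It assumes users has a default index, so the fillna alignment is trivially correct.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# Series indexed by device_id; .map() looks each user_id up in this index
lookup = devices.set_index("device_id")["user_id"]
mapped = users["user_id"].map(lookup)

# Rows whose user_id is not a device_id come back as NaN
print(mapped.isna().tolist())  # [True, True, True, False]

# Mask-free alternative: fall back to the original id wherever the lookup missed
users["user_id"] = mapped.fillna(users["user_id"])
print(users["user_id"].tolist())
```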
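np.where
Just another variation (calling numpy directly, since recent pandas versions no longer expose it as pd.np):
import numpy as np
users.loc[:, 'user_id'] = np.where(users.user_id.isin(devices.device_id),
                                   users.user_id.map(devices.set_index('device_id').user_id),
                                   users.user_id)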
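A runnable sketch of the np.where variant on the question's sample data. np.where evaluates both branches for every row, so the NaNs produced by map for non-guest rows are computed but then discarded in favour of the original user_id.

```python
import numpy as np
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

lookup = devices.set_index("device_id")["user_id"]
is_guest = users["user_id"].isin(devices["device_id"])

# Pick the mapped id for guest rows and keep the original id everywhere else
users["user_id"] = np.where(is_guest, users["user_id"].map(lookup), users["user_id"])
print(users)
```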
These solutions assume that each device_id maps to exactly one user_id, i.e. device_id is unique in devices.
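If that uniqueness assumption might not hold, one option is to check it and de-duplicate before building the lookup. The sketch below uses a hypothetical duplicate row (user9999) and keeps the last mapping per device_id; which mapping should win is a policy decision the question does not settle, so keep="last" is purely an assumption here.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
# Hypothetical devices table where one device_id appears twice
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user9999", "user34423"],
})

if not devices["device_id"].is_unique:
    # Assumed policy: the last row seen for a device_id wins
    devices = devices.drop_duplicates(subset="device_id", keep="last")

lookup = devices.set_index("device_id")["user_id"]
users["user_id"] = users["user_id"].map(lookup).fillna(users["user_id"])
print(users)
```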
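Left-merge users with devices, then fillna the joined user_id column (named user_id_y after the merge) with the original users.user_id. Finally, assign the result back to users.user_id. Because merge returns a fresh RangeIndex, this assignment only lines up correctly if users also has a default RangeIndex.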
In [59]: users['user_id'] = users.merge(devices, how='left', left_on='user_id', right_on='device_id')['user_id_y'].fillna(users.user_id)
In [60]: users
Out[60]:
timestamp user_id
0 2019-02-17 user13123
1 2019-02-17 user224234
2 2019-02-17 user32134234
3 2019-02-17 user3423
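A variant of the merge answer that works entirely on the merged frame's own columns, so it does not depend on the index of users lining up with the merge result. The validate="many_to_one" argument makes merge raise a MergeError if a device_id repeats in devices, turning the uniqueness assumption into an explicit check.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

merged = users.merge(
    devices,
    how="left",
    left_on="user_id",
    right_on="device_id",
    validate="many_to_one",  # raises MergeError if device_id is not unique in devices
)
# Both frames have a user_id column, so merge suffixes them:
# user_id_x comes from users, user_id_y from devices
merged["user_id"] = merged["user_id_y"].fillna(merged["user_id_x"])
users = merged[["user_id", "timestamp"]]
print(users)
```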
This is a loop that runs through all the entries, checking if a user_id matches a device_id, and if so, updates the Users dataframe with the correct id.
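# Compare every user row against every device row: O(len(Users) * len(Devices)).
# Assumes both frames have a default RangeIndex, so .loc[i, ...] is row position i.
for i in range(len(Users.index)):
    for p in range(len(Devices.index)):
        if Users.loc[i, "user_id"] == Devices.loc[p, "device_id"]:
            # Replace the guest's device_id with the member's user_id
            Users.loc[i, "user_id"] = Devices.loc[p, "user_id"]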
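The nested loop compares every pair of rows. A sketch of the same idea as a single pass, assuming device_id is unique: build a plain dict once, then look each user_id up in it, falling back to the original value when there is no match.

```python
import pandas as pd

Users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
Devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# Build the device_id -> user_id lookup once: O(len(Devices))
device_to_user = dict(zip(Devices["device_id"], Devices["user_id"]))

# One pass over Users: O(len(Users)); dict.get keeps the original id when unmatched
Users["user_id"] = [device_to_user.get(uid, uid) for uid in Users["user_id"]]
print(Users)
```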
This solution finds the set of IDs that appear both as a user_id and as a device_id, then loops over them once, updating each matching user_id by looking it up in devices indexed by device_id.
import pandas as pd
devices = pd.DataFrame({'device_id':{0:'00029AD9-X5X5-999N-807F-73F0EAE4A98B',1:'37029BD9-D5D5-435D-837F-73F0EAE4A98B'},'user_id':{0:'user3423',1:'user34423'}})
users = pd.DataFrame({'user_id':{0:'user13123',1:'user224234',2:'user32134234',3:'00029AD9-X5X5-999N-807F-73F0EAE4A98B'},'timestamp':{0:'2019-02-17',1:'2019-02-17',2:'2019-02-17',3:'2019-02-17'}})
# IDs present both as users.user_id and devices.device_id (the guest rows)
matching_ids = set(users.user_id).intersection(devices.device_id)
# Build the device_id -> user_id lookup once instead of on every iteration
lookup = devices.set_index('device_id')['user_id']
for dev_id in matching_ids:
    users.loc[users.user_id == dev_id, 'user_id'] = lookup.at[dev_id]