
Pandas Python: updating a value of dataframe A if it is found in dataframe B

I have two dataframes, Users and Devices. The Users dataframe lists every interaction by user_id and timestamp. However, if someone uses our app as a guest, their user_id is set to their device_id. If a guest later becomes a member, we map their user_id to their device_id in the Devices dataframe.

So, for Users we have:

user_id                                 timestamp
user13123                               2019-02-17
user224234                              2019-02-17
user32134234                            2019-02-17
00029AD9-X5X5-999N-807F-73F0EAE4A98B    2019-02-17

The final row is a guest user, with the device ID stored as user_id.

Then, for Devices:

device_id                               user_id
00029AD9-X5X5-999N-807F-73F0EAE4A98B    user3423
37029BD9-D5D5-435D-837F-73F0EAE4A98B    user34423
...

This is a simple mapping between device_ids and known user_ids.
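As a sketch, assuming only the two example rows shown above, the Devices table can be turned into a lookup keyed by device_id, either as a pandas Series or as a plain dict. The variable names device_to_user and device_to_user_dict are illustrative, not from the question.

```python
import pandas as pd

# The two example rows of Devices from the question
devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# Series indexed by device_id: look up a registered user_id by device_id
device_to_user = devices.set_index("device_id")["user_id"]

# The same mapping as a plain dict
device_to_user_dict = dict(zip(devices["device_id"], devices["user_id"]))

print(device_to_user["00029AD9-X5X5-999N-807F-73F0EAE4A98B"])  # user3423
```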

What I want to do is check whether Users.user_id matches a Devices.device_id and, if so, set Users.user_id to the corresponding Devices.user_id. Basically, I want to update old guest interactions to use the real user_id whenever Devices has that information.

I messed around with it for a while and it kept getting more convoluted; it feels like something pandas could solve pretty cleanly. Any help is much appreciated.

Thanks!

Dataframes

In [32]: users
Out[32]:
                                user_id   timestamp
0                             user13123  2019-02-17
1                            user224234  2019-02-17
2                          user32134234  2019-02-17
3  00029AD9-D5D5-435D-807F-73F0EAE4A98B  2019-02-17

In []: devices
Out[]:
                              device_id    user_id
0  00029AD9-D5D5-435D-807F-73F0EAE4A98B   user3423
1  37029BD9-D5D5-435D-837F-73F0EAE4A98B  user34423

Compute a filter

All users whose user_id matches a device_id:

In []: filtr = users.user_id.isin(devices.device_id)

In []: filtr
Out[]:
0    False
1    False
2    False
3     True
Name: user_id, dtype: bool
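A self-contained version of this step, rebuilding the two example dataframes from the transcript above so it can be run directly; filtr.sum() is added here only to show how many guest rows would be updated.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-D5D5-435D-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-D5D5-435D-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# True where the stored user_id is actually a device_id known to Devices
filtr = users["user_id"].isin(devices["device_id"])

print(filtr.tolist())    # [False, False, False, True]
print(int(filtr.sum()))  # 1 guest row to update
```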

Substitute values

For every filtered user, user_id is replaced in place in the users dataframe with the user_id of the matching device.

In []: users.loc[filtr, "user_id"] = users[filtr].user_id.map(devices.set_index("device_id").user_id)

In []: users
Out[]:
        user_id   timestamp
0     user13123  2019-02-17
1    user224234  2019-02-17
2  user32134234  2019-02-17
3      user3423  2019-02-17
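The filter matters because Series.map returns NaN for every value that has no key in the lookup. A minimal sketch on the same example data shows this, and shows an equivalent one-liner (a variation, not part of the original answer) that maps everything and then falls back to the original id with fillna.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-D5D5-435D-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-D5D5-435D-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})
lookup = devices.set_index("device_id")["user_id"]

# Without a filter, non-matching ids become NaN
mapped = users["user_id"].map(lookup)

# Keep the original id wherever there was no match
users["user_id"] = mapped.fillna(users["user_id"])
print(users["user_id"].tolist())
# ['user13123', 'user224234', 'user32134234', 'user3423']
```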

Using np.where

Just another variation.

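import numpy as np

# Where user_id is a known device_id, take the mapped user_id; otherwise keep the original
users.loc[:, 'user_id'] = np.where(users.user_id.isin(devices.device_id),
                                   users.user_id.map(devices.set_index('device_id').user_id),
                                   users.user_id)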
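A runnable sketch of this variation on the example data. It imports numpy directly, because the pd.np alias was deprecated in pandas 1.0 and removed in pandas 2.0.

```python
import numpy as np
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-D5D5-435D-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-D5D5-435D-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})
lookup = devices.set_index("device_id")["user_id"]

# np.where picks element-wise: mapped id where the condition holds, original id elsewhere
users["user_id"] = np.where(
    users["user_id"].isin(devices["device_id"]),
    users["user_id"].map(lookup),
    users["user_id"],
)
print(users["user_id"].tolist())
# ['user13123', 'user224234', 'user32134234', 'user3423']
```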

These solutions assume that each device_id maps to exactly one user_id.
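If a device_id can appear more than once in Devices, the set_index("device_id") lookup has a non-unique index, and mapping through it can raise an error; a left merge would instead duplicate the matching interaction rows. A minimal sketch of a guard, using hypothetical ids (dev-A, dev-B) and assuming that the last mapping recorded for a device is the one to keep:

```python
import pandas as pd

# Hypothetical Devices table in which dev-A was mapped twice
devices = pd.DataFrame({
    "device_id": ["dev-A", "dev-A", "dev-B"],
    "user_id": ["user1", "user2", "user3"],
})

if not devices["device_id"].is_unique:
    # Assumption: the most recent (last) row for each device wins
    devices = devices.drop_duplicates(subset="device_id", keep="last")

lookup = devices.set_index("device_id")["user_id"]
print(lookup.to_dict())  # {'dev-A': 'user2', 'dev-B': 'user3'}
```

Which duplicate to keep is a business decision; keep="first" or an explicit sort by a timestamp column would be equally valid choices.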

Left-merge users with devices, then call fillna on the joined user_id column (named user_id_y after the merge, because both dataframes have a user_id column). Finally, assign the result back to users.user_id.

In [59]: users['user_id'] = users.merge(devices, how='left', left_on='user_id', right_on='device_id')['user_id_y'].fillna(users.user_id)

In [60]: users
Out[60]:
    timestamp       user_id
0  2019-02-17     user13123
1  2019-02-17    user224234
2  2019-02-17  user32134234
3  2019-02-17      user3423
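A sketch of the same merge approach with one small change that is not in the original answer: the devices column is renamed before merging, so no _x/_y suffixes appear. A left merge preserves the row order of users, and with unique device_ids the row count is unchanged, so the result can be assigned back positionally with .to_numpy() even if users does not have a default index.

```python
import pandas as pd

users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-D5D5-435D-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
devices = pd.DataFrame({
    "device_id": ["00029AD9-D5D5-435D-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

merged = users.merge(
    devices.rename(columns={"user_id": "mapped_user_id"}),  # avoid suffix clash
    how="left",
    left_on="user_id",
    right_on="device_id",
)

# Registered id where a device matched, original id otherwise
users["user_id"] = merged["mapped_user_id"].fillna(merged["user_id"]).to_numpy()
print(users["user_id"].tolist())
# ['user13123', 'user224234', 'user32134234', 'user3423']
```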

This loop runs through all the entries, checks whether each user_id matches a device_id, and if so updates the Users dataframe with the correct ID.

# Compare every Users row with every Devices row (assumes default 0..n-1 indexes)
for i in range(len(Users.index)):
    for p in range(len(Devices.index)):
        if Users.loc[i, "user_id"] == Devices.loc[p, "device_id"]:
            # Guest row found: replace the device_id with the registered user_id
            Users.loc[i, "user_id"] = Devices.loc[p, "user_id"]
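The nested loop compares every Users row with every Devices row, so its cost grows with len(Users) × len(Devices). A sketch of a single-pass alternative (a swapped-in technique, not the answer's own code): build a dict from Devices once, then look each user_id up with dict.get, which returns the original id when there is no match.

```python
import pandas as pd

Users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-D5D5-435D-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})
Devices = pd.DataFrame({
    "device_id": ["00029AD9-D5D5-435D-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})

# Build the device_id -> user_id lookup once
device_to_user = dict(zip(Devices["device_id"], Devices["user_id"]))

# One pass over Users; unmatched ids are kept as they are
Users["user_id"] = [device_to_user.get(uid, uid) for uid in Users["user_id"]]
print(Users["user_id"].tolist())
# ['user13123', 'user224234', 'user32134234', 'user3423']
```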

This solution finds the IDs that appear in both dataframes, then loops over them once, updating the matching user_ids with each device_id as the lookup key.

import pandas as pd

devices = pd.DataFrame({'device_id':{0:'00029AD9-X5X5-999N-807F-73F0EAE4A98B',1:'37029BD9-D5D5-435D-837F-73F0EAE4A98B'},'user_id':{0:'user3423',1:'user34423'}})
users = pd.DataFrame({'user_id':{0:'user13123',1:'user224234',2:'user32134234',3:'00029AD9-X5X5-999N-807F-73F0EAE4A98B'},'timestamp':{0:'2019-02-17',1:'2019-02-17',2:'2019-02-17',3:'2019-02-17'}})

# Device IDs that also appear as user_ids, i.e. guests who later registered
matching_ids = set(users.user_id).intersection(devices.device_id)
lookup = devices.set_index('device_id')  # build the lookup table once
for device_id in matching_ids:
    users.loc[users.user_id == device_id, 'user_id'] = lookup.at[device_id, 'user_id']
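Another variation, not taken from the answers above, is Series.replace with a dict: it swaps only values that exactly match a key (regex matching is off by default) and leaves all other user_ids untouched. A minimal sketch on the question's example data:

```python
import pandas as pd

devices = pd.DataFrame({
    "device_id": ["00029AD9-X5X5-999N-807F-73F0EAE4A98B",
                  "37029BD9-D5D5-435D-837F-73F0EAE4A98B"],
    "user_id": ["user3423", "user34423"],
})
users = pd.DataFrame({
    "user_id": ["user13123", "user224234", "user32134234",
                "00029AD9-X5X5-999N-807F-73F0EAE4A98B"],
    "timestamp": ["2019-02-17"] * 4,
})

# device_id -> registered user_id
mapping = dict(zip(devices["device_id"], devices["user_id"]))

# Only exact matches are replaced; ordinary user_ids pass through unchanged
users["user_id"] = users["user_id"].replace(mapping)
print(users["user_id"].tolist())
# ['user13123', 'user224234', 'user32134234', 'user3423']
```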
