How to replace dataframe column with separate dict values - python

My user_artist_plays dataframe below has a users column of hash strings, but for statistical computation I must replace these mixed-character values with integer-only IDs.

   users                                     artist  plays
0  00001411dc427966b17297bf4d69e7e193135d89  sting   12763
1  00001411dc427966b17297bf4d69e7e193135d89  stars    8192
2  fffe8c7f952d9b960a56ed4dcb40a415d924b224  cher      117
3  fffe8c7f952d9b960a56ed4dcb40a415d924b224  queen     117

The sample above shows multiple entries for only two users. That is fine, as long as every entry in the column can be matched against an existing key in this separate dictionary:

users = user_artist_plays['users'].unique()
user_dict = {ni: indi for indi, ni in enumerate(set(users))}
user_dict

{'068156fafd9c4237c174c648d3d484cbf509cb75': 0,
 '6deecfbc46a81e4faf398b2afd991be05ab78f10': 74205,
 '1e23333ff4f637420a8a38d467ccecfda064afb9': 1,
 '0b282cafc949efe4163b7946b7104957a18cf010': 2,
 'd1867cbda35e0d48e9a8390d9f5e079c9d99ea96': 3}

Here's my attempt at swapping in the int values:

for k, v in user_dict.items():
        if user_artist_plays['users'].any(k):
            user_artist_plays['users'].replace(v)

It's retaining the original values of the users column...

It seems you need map:

user_artist_plays['users'] = user_artist_plays['users'].map(user_dict)
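
To make the map approach concrete, here is a minimal, self-contained sketch. The short strings "aaa1" and "fff2" are made-up placeholders standing in for the long user hashes, and the dictionary is built from unique() in order of first appearance, which is an assumption about how you want the IDs numbered. Note that map returns a new Series, so the result has to be assigned back to the column, and any value missing from the dictionary would come out as NaN.

```python
import pandas as pd

# Toy data modeled on the question; "aaa1"/"fff2" are placeholder hashes.
user_artist_plays = pd.DataFrame({
    "users": ["aaa1", "aaa1", "fff2", "fff2"],
    "artist": ["sting", "stars", "cher", "queen"],
    "plays": [12763, 8192, 117, 117],
})

# Build the lookup: one integer ID per distinct user, in order of first appearance.
unique_users = user_artist_plays["users"].unique()
user_dict = {user: idx for idx, user in enumerate(unique_users)}

# map() looks each value up in the dict and returns a new Series,
# so assign the result back to the column.
user_artist_plays["users"] = user_artist_plays["users"].map(user_dict)

print(user_artist_plays["users"].tolist())  # [0, 0, 1, 1]
```

Every row belonging to the same user gets the same integer, which is the behavior the question asks for.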

Or factorize:

user_artist_plays['users'] = pd.factorize(user_artist_plays['users'])[0]
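
If you go the factorize route, you do not need the separate dictionary at all. The sketch below uses the same made-up placeholder hashes ("aaa1", "fff2"); the id_to_user name is hypothetical and only there to show how you could get back to the original strings if you need them later.

```python
import pandas as pd

# Toy data modeled on the question; "aaa1"/"fff2" are placeholder hashes.
user_artist_plays = pd.DataFrame({
    "users": ["aaa1", "aaa1", "fff2", "fff2"],
    "artist": ["sting", "stars", "cher", "queen"],
    "plays": [12763, 8192, 117, 117],
})

# factorize returns a pair: integer codes (one per row) and the distinct values.
# Codes are assigned in order of first appearance.
codes, uniques = pd.factorize(user_artist_plays["users"])
user_artist_plays["users"] = codes

# Hypothetical reverse lookup from integer ID back to the original hash.
id_to_user = dict(enumerate(uniques))

print(user_artist_plays["users"].tolist())  # [0, 0, 1, 1]
```

The [0] in the one-liner above simply picks the codes out of that pair and throws the distinct values away.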
