
How to convert separated values into one-hot encoded columns?

So I have a dataframe that looks like this, with user IDs and food choices as columns:

        Food choices
1   0   Pizza | Hamburger
2   1   Sushi | Pizza | Pasta | Steak | Noodles
3   2   
4   3   French Fries | Hot dogs | Prawns
5   4   Bacon | Meatballs
6   5   Mozeralla Sticks

I want to split them up into something like this:

User_ID, Pizza, Hamburger, Sushi, Pasta, ...
1, True, True, False, False, ...
2, True, False, True, True, ...

I have split them up as:

df['Food choices'].fillna('None').apply(lambda x: pd.Series(x.split('|'))).fillna('None').replace('None',np.nan)

Now I do have them in separate columns, but I am struggling with how to mark the presence or absence of each value. I was thinking along the lines of collecting the unique values and comparing each one against the dataframe:

lst = list(pd.unique(df['Food choices'].fillna('None').apply(lambda x: pd.Series(x.split('|'))).fillna('None').values.ravel('K')))
temp = df['Food choices'].fillna('None').apply(lambda x: pd.Series(x.split('|'))).fillna('None')
dfs = pd.DataFrame(columns = lst,
            index = temp.index)
for val in lst:
    for idx in temp.index:
        dfs.loc[idx, val] = (temp.loc[idx]  == val).any()
         

Way too ugly and way too slow! So I was thinking maybe there is some function I am missing that can help here. pd.get_dummies() does not help. Any suggestion on how to improve this would be very helpful.

Try with str.get_dummies

s = df['Food choices'].str.replace(r'\s*\|\s*', '|', regex=True).str.strip().str.lower().str.get_dummies('|')
df = df.join(s)

Because the values appear in no fixed order, a space sometimes sits next to the separator and sometimes does not. Left alone, that produces duplicate columns for the same value: one with the space and one without. Stripping the whitespace around the separator before calling str.get_dummies avoids this.
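
To see the effect end to end, here is a minimal sketch on made-up data modeled on the question. The sample rows and the names cleaned, dummies and result are assumptions for illustration only. It skips the lowercasing step so the column names keep their original case, and it casts to bool to get the True/False layout the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data modeled on the question (assumed, not the asker's real data).
# The spacing around '|' is deliberately inconsistent.
df = pd.DataFrame({
    "Food choices": [
        "Pizza | Hamburger",
        "Sushi|Pizza | Pasta",
        np.nan,
        "Bacon |Meatballs",
    ]
})

# Drop any whitespace around the separator, then trim the ends of each string.
cleaned = (
    df["Food choices"]
    .str.replace(r"\s*\|\s*", "|", regex=True)
    .str.strip()
)

# One indicator column per distinct value; a missing row becomes all False.
dummies = cleaned.str.get_dummies("|").astype(bool)

result = df.join(dummies)
print(result)
```

If the user ID lives in its own column rather than the index, the same join keeps it alongside the new indicator columns.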
