
How can I use get_dummies() in this case?

I need to classify userId × movieId, and I have two columns: userId and movieId.

userId  movieId
60265   2123
60265   2291
60265   2329
60265   2355
60265   2389
60265   2396
60265   2402
60265   2403
60265   2421
19254   2389
19254   2396
19254   2402
19254   2403
19254   2421
19254   2123
19254   2291
19254   2329

Each userId has watched more than one movieId. I intend to build a histogram-style table that shows, for each user, which movies they watched:

userId/movieId  2123  2291  2329  2355  2389  2396  2402  2403  2421  2592  2596
60265              1     1     1     1     1     1     1     1     1     0     0
19254              1     1     1     0     1     1     1     1     1     0     0

How can I use the get_dummies() function to construct a userId × movieId table like this?

You use pd.get_dummies like this:

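# .sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
# sort=False keeps the users in their original order.
(pd.get_dummies(df.set_index('userId'), columns=['movieId'], prefix='', prefix_sep='')
   .groupby(level=0, sort=False)
   .sum()
   .reset_index())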

Output:

   userId  2123  2291  2329  2355  2389  2396  2402  2403  2421
0   60265     1     1     1     1     1     1     1     1     1
1   19254     1     1     1     0     1     1     1     1     1
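
For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question. The names `rows`, `df` and `table` are my own, and it assumes a pandas version where `groupby(level=0)` on the index is available (any recent one).

```python
import pandas as pd

# Sample data from the question: (userId, movieId) pairs
rows = [(60265, m) for m in (2123, 2291, 2329, 2355, 2389, 2396, 2402, 2403, 2421)]
rows += [(19254, m) for m in (2389, 2396, 2402, 2403, 2421, 2123, 2291, 2329)]
df = pd.DataFrame(rows, columns=["userId", "movieId"])

# One indicator column per movieId, then one summed row per userId
table = (
    pd.get_dummies(df.set_index("userId"), columns=["movieId"], prefix="", prefix_sep="")
    .groupby(level=0, sort=False)  # sort=False keeps the original user order
    .sum()
    .reset_index()
)
print(table)
```

Note that the movie columns come out as strings ('2123', '2291', ...) because get_dummies builds each column name from the prefix plus the value.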

You need to set the index and then use get_dummies. Here is the full code:

import pandas as pd
data = {"movie": [2123, 2126, 2123], "userId": [1, 1, 2]}

df = pd.DataFrame(data)
df.set_index('userId', inplace=True)
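# drop(..., inplace=True) on the concat result returns None, so assign the result instead
result = pd.concat([df, pd.get_dummies(df['movie'], prefix='movie')], axis=1).drop(columns=['movie'])
# Sum the indicator rows so that each userId ends up with a single row
result = result.groupby(level=0).sum()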
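
One caveat: summing the dummy columns per user, as in the first answer, gives a count rather than a 0/1 flag if the same (userId, movie) pair occurs more than once. A minimal sketch, assuming a plain watched/not-watched flag is wanted: take the per-user maximum instead. The duplicate row below is invented purely for illustration, and `counts` and `flags` are my own names.

```python
import pandas as pd

# Hypothetical data: user 1 is listed twice with movie 2123
data = {"movie": [2123, 2126, 2123, 2123], "userId": [1, 1, 2, 1]}
df = pd.DataFrame(data).set_index("userId")

dummies = pd.get_dummies(df["movie"], prefix="movie")

# How many times each (user, movie) pair occurs
counts = dummies.groupby(level=0).sum()
# 1 if the user watched the movie at all, else 0
flags = dummies.groupby(level=0).max().astype(int)

print(counts)
print(flags)
```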
