Consider the following example. I have the MovieLens dataset in two files:
u.item.csv
ID|MOVIE NAME (YEAR)|REL.DATE|NULL|IMDB LINK|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
The separator used here is a pipe (|), which is still manageable.
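For reference, a minimal sketch of loading such a pipe-separated file with pandas. It assumes the header row shown above; the inline string stands in for the real u.item.csv and the genre flag columns are trimmed for brevity. The name df1 matches the one used later in this thread. The original MovieLens file may also need encoding="latin-1", which is not needed for this inline sample.

```python
import io
import pandas as pd

# Inline copy of the sample rows (genre flags trimmed); in practice pass the file path instead.
raw = """ID|MOVIE NAME (YEAR)|REL.DATE|NULL|IMDB LINK
1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)
"""

# sep="|" tells pandas to split on pipes; usecols keeps only the two columns needed for the lookup.
df1 = pd.read_csv(io.StringIO(raw), sep="|", usecols=["ID", "MOVIE NAME (YEAR)"])
print(df1)
```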
training_data.csv
,user_id,movie_id,rating,unix_timestamp
0,1,1,5,874965758
1,1,2,3,876893171
2,1,3,4,878542960
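The leading comma in the header means the first column is an unnamed row index written out by an earlier to_csv call. A minimal sketch of reading it back, assuming the layout shown above (the inline string stands in for training_data.csv, and df2 is the name used later in this thread):

```python
import io
import pandas as pd

# Inline copy of the sample rows; in practice pass the path to training_data.csv instead.
raw = """,user_id,movie_id,rating,unix_timestamp
0,1,1,5,874965758
1,1,2,3,876893171
2,1,3,4,878542960
"""

# index_col=0 uses the unnamed first column as the row index instead of a data column.
df2 = pd.read_csv(io.StringIO(raw), index_col=0)
print(df2)
```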
Since I need to show the movie names in training_data instead of the movie IDs, I need to match every ID in u.item.csv against movie_id in training_data.csv and replace it with the corresponding name.
I'm using Python pandas. The training data was converted from an SFrame to a DataFrame and then to CSV so that I could make this change, but so far I have been unsuccessful. I could surely use some looping structure, but the matching and replacing is the real challenge I face.
P.S. I know the training data is in sequence per user and would produce the exact output if replaced, but I need to learn this so that when I recommend movies, the movie names are displayed and not the IDs.
I've already tried
I think you need map with a Series created by set_index:
print (df1.set_index('ID')['MOVIE NAME (YEAR)'])
ID
1 Toy Story (1995)
2 GoldenEye (1995)
3 Four Rooms (1995)
Name: MOVIE NAME (YEAR), dtype: object
df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
user_id movie_id rating unix_timestamp
0 1 Toy Story (1995) 5 874965758
1 1 GoldenEye (1995) 3 876893171
2 1 Four Rooms (1995) 4 878542960
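If the numeric ID is still needed later (for example, to feed a recommender), one possible variation is to write the titles to a new column rather than overwriting movie_id. This is only a sketch: the column name movie_name is a hypothetical choice, and the two frames are rebuilt inline from the sample rows so the snippet runs on its own.

```python
import pandas as pd

# Small stand-ins for the two frames, built from the sample rows above.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "MOVIE NAME (YEAR)": ["Toy Story (1995)", "GoldenEye (1995)", "Four Rooms (1995)"],
})
df2 = pd.DataFrame({
    "user_id": [1, 1, 1],
    "movie_id": [1, 2, 3],
    "rating": [5, 3, 4],
    "unix_timestamp": [874965758, 876893171, 878542960],
})

# Build the ID -> title lookup once, then reuse it.
titles = df1.set_index("ID")["MOVIE NAME (YEAR)"]

# "movie_name" is a hypothetical new column; movie_id is left untouched.
df2["movie_name"] = df2["movie_id"].map(titles)
print(df2)
```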
Or use replace:
df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
user_id movie_id rating unix_timestamp
0 1 Toy Story (1995) 5 874965758
1 1 GoldenEye (1995) 3 876893171
2 1 Four Rooms (1995) 4 878542960
The difference is that when an ID has no match, map produces NaN, while replace keeps the original value:
print (df2)
user_id movie_id rating unix_timestamp
0 1 1 5 874965758
1 1 2 3 876893171
2 1 5 4 878542960 <- 5 has no match in df1
df2['movie_id'] = df2['movie_id'].map(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
user_id movie_id rating unix_timestamp
0 1 Toy Story (1995) 5 874965758
1 1 GoldenEye (1995) 3 876893171
2 1 NaN 4 878542960
df2['movie_id'] = df2['movie_id'].replace(df1.set_index('ID')['MOVIE NAME (YEAR)'])
print (df2)
user_id movie_id rating unix_timestamp
0 1 Toy Story (1995) 5 874965758
1 1 GoldenEye (1995) 3 876893171
2 1 5 4 878542960
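Because map marks unmatched IDs with NaN, it can also be used to find which IDs are missing from the lookup before deciding how to handle them. A rough sketch on a tiny stand-in Series (the variable names are illustrative, and falling back to the ID as text is just one possible choice):

```python
import pandas as pd

# Lookup of ID -> title, as produced by df1.set_index('ID')['MOVIE NAME (YEAR)'].
titles = pd.Series({1: "Toy Story (1995)", 2: "GoldenEye (1995)", 3: "Four Rooms (1995)"})

# Sample movie_id column where 5 has no entry in the lookup.
movie_ids = pd.Series([1, 2, 5])

mapped = movie_ids.map(titles)          # the unmatched ID becomes NaN
unmatched = movie_ids[mapped.isna()]    # the IDs that found no title

# One possible fallback: show the ID itself (as text) where no title exists.
labelled = mapped.fillna(movie_ids.astype(str))

print(unmatched.tolist())
print(labelled.tolist())
```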