I'm making a movie recommendation system. I need Python code that converts the data imported from an Excel sheet into a set format (as shown below).
Code to import data from the excel sheet:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)
Output I get:
USER MOVIE RATINGS
0 Julia Roberts Shrek 2.5
1 NaN V for Vendetta 3.5
2 NaN Pretty Woman 3.0
3 NaN Star Wars 3.5
4 NaN While You Were Sleeping 2.5
5 NaN Phone Booth 3.0
6 Drew Barrymore Shrek 3.0
7 NaN V for Vendetta 3.5
8 NaN Pretty Woman 1.5
9 NaN Star Wars 5.0
10 NaN Phone Booth 3.0
11 NaN While You Were Sleeping 3.5
12 Kate Winslet Shrek 2.5
13 NaN V for Vendetta 3.0
14 NaN Star Wars 3.5
15 NaN Phone Booth 4.0
16 Tom Hanks While You Were Sleeping 2.5
17 NaN V for Vendetta 3.5
18 NaN Pretty Woman 3.0
19 NaN Star Wars 4.0
20 NaN Phone Booth 4.5
....
......
......
......
From here I need to have an output like this:
dataset={
'Julia Roberts': {
'Shrek': 2.5,
'I am Legend':3.0,
'V for Vendetta': 3.5,
'Pretty Woman': 0,
"My Sister's Keeper":5.0,
'Star Wars': 3.5,
'Me Before You': 3.0,
'While You Were Sleeping': 2.5,
'Phone Booth': 3.0},
'Drew Barrymore': {'Shrek': 3.0,
'V for Vendetta': 3.5,
'Pretty Woman': 1.5,
"My Sister's Keeper":4.0,
'Star Wars': 5.0,
'Phone Booth': 3.0,
'While You Were Sleeping': 3.5},
'Tom Hanks': {'V for Vendetta': 3.5,
'Pretty Woman': 3.0,
'Phone Booth': 4.5,
'Star Wars': 4.0,
'While You Were Sleeping': 2.5,
'I am Legend':3.5},
'Sandra Bullock': {'Shrek': 3.0,
'V for Vendetta': 4.0,
'Pretty Woman': 2.0,
'Star Wars': 3.0,
'I am Legend':4.5,
"My Sister's Keeper":3.5,
'Phone Booth': 3.0,
'While You Were Sleeping': 2.0}
}
Code I am using (but showing error):
max_nb_row = 0
for sheet in df.sheets():
    max_nb_row = max(max_nb_row, sheet.nrows)
for row in range(max_nb_row):
    for sheet in df.sheets():
        if row < sheet.nrows:
            print(sheet.row(row))
You can use this incomprehensible one-liner (the column names are lower-case here; adjust them to match the headers in your sheet):
df.ffill().groupby('user').apply(lambda x: dict(zip(x['movie'], x['ratings']))).to_dict()
To visualize what's happening, we'll use this smaller dataframe:
>>> df
user movie ratings
0 Julia Roberts Shrek 2.5
1 NaN V for Vendetta 3.5
2 NaN Pretty Woman 3.0
3 Drew Barrymore Shrek 3.0
4 NaN V for Vendetta 3.5
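If you want to follow along, the small frame can be rebuilt in code. This is only a sketch: the values are copied from the printout above, and np.nan stands in for the blank cells that read_excel leaves in the user column.

```python
import numpy as np
import pandas as pd

# Values copied from the printout; np.nan marks the blank "user" cells.
df = pd.DataFrame({
    "user": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan],
    "movie": ["Shrek", "V for Vendetta", "Pretty Woman", "Shrek", "V for Vendetta"],
    "ratings": [2.5, 3.5, 3.0, 3.0, 3.5],
})
print(df)
```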
Step by step, this is what happens:
Use ffill to replace the NaN values in the user column with the name above.
user movie ratings
0 Julia Roberts Shrek 2.5
1 Julia Roberts V for Vendetta 3.5
2 Julia Roberts Pretty Woman 3.0
3 Drew Barrymore Shrek 3.0
4 Drew Barrymore V for Vendetta 3.5
Use groupby('user') to group the data by user.
Use apply(lambda x: dict(zip(x['movie'], x['ratings']))) to create dicts of {movie: rating} pairs.
user
Drew Barrymore {'Shrek': 3.0, 'V for Vendetta': 3.5}
Julia Roberts {'Shrek': 2.5, 'V for Vendetta': 3.5, 'Pretty ...
dtype: object
Call to_dict() on the resulting Series to get the desired result.
{'Drew Barrymore': {'Shrek': 3.0, 'V for Vendetta': 3.5}, 'Julia Roberts': {'Pretty Woman': 3.0, 'Shrek': 2.5, 'V for Vendetta': 3.5}}
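If the one-liner is hard to read, the same steps can be spelled out one at a time. The sketch below is a variant, not the exact code above. It assumes the headers are upper-case as in the question and lower-cases them first. It forward-fills only the user column, on the assumption that blanks in other columns should stay blank. It also swaps apply for a plain dict comprehension. Note that groupby sorts the keys by default, which is why Drew Barrymore comes before Julia Roberts in the output above; sort=False keeps the order from the sheet.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data, with the question's upper-case headers.
df = pd.DataFrame({
    "USER": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan],
    "MOVIE": ["Shrek", "V for Vendetta", "Pretty Woman", "Shrek", "V for Vendetta"],
    "RATINGS": [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Lower-case the headers so they match the names used in the one-liner.
df.columns = df.columns.str.lower()

# Forward-fill only the user column; other columns are left untouched.
df["user"] = df["user"].ffill()

# Build {user: {movie: rating}}. sort=False keeps users in sheet order,
# and tolist() turns the NumPy floats into plain Python floats.
dataset = {
    user: dict(zip(group["movie"], group["ratings"].tolist()))
    for user, group in df.groupby("user", sort=False)
}
print(dataset)
```

With real data, replace the hand-built frame with the result of pd.read_excel and keep the rest as is.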