How to fetch data from an excel sheet and get the output in set format?

I'm making a movie recommendation system. I need Python code that converts the data imported from an Excel sheet into a nested dictionary of the form {user: {movie: rating}} (as shown below).

Code to import data from the excel sheet:

import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile

# sheet_name replaces the older sheetname keyword, which newer pandas versions no longer accept
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)

Output I get:

              USER                    MOVIE  RATINGS
0    Julia Roberts                    Shrek      2.5
1              NaN           V for Vendetta      3.5
2              NaN             Pretty Woman      3.0
3              NaN                Star Wars      3.5
4              NaN  While You Were Sleeping      2.5
5              NaN              Phone Booth      3.0
6   Drew Barrymore                    Shrek      3.0
7              NaN           V for Vendetta      3.5
8              NaN             Pretty Woman      1.5
9              NaN                Star Wars      5.0
10             NaN              Phone Booth      3.0
11             NaN  While You Were Sleeping      3.5
12    Kate Winslet                    Shrek      2.5
13             NaN           V for Vendetta      3.0
14             NaN                Star Wars      3.5
15             NaN              Phone Booth      4.0
16       Tom Hanks  While You Were Sleeping      2.5
17             NaN           V for Vendetta      3.5
18             NaN             Pretty Woman      3.0
19             NaN                Star Wars      4.0
20             NaN              Phone Booth      4.5
...

From here I need to have an output like this:

dataset={
 'Julia Roberts': {
 'Shrek': 2.5,
 'I am Legend':3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 0,
 "My Sister's Keeper":5.0,
 'Star Wars': 3.5,
 'Me Before You': 3.0,
 'While You Were Sleeping': 2.5,
 'Phone Booth': 3.0},

 'Drew Barrymore': {'Shrek': 3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 1.5,
 "My Sister's Keeper":4.0,
 'Star Wars': 5.0,
 'Phone Booth': 3.0,
 'While You Were Sleeping': 3.5},


 'Tom Hanks': {'V for Vendetta': 3.5,
 'Pretty Woman': 3.0,
 'Phone Booth': 4.5,
 'Star Wars': 4.0,
 'While You Were Sleeping': 2.5,
 'I am Legend':3.5},

 'Sandra Bullock': {'Shrek': 3.0,
 'V for Vendetta': 4.0,
 'Pretty Woman': 2.0,
 'Star Wars': 3.0,
 'I am Legend':4.5,
 "My Sister's Keeper":3.5, 
 'Phone Booth': 3.0,
 'While You Were Sleeping': 2.0}
}

Code I am using (it raises an error):

max_nb_row = 0
for sheet in df.sheets():
  max_nb_row = max(max_nb_row, sheet.nrows)

for row in range(max_nb_row) :
  for sheet in df.sheets() :
    if row < sheet.nrows :
      print (sheet.row(row))

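You can use this rather dense one-liner (the column names here are lowercase, as in the smaller dataframe below; adjust them to match your sheet):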

df.ffill().groupby('user').apply(lambda x: dict(zip(x['movie'], x['ratings']))).to_dict()

To visualize what's happening, we'll use this smaller dataframe:

>>> df
             user           movie  ratings
0   Julia Roberts           Shrek      2.5
1             NaN  V for Vendetta      3.5
2             NaN    Pretty Woman      3.0
3  Drew Barrymore           Shrek      3.0
4             NaN  V for Vendetta      3.5

Step by step, this is what happens:

  1. Use ffill to replace the NaN values in the user column with the name above.

                   user           movie  ratings
      0   Julia Roberts           Shrek      2.5
      1   Julia Roberts  V for Vendetta      3.5
      2   Julia Roberts    Pretty Woman      3.0
      3  Drew Barrymore           Shrek      3.0
      4  Drew Barrymore  V for Vendetta      3.5

  2. Use groupby('user') to group the data by user.

  3. Use apply(lambda x: dict(zip(x['movie'], x['ratings']))) to create dicts of {movie: rating} pairs.

      user
      Drew Barrymore    {'Shrek': 3.0, 'V for Vendetta': 3.5}
      Julia Roberts     {'Shrek': 2.5, 'V for Vendetta': 3.5, 'Pretty ...
      dtype: object

  4. Call to_dict() on the resulting Series to get the desired result.

      {'Drew Barrymore': {'Shrek': 3.0, 'V for Vendetta': 3.5},
       'Julia Roberts': {'Pretty Woman': 3.0, 'Shrek': 2.5, 'V for Vendetta': 3.5}}
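
If the one-liner is hard to read, the same steps can be spelled out with a plain dict comprehension. This is only a sketch, not the answer's exact code. It assumes the uppercase column names from the question (USER, MOVIE, RATINGS) and builds a small in-memory dataframe as a stand-in for the result of pd.read_excel. It also forward-fills only the USER column, which is my own choice, so that a genuinely missing movie or rating is not overwritten.

```python
import numpy as np
import pandas as pd

# Small stand-in for the Excel sheet (assumed data; the real frame would come
# from pd.read_excel). Column names follow the question's USER/MOVIE/RATINGS.
df = pd.DataFrame({
    "USER": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan],
    "MOVIE": ["Shrek", "V for Vendetta", "Pretty Woman", "Shrek", "V for Vendetta"],
    "RATINGS": [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Step 1: forward-fill only the USER column, so a missing movie or rating
# is not silently replaced by the value from the row above.
filled = df.assign(USER=df["USER"].ffill())

# Steps 2-4: group by user and build one {movie: rating} dict per group.
# sort=False keeps users in the order they first appear in the sheet.
dataset = {
    user: dict(zip(group["MOVIE"], group["RATINGS"]))
    for user, group in filled.groupby("USER", sort=False)
}

print(dataset)
```

Iterating over the groupby yields (user, sub-dataframe) pairs, so each inner dict is built from that user's rows only. If the same movie appears twice for one user, the later rating wins, as in the one-liner.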
