How to fetch data from an Excel sheet and get the output in a set format?

I'm making a movie recommendation system. I need Python code that converts the data imported from an Excel sheet into a set format (shown further down).


Code to import data from the Excel sheet:

import pandas as pd

# sheet_name is the current keyword; the older sheetname spelling
# has been removed from recent pandas releases.
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)

Output I get:

              USER                    MOVIE  RATINGS
0    Julia Roberts                    Shrek      2.5
1              NaN           V for Vendetta      3.5
2              NaN             Pretty Woman      3.0
3              NaN                Star Wars      3.5
4              NaN  While You Were Sleeping      2.5
5              NaN              Phone Booth      3.0
6   Drew Barrymore                    Shrek      3.0
7              NaN           V for Vendetta      3.5
8              NaN             Pretty Woman      1.5
9              NaN                Star Wars      5.0
10             NaN              Phone Booth      3.0
11             NaN  While You Were Sleeping      3.5
12    Kate Winslet                    Shrek      2.5
13             NaN           V for Vendetta      3.0
14             NaN                Star Wars      3.5
15             NaN              Phone Booth      4.0
16       Tom Hanks  While You Were Sleeping      2.5
17             NaN           V for Vendetta      3.5
18             NaN             Pretty Woman      3.0
19             NaN                Star Wars      4.0
20             NaN              Phone Booth      4.5
...


From here I need an output like this:

dataset={
 'Julia Roberts': {
 'Shrek': 2.5,
 'I am Legend':3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 0,
 "My Sister's Keeper":5.0,
 'Star Wars': 3.5,
 'Me Before You': 3.0,
 'While You Were Sleeping': 2.5,
 'Phone Booth': 3.0},

 'Drew Barrymore': {'Shrek': 3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 1.5,
 "My Sister's Keeper":4.0,
 'Star Wars': 5.0,
 'Phone Booth': 3.0,
 'While You Were Sleeping': 3.5},


 'Tom Hanks': {'V for Vendetta': 3.5,
 'Pretty Woman': 3.0,
 'Phone Booth': 4.5,
 'Star Wars': 4.0,
 'While You Were Sleeping': 2.5,
 'I am Legend':3.5},

 'Sandra Bullock': {'Shrek': 3.0,
 'V for Vendetta': 4.0,
 'Pretty Woman': 2.0,
 'Star Wars': 3.0,
 'I am Legend':4.5,
 "My Sister's Keeper":3.5, 
 'Phone Booth': 3.0,
 'While You Were Sleeping': 2.0}
}

Code I am using (but it raises an error):

max_nb_row = 0
for sheet in df.sheets():
  max_nb_row = max(max_nb_row, sheet.nrows)

for row in range(max_nb_row) :
  for sheet in df.sheets() :
    if row < sheet.nrows :
      print (sheet.row(row))

You can use this incomprehensible one-liner:

df.ffill().groupby('user').apply(lambda x: dict(zip(x['movie'], x['ratings']))).to_dict()
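For readers who find the one-liner hard to follow, here is a minimal sketch of the same idea written out with a dict comprehension. It assumes the upper-case column names from the question (USER, MOVIE, RATINGS) and uses a small hand-built frame as a stand-in for the Excel sheet; the variable name dataset is taken from the desired output in the question. It is one possible way to write it, not the only one.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the sheet in the question (an assumption:
# the real data would come from pd.read_excel).
df = pd.DataFrame({
    "USER": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan],
    "MOVIE": ["Shrek", "V for Vendetta", "Pretty Woman",
              "Shrek", "V for Vendetta"],
    "RATINGS": [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Forward-fill only the USER column, so a genuinely missing rating
# is not silently replaced by the rating from the row above.
df["USER"] = df["USER"].ffill()

# Build one {movie: rating} dict per user. sort=False keeps users in
# the order they first appear in the sheet.
dataset = {
    user: dict(zip(group["MOVIE"], group["RATINGS"]))
    for user, group in df.groupby("USER", sort=False)
}

print(dataset)
```

Filling only the USER column is a deliberate difference from df.ffill(), which fills every column of the frame.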

To visualize what's happening, we'll use this smaller dataframe:

>>> df
             user           movie  ratings
0   Julia Roberts           Shrek      2.5
1             NaN  V for Vendetta      3.5
2             NaN    Pretty Woman      3.0
3  Drew Barrymore           Shrek      3.0
4             NaN  V for Vendetta      3.5

Step by step, this is what happens:

  1. Use ffill to replace the NaN values in the user column with the name above.

                   user           movie  ratings
      0   Julia Roberts           Shrek      2.5
      1   Julia Roberts  V for Vendetta      3.5
      2   Julia Roberts    Pretty Woman      3.0
      3  Drew Barrymore           Shrek      3.0
      4  Drew Barrymore  V for Vendetta      3.5

  2. Use groupby('user') to group the data by user.

  3. Use apply(lambda x: dict(zip(x['movie'], x['ratings']))) to create dicts of {movie: rating} pairs.

      user
      Drew Barrymore                {'Shrek': 3.0, 'V for Vendetta': 3.5}
      Julia Roberts     {'Shrek': 2.5, 'V for Vendetta': 3.5, 'Pretty ...
      dtype: object

  4. Call to_dict() on the resulting Series to get the desired result.

      {'Drew Barrymore': {'Shrek': 3.0, 'V for Vendetta': 3.5}, 'Julia Roberts': {'Pretty Woman': 3.0, 'Shrek': 2.5, 'V for Vendetta': 3.5}}
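
The four steps can be run one at a time on the small dataframe. The sketch below rebuilds that frame by hand and keeps each intermediate result in its own variable; the names filled, grouped, per_user and result are illustrative only. Selecting the movie and ratings columns before apply is an addition here, made on the assumption that a recent pandas version is in use, where applying over the grouping column itself triggers a deprecation warning.

```python
import numpy as np
import pandas as pd

# The smaller dataframe from the walkthrough, rebuilt by hand.
df = pd.DataFrame({
    "user": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan],
    "movie": ["Shrek", "V for Vendetta", "Pretty Woman",
              "Shrek", "V for Vendetta"],
    "ratings": [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Step 1: replace each NaN with the last non-missing value above it.
filled = df.ffill()

# Step 2: group the rows by user.
grouped = filled.groupby("user")

# Step 3: build one {movie: rating} dict per group. The result is a
# Series indexed by user whose values are dicts.
per_user = grouped[["movie", "ratings"]].apply(
    lambda x: dict(zip(x["movie"], x["ratings"]))
)

# Step 4: turn the Series into a plain nested dict.
result = per_user.to_dict()

print(result)
```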
