How do I get data from an Excel sheet and output it in a set format?

Question

I'm making a movie recommendation system. I need Python code that converts the data imported from an Excel sheet into a set format (as shown below).


Code to import data from the Excel sheet:

import pandas as pd

# sheet_name replaces the older sheetname keyword, which newer pandas versions no longer accept
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)

Output I get:

              USER                    MOVIE  RATINGS
0    Julia Roberts                    Shrek      2.5
1              NaN           V for Vendetta      3.5
2              NaN             Pretty Woman      3.0
3              NaN                Star Wars      3.5
4              NaN  While You Were Sleeping      2.5
5              NaN              Phone Booth      3.0
6   Drew Barrymore                    Shrek      3.0
7              NaN           V for Vendetta      3.5
8              NaN             Pretty Woman      1.5
9              NaN                Star Wars      5.0
10             NaN              Phone Booth      3.0
11             NaN  While You Were Sleeping      3.5
12    Kate Winslet                    Shrek      2.5
13             NaN           V for Vendetta      3.0
14             NaN                Star Wars      3.5
15             NaN              Phone Booth      4.0
16       Tom Hanks  While You Were Sleeping      2.5
17             NaN           V for Vendetta      3.5
18             NaN             Pretty Woman      3.0
19             NaN                Star Wars      4.0
20             NaN              Phone Booth      4.5
....
......
......
......


From here I need output like this:

dataset={
 'Julia Roberts': {
 'Shrek': 2.5,
 'I am Legend':3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 0,
 "My Sister's Keeper":5.0,
 'Star Wars': 3.5,
 'Me Before You': 3.0,
 'While You Were Sleeping': 2.5,
 'Phone Booth': 3.0},

 'Drew Barrymore': {'Shrek': 3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 1.5,
 "My Sister's Keeper":4.0,
 'Star Wars': 5.0,
 'Phone Booth': 3.0,
 'While You Were Sleeping': 3.5},


 'Tom Hanks': {'V for Vendetta': 3.5,
 'Pretty Woman': 3.0,
 'Phone Booth': 4.5,
 'Star Wars': 4.0,
 'While You Were Sleeping': 2.5,
 'I am Legend':3.5},

 'Sandra Bullock': {'Shrek': 3.0,
 'V for Vendetta': 4.0,
 'Pretty Woman': 2.0,
 'Star Wars': 3.0,
 'I am Legend':4.5,
 "My Sister's Keeper":3.5, 
 'Phone Booth': 3.0,
 'While You Were Sleeping': 2.0}
}

Code I am using (but it raises an error):

max_nb_row = 0
for sheet in df.sheets():
  max_nb_row = max(max_nb_row, sheet.nrows)

for row in range(max_nb_row) :
  for sheet in df.sheets() :
    if row < sheet.nrows :
      print (sheet.row(row))

Answer 1

You can use this incomprehensible one-liner:

df.ffill().groupby('user').apply(lambda x: dict(zip(x['movie'], x['ratings']))).to_dict()
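
The columns in the question's output are named USER, MOVIE, and RATINGS rather than the lowercase names used here. As a rough sketch of the same idea with those names, the snippet below builds a few rows in memory as a stand-in for the `read_excel` call (an assumption made so the example runs without the file) and iterates over the groups directly instead of calling `apply`:

```python
import pandas as pd

# Stand-in for pd.read_excel('project.xlsx', ...): a few rows copied from the
# question's output, with None wherever the USER cell was left blank.
df = pd.DataFrame({
    'USER': ['Julia Roberts', None, None, 'Drew Barrymore', None],
    'MOVIE': ['Shrek', 'V for Vendetta', 'Pretty Woman', 'Shrek', 'V for Vendetta'],
    'RATINGS': [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Forward-fill only the USER column, so a blank rating would stay blank
df['USER'] = df['USER'].ffill()

# Build {user: {movie: rating}} by looping over the groups
dataset = {
    user: dict(zip(group['MOVIE'], group['RATINGS']))
    for user, group in df.groupby('USER')
}
print(dataset)
```

Filling only the USER column is a design choice of this sketch: `df.ffill()` on the whole frame would also copy a rating down into any row where the rating itself is missing.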

To visualize what's happening, we'll use this smaller dataframe (note the lowercase column names; the sheet in the question uses USER, MOVIE, and RATINGS):

>>> df
             user           movie  ratings
0   Julia Roberts           Shrek      2.5
1             NaN  V for Vendetta      3.5
2             NaN    Pretty Woman      3.0
3  Drew Barrymore           Shrek      3.0
4             NaN  V for Vendetta      3.5

Step by step, this is what happens:

Use ffill to replace the NaN values in the user column with the name above.

             user           movie  ratings
0   Julia Roberts           Shrek      2.5
1   Julia Roberts  V for Vendetta      3.5
2   Julia Roberts    Pretty Woman      3.0
3  Drew Barrymore           Shrek      3.0
4  Drew Barrymore  V for Vendetta      3.5

Use groupby('user') to group the data by user.
Use apply(lambda x: dict(zip(x['movie'], x['ratings']))) to create dicts of {movie: rating} pairs.
```
user
Drew Barrymore                {'Shrek': 3.0, 'V for Vendetta': 3.5}
Julia Roberts     {'Shrek': 2.5, 'V for Vendetta': 3.5, 'Pretty ...
dtype: object
```

Call to_dict() on the resulting Series to get the desired result.

 {'Drew Barrymore': {'Shrek': 3.0, 'V for Vendetta': 3.5}, 'Julia Roberts': {'Pretty Woman': 3.0, 'Shrek': 2.5, 'V for Vendetta': 3.5}}
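
The steps above can be sketched as separate statements, which makes each intermediate result easy to inspect. This is a minimal sketch rather than the only way to write it: the names `filled`, `per_user`, and `result` are made up for illustration, and selecting the `movie` and `ratings` columns before `apply` is an addition not present in the one-liner, made to keep the grouping column out of the applied frame.

```python
import pandas as pd

# The smaller dataframe from above; None marks the blank user cells
df = pd.DataFrame({
    'user': ['Julia Roberts', None, None, 'Drew Barrymore', None],
    'movie': ['Shrek', 'V for Vendetta', 'Pretty Woman', 'Shrek', 'V for Vendetta'],
    'ratings': [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Step 1: copy each user name down into the blank cells below it
filled = df.ffill()

# Steps 2 and 3: group by user, then build one {movie: rating} dict per group.
# Selecting the two columns explicitly is an addition to the one-liner.
per_user = filled.groupby('user')[['movie', 'ratings']].apply(
    lambda g: dict(zip(g['movie'], g['ratings']))
)

# Step 4: turn the Series of dicts into a nested dict keyed by user
result = per_user.to_dict()
print(result)
```

Printing `filled` and `per_user` along the way should show the same intermediate tables as in the walkthrough.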
