Python Pandas to_dict function
I'm trying to create a dictionary, but can't get it to work the way I intend. I feel like I'm so close. I have a df of Yelp data:
import pandas as pd
file_rev = 'blah.csv'
reviews = pd.read_csv(file_rev, sep=',', header=0, nrows=10000)
cols = ['user_id', 'business_id', 'stars']
cat_rev = reviews[cols]
print(cat_rev)
df:
user_id business_id stars
0 Xqd0DzHaiyRqVH3WRG7hzg vcNAWiLM4dR7D2nwwJ7nCA 5
1 H1kH6QZV7Le4zqTRNxoZow vcNAWiLM4dR7D2nwwJ7nCA 2
2 zvJCcrpm2yOZrxKffwGQLA vcNAWiLM4dR7D2nwwJ7nCA 4
3 KBLW4wJA_fwoWmMhiHRVOA vcNAWiLM4dR7D2nwwJ7nCA 4
4 zvJCcrpm2yOZrxKffwGQLA vcNAWiLM4dR7D2nwwJ7nCA 4
5 Qrs3EICADUKNFoUq2iHStA vcNAWiLM4dR7D2nwwJ7nCA 1
6 jE5xVugujSaskAoh2DRx3Q vcNAWiLM4dR7D2nwwJ7nCA 5
7 QnhQ8G51XbUpVEyWY2Km-A vcNAWiLM4dR7D2nwwJ7nCA 5
8 tAB7GJpUuaKF4W-3P0d95A vcNAWiLM4dR7D2nwwJ7nCA 1
9 GP-h9colXgkT79BW7aDJeg vcNAWiLM4dR7D2nwwJ7nCA 5
10 uK8tzraOp4M5u3uYrqIBXg UsFtqoBl7naz8AVUBZMjQQ 5
I want to be able to create this as a dictionary that looks like:
abc = {user1 : {business1:star_rating, business2:star_rating…,
businessN:star_rating},
user2: {} … }
Then, to access it, I would just use:
abc[user1]
which would give me all the places and stars that user1 reviewed, and
abc[user1][place1]
which would give just the corresponding star rating.
I've tried the pandas to_dict function. I also tried to groupby first and then dict(list(groupby())), and nothing seems to convert it to the form I want.
Also nope, but almost (the keyword was outtype in old pandas versions; it is orient in current ones):
ddd = cat_rev.set_index('user_id').to_dict(orient='list')
You could use groupby and a dict comprehension:
{user_id: pd.Series(grp['stars'].values, index=grp['business_id']).to_dict()
 for user_id, grp in df.groupby('user_id')}
yields
{'GP-h9colXgkT79BW7aDJeg': {'vcNAWiLM4dR7D2nwwJ7nCA': 5},
'H1kH6QZV7Le4zqTRNxoZow': {'vcNAWiLM4dR7D2nwwJ7nCA': 2},
'KBLW4wJA_fwoWmMhiHRVOA': {'vcNAWiLM4dR7D2nwwJ7nCA': 4},
'QnhQ8G51XbUpVEyWY2Km-A': {'vcNAWiLM4dR7D2nwwJ7nCA': 5},
'Qrs3EICADUKNFoUq2iHStA': {'vcNAWiLM4dR7D2nwwJ7nCA': 1},
'Xqd0DzHaiyRqVH3WRG7hzg': {'vcNAWiLM4dR7D2nwwJ7nCA': 5},
'jE5xVugujSaskAoh2DRx3Q': {'vcNAWiLM4dR7D2nwwJ7nCA': 5},
'tAB7GJpUuaKF4W-3P0d95A': {'vcNAWiLM4dR7D2nwwJ7nCA': 1},
'uK8tzraOp4M5u3uYrqIBXg': {'UsFtqoBl7naz8AVUBZMjQQ': 5},
'zvJCcrpm2yOZrxKffwGQLA': {'vcNAWiLM4dR7D2nwwJ7nCA': 4}}
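For anyone who wants to try this without the Yelp file, here is a minimal, self-contained sketch of the same groupby approach. The sample rows (u1, b1, ...) and the helper name nested_ratings are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data with the same three columns as the question.
df = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'business_id': ['b1', 'b2', 'b1'],
    'stars': [5, 3, 4],
})


def nested_ratings(frame):
    """Build {user_id: {business_id: stars}} from the three columns."""
    return {
        user_id: pd.Series(grp['stars'].values,
                           index=grp['business_id']).to_dict()
        # Grouping by a plain string (not a one-element list) keeps the
        # group keys as scalars rather than 1-tuples on recent pandas.
        for user_id, grp in frame.groupby('user_id')
    }


abc = nested_ratings(df)
print(abc['u1'])        # every business u1 reviewed, with its rating
print(abc['u1']['b2'])  # just the rating u1 gave b2
```

Note that a dict can hold only one value per key, so if a user reviewed the same business more than once (as zvJCcrpm2yOZrxKffwGQLA did in the question's sample), only one of those ratings survives in the inner dict.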
You can also zip the index with the row values...
d = {k:v for k,v in zip(df.index,df.to_dict('records'))}
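It may help to see what this produces. In the sketch below (sample rows are hypothetical), the result is keyed by the row index rather than by user_id, and each value is the whole row as a dict, so it is not quite the user-to-business nesting the question asks for.

```python
import pandas as pd

# Hypothetical sample data with the same three columns as the question.
df = pd.DataFrame({
    'user_id': ['u1', 'u2'],
    'business_id': ['b1', 'b1'],
    'stars': [5, 4],
})

# Pair each index label with the dict for that row.
d = {k: v for k, v in zip(df.index, df.to_dict('records'))}

# The built-in 'index' orientation gives the same mapping in one call.
same = df.to_dict('index')

print(d[0])
```

To key the result by user instead, one option would be to call set_index('user_id') first, but that only works cleanly when each user appears once, which is not the case in the question's data.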