How to convert Pandas dataframe to np.array while preserving the index?
For example, I have a small set of data (from MovieLens):
check.csv
userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
2,17,5.0,835355681
3,267,3.0,1298861761
3,296,4.5,1298862418
3,318,5.0,1298862121
If I do
rating = pd.read_csv('check.csv')
Y = pd.pivot_table(rating, values='rating', index=['movieId'], columns=['userId']).values
it creates a 7*3 matrix (7 movies by 3 users). But what I want is a matrix whose positions match the IDs, with movieId as the row index and userId as the column index. How can I make a 1061*3 matrix S, with S[31][1] = 2.5, S[1029][1] = 3, etc., and all the missing entries equal to zero?
Okay, then, I think you want this.
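import numpy as np
import pandas as pd

df = pd.read_csv('check.csv')
# A scalar values='rating' keeps the columns flat (1, 2, 3), which the merge below needs
Y = pd.pivot_table(df, values='rating', index=['movieId'], columns=['userId'])
# Outer-merge onto an empty frame indexed 0..max(movieId) so every movieId gets a row, then fill gaps with 0
df_out = pd.DataFrame(index=np.arange(Y.index.values.max() + 1)
).merge(Y, left_index=True, right_index=True, how='outer'
).fillna(0)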
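If the goal is a plain NumPy array where the position equals the ID, one possible sketch is to reindex both axes and then convert. This swaps the merge for DataFrame.reindex, which is my substitution rather than anything from the answer above. The CSV is inlined through io.StringIO only so the snippet runs without the file, and the names rows, cols and S are just illustrative. Note that with 0-based positions the array ends up 1062*4 rather than 1061*3, because row 0 and column 0 exist but are never used.

```python
import io

import numpy as np
import pandas as pd

# Same rows as check.csv, inlined so the sketch runs without the file.
CSV = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
2,17,5.0,835355681
3,267,3.0,1298861761
3,296,4.5,1298862418
3,318,5.0,1298862121
"""

df = pd.read_csv(io.StringIO(CSV))
Y = df.pivot_table(values="rating", index="movieId", columns="userId")

# Reindex both axes onto 0..max so that array position equals the ID.
rows = np.arange(Y.index.max() + 1)    # movieId 0..1061
cols = np.arange(Y.columns.max() + 1)  # userId 0..3

# reindex adds the missing labels as NaN; fillna(0) zeroes every gap.
S = Y.reindex(index=rows, columns=cols).fillna(0).to_numpy()

print(S.shape)     # (1062, 4)
print(S[31][1])    # 2.5
print(S[1029][1])  # 3.0
```

For the full MovieLens data a dense array like this can get large, so a sparse matrix may be the better fit there; the dense version is only meant to match the S[movieId][userId] indexing asked for in the question.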
df = pd.read_csv('check.csv')
Y = pd.pivot_table(df, values=['rating'], index=['movieId'], columns=['userId'])
         rating
userId        1    2    3
movieId
31          2.5    0    0
1029        3.0    0    0
1061        3.0    0    0
17            0  5.0    0
296           0    0  4.5
The remaining values follow from the rows in the CSV. More details: http://pbpython.com/pandas-pivot-table-explained.html
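One thing to watch with the table above: pivot_table leaves missing combinations as NaN by default, so the zeros only appear if you ask for them. A small sketch of the difference, using the fill_value parameter of pivot_table (the CSV is inlined and the names Y_nan and Y_zero are mine, purely for illustration):

```python
import io

import pandas as pd

# Same rows as check.csv, inlined so the sketch runs without the file.
CSV = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
2,17,5.0,835355681
3,267,3.0,1298861761
3,296,4.5,1298862418
3,318,5.0,1298862121
"""

df = pd.read_csv(io.StringIO(CSV))

# Default behaviour: user/movie pairs without a rating become NaN.
Y_nan = pd.pivot_table(df, values="rating", index="movieId", columns="userId")

# With fill_value=0 the same pairs are filled with zero instead.
Y_zero = pd.pivot_table(
    df, values="rating", index="movieId", columns="userId", fill_value=0
)

print(Y_nan.isna().sum().sum())   # number of missing cells in the default pivot
print(Y_zero.isna().sum().sum())  # no missing cells once fill_value is set
print(Y_zero.loc[296, 3])         # rating of movie 296 by user 3
```

Either way the pivot only has rows for the 7 movies that occur in the file, so this alone does not give the ID-aligned matrix; it just controls what the gaps look like.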