Extract smaller table from pivot table pandas

I want to split the following pivot table into training and testing sets (to evaluate a recommendation system), and was thinking of extracting two tables with non-overlapping indices (userID) and column values (ISBN). How can I split it properly? Thank you.

[Image: the pivot table, with userID as the index and ISBN as the columns]

As suggested by @moys, you can use train_test_split from scikit-learn, after first splitting your dataframe's columns into groups with non-overlapping column names.

Example:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

Generate data:

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Split the df columns in some way, e.g. in half:

cols = len(df.columns) // 2  # number of columns in the first half
df_A = df.iloc[:, :cols]
df_B = df.iloc[:, cols:]

Use train_test_split:

train_A, test_A = train_test_split(df_A, test_size=0.33)
train_B, test_B = train_test_split(df_B, test_size=0.33)
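
One thing to watch: the two train_test_split calls shuffle independently, so a row that lands in train_A can also land in test_B. If the goal is two tables that share neither userIDs nor ISBNs, one option is to split the row labels once and the column labels once, then select with .loc. The following is only a sketch under that assumption; the toy pivot frame, the user_/isbn_ labels and the random_state value are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the pivot table: rows are userIDs, columns are ISBNs
# (hypothetical labels, not the asker's data).
rng = np.random.default_rng(0)
pivot = pd.DataFrame(
    rng.integers(0, 11, size=(100, 4)),
    index=[f"user_{i}" for i in range(100)],
    columns=["isbn_A", "isbn_B", "isbn_C", "isbn_D"],
)

# Split the row labels once, so no userID ends up in both tables.
train_users, test_users = train_test_split(
    pivot.index.to_numpy(), test_size=0.33, random_state=0
)

# Split the column labels once, so no ISBN ends up in both tables.
half = len(pivot.columns) // 2
train_isbns = pivot.columns[:half]
test_isbns = pivot.columns[half:]

# Select each block by label.
train = pivot.loc[train_users, train_isbns]
test = pivot.loc[test_users, test_isbns]
```

Here train and test have disjoint indices and disjoint columns by construction. Whether fully disjoint blocks are the right evaluation setup for a recommender is a separate question; this only shows how to get them.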
