
Stratified KFold on sparse (CSR) feature matrix

I have a large sparse matrix (95000, 12000) containing the features of my model. I want to do stratified K-fold cross-validation using the `sklearn.cross_validation` module in Python. However, I haven't found a way of indexing a sparse matrix in Python.

Is there any way I can perform StratifiedKFold on my sparse feature matrix?
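The indexing concern can be checked directly: a SciPy CSR matrix supports selecting rows with an integer index array, which is exactly what a cross-validation split produces. A minimal sketch, assuming a tiny made-up matrix in place of the real (95000, 12000) one:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical small dense array standing in for the real feature matrix
dense = np.array([
    [0, 1, 0],
    [2, 0, 0],
    [0, 0, 3],
    [4, 0, 5],
    [0, 6, 0],
])
X = sp.csr_matrix(dense)

# Select rows 0, 3 and 4 with an integer index array; the result stays sparse (CSR)
rows = np.array([0, 3, 4])
X_sub = X[rows]

print(X_sub.format, X_sub.shape)
print(X_sub.toarray())
```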

Try this:

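# Imports: split() is the sklearn.model_selection API (not the old sklearn.cross_validation one)
import numpy as np
from sklearn.model_selection import StratifiedKFold

# First make sure the sparse matrix is in CSR format (fast row slicing)
X = x.tocsr()           # x: your sparse feature matrix
y = np.asarray(output)  # output: your labels, as an array so they can be indexed
X_train, X_test = {}, {}
y_train, y_test = {}, {}

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
for i, (train_index, test_index) in enumerate(skf.split(X, y)):
    print("TRAIN:", train_index, "TEST:", test_index)
    # A CSR matrix can be indexed by rows with an integer index array
    X_train[i], X_test[i] = X[train_index], X[test_index]
    y_train[i], y_test[i] = y[train_index], y[test_index]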
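For a quick check without the real data, the same loop can be run end to end on a small synthetic sparse matrix. This is a sketch under stated assumptions: the data, its size (100 × 20), the 80/20 label split and the `folds` list are all hypothetical stand-ins.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.model_selection import StratifiedKFold

# Hypothetical synthetic data: 100 samples x 20 features, roughly 10% non-zero
rng = np.random.default_rng(0)
dense = rng.random((100, 20))
dense[dense < 0.9] = 0.0
X = sp.csr_matrix(dense)

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
y = np.array([0] * 80 + [1] * 20)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=12345)
folds = []
for train_index, test_index in skf.split(X, y):
    # Row-indexing the CSR matrix keeps every fold sparse
    X_tr, X_te = X[train_index], X[test_index]
    y_tr, y_te = y[train_index], y[test_index]
    folds.append((X_tr, X_te, y_tr, y_te))
    # Each test fold should preserve the 80/20 class ratio
    print("test size:", len(test_index), "class-1 count:", int(y_te.sum()))
```

Storing every fold's slices copies the data once per fold; with a 95000 × 12000 matrix it may be cheaper to keep only the index arrays and slice `X` inside the training loop.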

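Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.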

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM