
k-fold cross validation procedure to split the data into training and test

I'm trying to do a k-fold cross-validation procedure to split the data into training and test subsets, but I'm not sure how to do this:

import pandas as pd
from sklearn import model_selection

products = pd.read_csv("product_imgs.csv")

kf = model_selection.KFold(n_splits=2, shuffle=True)
for train_index, test_index in kf.split(products):
    print('train: %s, test: %s' % (products[train_index], products[test_index]))

Example of the data, which consists of images (label 1 = cars, label 0 = vans).

Error:

KeyError: "None of [Int64Index([    0,     1,     2,     3,     5,     6,     7,     9,    10,\n               11,\n            ...\n            13981, 13982, 13986, 13987, 13989, 13990, 13993, 13995, 13996,\n            13997],\n           dtype='int64', length=7000)] are in the [columns]"

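The train_index and test_index returned by kf.split() are arrays of integer row positions. Writing products[train_index] asks pandas for columns with those names, which is why you get the KeyError. In your print call, select rows instead, using .loc as shown in the code below. This works here because read_csv gives the DataFrame a default 0..n-1 index, so labels and positions coincide; if your index is anything else, use .iloc, which always selects by position.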

print('train: %s, test: %s' % (products.loc[train_index, :], products.loc[test_index, :]))
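For reference, here is a minimal self-contained sketch of the whole loop. It is only an illustration: the small DataFrame stands in for product_imgs.csv, and its column names ("image", "label") are assumptions, since the real file is not shown. It uses .iloc rather than .loc so that it keeps working if the index is not the default one, and it fixes random_state only to make the shuffle repeatable.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for product_imgs.csv; the column names are assumptions.
products = pd.DataFrame({
    "image": [f"img_{i}.jpg" for i in range(10)],
    "label": [1, 0] * 5,
})

kf = KFold(n_splits=2, shuffle=True, random_state=0)
fold_sizes = []
for fold, (train_index, test_index) in enumerate(kf.split(products)):
    # split() yields integer row positions, so select rows by position with .iloc
    train = products.iloc[train_index]
    test = products.iloc[test_index]
    fold_sizes.append((len(train), len(test)))
    print(f"fold {fold}: {len(train)} train rows, {len(test)} test rows")
```

Each iteration gives you one train/test pair, so with n_splits=2 every row ends up in the test subset exactly once.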
