k-fold cross validation procedure to split the data into training and test subsets
I'm trying to use k-fold cross validation to split the data into training and test subsets, but I'm not sure how to do this:
import pandas as pd
from sklearn import model_selection

products = pd.read_csv("product_imgs.csv")
kf = model_selection.KFold(n_splits=2, shuffle=True)
for train_index, test_index in kf.split(products):
    print('train: %s, test: %s' % (products[train_index], products[test_index]))
Example of the data, which describes images (label 1 = cars, label 0 = vans).
Error:
KeyError: "None of [Int64Index([ 0, 1, 2, 3, 5, 6, 7, 9, 10,\n 11,\n ...\n 13981, 13982, 13986, 13987, 13989, 13990, 13993, 13995, 13996,\n 13997],\n dtype='int64', length=7000)] are in the [columns]"
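The train_index and test_index returned by kf.split() are arrays of integer row indexes. Writing products[train_index] asks the DataFrame for columns with those names, which is why pandas raises the KeyError. Therefore, in your print call, you should use .loc to select rows by index, as shown in the code below. This works here because read_csv gives the DataFrame a default 0..n-1 index, so labels and positions coincide; if the DataFrame had a custom index, .iloc would be the position-based equivalent.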
print('train: %s, test: %s' % (products.loc[train_index, :], products.loc[test_index, :]))
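For a version that can be run without the CSV file, here is a minimal sketch of the same loop on a small made-up DataFrame. The column names image and label, the ten sample rows, and random_state=0 are assumptions for illustration only; they are not taken from product_imgs.csv. It uses .iloc, which selects by row position and so does not depend on the DataFrame having a default index.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-in for product_imgs.csv (column names are assumed).
products = pd.DataFrame({
    "image": [f"img_{i}.png" for i in range(10)],
    "label": [1, 0] * 5,  # 1 = car, 0 = van
})

# random_state is fixed only so the shuffle is repeatable.
kf = KFold(n_splits=2, shuffle=True, random_state=0)

folds = []
for train_index, test_index in kf.split(products):
    # train_index and test_index are NumPy arrays of row positions,
    # so select rows with .iloc rather than products[...].
    train = products.iloc[train_index]
    test = products.iloc[test_index]
    folds.append((train, test))
    print("train rows: %d, test rows: %d" % (len(train), len(test)))
```

Each pass through the loop yields one train/test pair; across the folds, every row appears in a test subset exactly once.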