Splitting folders into training and testing sets
I have 5 folders for the Enron email dataset. I want to use enron1, enron3 and enron5 as the training set and enron2 and enron4 as the testing set in Python. I can load the full dataset and split it randomly, but I can't work out how to split it by folder as described above.
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

X, y = [], []
for i in range(1, 6):
    # folder containing the 2 categories of documents in individual folders.
    movie_data = load_files(f"/Users/mehedihasan/Desktop/Study/SEM6/COMP723 Data Mining & Knowledge Engineering/Assignment/email data/enron{i}")
    X.extend(movie_data.data)    # documents from this folder
    y.extend(movie_data.target)  # class labels for those documents

# random 70/30 split over all five folders together
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
Maybe use for i in [1, 3, 5]: and for i in [2, 4]: instead of range(1, 6), so that the training folders and the testing folders are loaded in two separate loops.
for i in [1, 3, 5]:
    # ... code ..
    X_train = ...
    y_train = ...

for i in [2, 4]:
    # ... code ..
    X_test = ...
    y_test = ...
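The two loops above could be filled in roughly like this. It is only a sketch: the helper name load_folders, its base_dir and loader parameters, and the assumption that every folder is named enron<i> under one base directory are illustrative and not part of the original code. The loader parameter exists only so the helper can be tried without the real dataset; by default it falls back to sklearn's load_files.

```python
import os


def load_folders(base_dir, indices, loader=None):
    """Collect documents and labels from base_dir/enron<i> for each i in indices.

    load_folders, base_dir and loader are hypothetical names for this sketch.
    """
    if loader is None:
        # Imported lazily so the helper can be tested with a stand-in loader.
        from sklearn.datasets import load_files
        loader = load_files

    X, y = [], []
    for i in indices:
        bunch = loader(os.path.join(base_dir, f"enron{i}"))
        X.extend(bunch.data)    # documents from this folder
        y.extend(bunch.target)  # class labels for those documents
    return X, y


# Intended use (base_dir is whatever directory holds enron1 .. enron5):
# X_train, y_train = load_folders(base_dir, [1, 3, 5])
# X_test, y_test = load_folders(base_dir, [2, 4])
```

With this approach train_test_split is no longer needed, because the folder numbers alone decide which documents end up in the training set and which in the testing set.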
BTW:
If you have more folders then you can use
range(1, n, 2)
to get 1, 3, 5, 7, 9, ...
range(2, n, 2)
to get 2, 4, 6, 8, 10, ...
Note that the stop value n is exclusive, so n has to be the last folder number plus one.
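For the five folders in the question, the two ranges work out as below. The names n_folders, train_ids and test_ids are made up for this example.

```python
n_folders = 5  # assumed: folders enron1 .. enron5

# The stop value is exclusive, hence n_folders + 1.
train_ids = list(range(1, n_folders + 1, 2))  # odd-numbered folders
test_ids = list(range(2, n_folders + 1, 2))   # even-numbered folders

print(train_ids)  # [1, 3, 5]
print(test_ids)   # [2, 4]
```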