I have 5 folders for the Enron email dataset. I want to use enron1, enron3, and enron5 as the training set and enron2 and enron4 as the testing set in Python. I can load the full dataset and split it randomly, but I can't work out how to split it by folder as described above.
import numpy as np
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# assumed initialization; not shown in the original post
X, y = np.array([]), np.array([])
for i in range(1, 6):
    # folder containing the 2 categories of documents in individual folders.
    movie_data = load_files(f"/Users/mehedihasan/Desktop/Study/SEM6/COMP723 Data Mining & Knowledge Engineering/Assignment/email data/enron{i}")
    X = np.append(X, movie_data.data)
    y = np.append(y, movie_data.target)

# random 70/30 split over all five folders
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
Maybe use two loops, for i in [1, 3, 5]: for the training folders and for i in [2, 4]: for the testing folders, instead of a single loop over range(1, 6).
for i in [1, 3, 5]:
    # ... code ..
    X_train = ...
    y_train = ...
for i in [2, 4]:
    # ... code ..
    X_test = ...
    y_test = ...
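The two loops above could be filled in roughly as follows. This is only a sketch: the helper name load_folders, its loader parameter, and the base_dir path are made up for illustration, and it assumes every enronN folder has the same category subfolders so that load_files assigns the same label numbers in each one.

```python
import os


def load_folders(indices, base_dir, loader=None):
    """Collect documents and labels from base_dir/enron{i} for each i in indices."""
    if loader is None:
        # Imported lazily so a stand-in loader can be passed instead.
        from sklearn.datasets import load_files
        loader = load_files
    X, y = [], []
    for i in indices:
        bunch = loader(os.path.join(base_dir, f"enron{i}"))
        X.extend(bunch.data)    # raw documents from this folder
        y.extend(bunch.target)  # integer class labels from this folder
    return X, y


# Hypothetical usage; replace base_dir with the real dataset location.
# base_dir = "/path/to/email data"
# X_train, y_train = load_folders([1, 3, 5], base_dir)
# X_test, y_test = load_folders([2, 4], base_dir)
```

Plain Python lists are used here instead of np.append, so the documents and labels keep their original types; train_test_split is no longer needed because the folders themselves define the split.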
BTW:
If you have more folders, you can use
range(1, n, 2)
to get 1, 3, 5, 7, 9, ...
and
range(2, n, 2)
to get 2, 4, 6, 8, 10, ...
Note that the stop value n is exclusive, so use n + 1 if folder n itself should be included.
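As a quick illustration of the range idea, assuming a made-up highest folder number n = 10:

```python
n = 10  # hypothetical highest folder number

# Odd-numbered folders for training, even-numbered for testing.
# The stop value is exclusive, hence n + 1 to include folder n.
train_ids = list(range(1, n + 1, 2))
test_ids = list(range(2, n + 1, 2))

print(train_ids)
print(test_ids)
```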