Splitting folders into training and testing set

I have five folders of the Enron email dataset. In Python, I want to use enron1, enron3, and enron5 as the training set and enron2 and enron4 as the testing set. I can load the full dataset and split it randomly, but I can't work out how to split it by folder as described.

from sklearn.datasets import load_files

X, y = [], []  # accumulate documents and labels across all folders
for i in range(1, 6):
    # Each enron{i} folder contains the 2 categories of documents in individual subfolders.
    movie_data = load_files(f"/Users/mehedihasan/Desktop/Study/SEM6/COMP723 Data Mining & Knowledge Engineering/Assignment/email data/enron{i}")
    X.extend(movie_data.data)
    y.extend(movie_data.target)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Instead of looping over range(1, 6), you could loop over [1, 3, 5] to build the training set and over [2, 4] to build the testing set:

for i in [1,3,5]:
    # ... code ..
    X_train = ...
    y_train = ...

for i in [2, 4]:
    # ... code ..
    X_test = ...
    y_test = ...
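
The two loops above could be filled in roughly as follows. This is only a sketch: load_folders is a hypothetical helper name, and the loader is passed in as an argument so the same function works with sklearn.datasets.load_files or with any function that returns an object with .data and .target. It also assumes every enron{i} folder contains the same category subfolders, because load_files assigns label indices separately on each call.

```python
from pathlib import Path


def load_folders(base_dir, indices, loader):
    """Collect documents and labels from base_dir/enron{i} for each i in indices.

    `loader` is any callable that takes a folder path and returns an object
    with `.data` and `.target` (for example sklearn.datasets.load_files).
    """
    data, target = [], []
    for i in indices:
        bunch = loader(str(Path(base_dir) / f"enron{i}"))
        data.extend(bunch.data)
        target.extend(bunch.target)
    return data, target
```

Assuming base_dir holds the path to the "email data" folder from the question, it might be used like this:

    from sklearn.datasets import load_files
    X_train, y_train = load_folders(base_dir, [1, 3, 5], load_files)
    X_test, y_test = load_folders(base_dir, [2, 4], load_files)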

BTW:

If you have more folders, you can use the following (the stop value n itself is not included):

  • range(1, n, 2) to get 1, 3, 5, 7, 9, ...
  • range(2, n, 2) to get 2, 4, 6, 8, 10, ...
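
For the five folders in the question, the range-based version could look like this. It is a small sketch in which n_folders is an assumed variable name; the stop value is n_folders + 1 because range excludes its stop.

```python
n_folders = 5  # assumed: folders are named enron1 .. enron5

# Odd-numbered folders for training, even-numbered folders for testing.
train_ids = list(range(1, n_folders + 1, 2))
test_ids = list(range(2, n_folders + 1, 2))

print(train_ids)  # [1, 3, 5]
print(test_ids)   # [2, 4]
```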
