How to train/test/split data based on labels?
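How do I split data into training and test datasets based on labels? The labels are 1 and 0, and I want to use all the 1s as the training dataset and all the 0s as the test dataset. The csv file looks like this: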
1 Pixar classic is one of the best kids' movies of all time.
1 Apesar de representar um imenso avanço tecnológico, a força do filme reside no carisma de seus personagens e no charme de sua história.
1 When Woody perks up in the opening scene, it's not only the toy cowboy who comes alive - we're watching the rebirth of an art form.
0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.
1 Introduced not one but two indelible characters to the pop culture pantheon: cowboy rag-doll Woody (Tom Hanks) and plastic space ranger Buzz Lightyear (Tim Allen). [Blu-ray]
1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story
0 All the effects in the world can't disguise the thin plot.
1 Though some of the animation seems dated compared to later Pixar efforts and not nearly as detailed, what's here is done impeccably well.
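The answers below all assume the file is already loaded into a DataFrame with a `label` column. The question doesn't say how the file is delimited, so the sketch below **assumes** it is tab-separated with no header row (the review text contains commas, so plain comma separation is unlikely). A few rows from the question stand in for the file.

```python
import io

import pandas as pd

# Assumption: each line is "<label>\t<review text>" and there is no header row.
sample = (
    "1\tPixar classic is one of the best kids' movies of all time.\n"
    "0\tThe humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.\n"
    "1\tit is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story\n"
    "0\tAll the effects in the world can't disguise the thin plot.\n"
)

# header=None: the first line is data, not column names; names= assigns them.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=["label", "text"])

# Every row labelled 1 goes to training, every row labelled 0 goes to testing.
train_df = df[df["label"] == 1]
test_df = df[df["label"] == 0]

print(train_df)
print(test_df)
```

For the real file, pass its path instead of `io.StringIO(sample)`, and adjust `sep` if the delimiter turns out to be something other than a tab.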
Usually you don't want to do this, but the following solution works. I tried it on a very small data frame, and it seems to do the job.
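import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: 'S' and 'P' play the role of the 1 and 0 labels.
Df = pd.DataFrame()
Df['label'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
Df['value'] = [1, 2, 3, 4, 5, 6, 7, 8]
Df  # displays the frame in a notebook

X = Df[Df.label == 'S']  # all 'S' rows
Y = Df[Df.label == 'P']  # all 'P' rows

# On a single frame, train_test_split returns (larger part, test_size part), shuffled.
# So xtrain/ytrain are the ~70% and ~30% parts of X (the 'S' rows),
# and xtest/ytest are the ~70% and ~30% parts of Y (the 'P' rows).
xtrain, ytrain = train_test_split(X, test_size=0.3, random_state=25, shuffle=True)
xtest, ytest = train_test_split(Y, test_size=0.3, random_state=25, shuffle=True)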
I got the following results:
xtrain
label value
5 S 6
2 S 3
7 S 8
xtest
label value
6 P 7
3 P 4
ytest
label value
4 P 5
ytrain
label value
0 S 1
1 S 2
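Note that the split above divides *each* label group into its own two parts, which is more than the question asks for. If the goal is simply all 'S' rows for training and all 'P' rows for testing, each in shuffled order, a sketch that skips `train_test_split` and uses `DataFrame.sample(frac=1)` to shuffle (reusing the same toy frame and `random_state=25` only for reproducibility):

```python
import pandas as pd

# Same toy frame as in the answer above.
Df = pd.DataFrame({
    "label": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

# sample(frac=1) returns every row of the subset in random order;
# random_state makes that order reproducible.
# reset_index(drop=True) replaces the old row labels with 0..n-1.
train = Df[Df["label"] == "S"].sample(frac=1, random_state=25).reset_index(drop=True)
test = Df[Df["label"] == "P"].sample(frac=1, random_state=25).reset_index(drop=True)

print(train)
print(test)
```

Drop the `reset_index` call if you want to keep the original row labels for tracing rows back to the source file.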
Try this:
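# df: the data loaded from the csv, with a 'label' column of 1s and 0s
mask = df['label'] == 1  # True for rows labelled 1
df_train = df[mask]      # all rows labelled 1
df_test = df[~mask]      # every other row (the 0s)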
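A self-contained version of the mask approach, using a small hypothetical frame in place of the csv data. It shows that `~mask` is the element-wise complement of `mask`, so every row lands in exactly one of the two frames. One consequence worth knowing: any row whose label is not 1 (for example a missing label) goes to the test set; filtering with `df['label'] == 0` instead would exclude such rows.

```python
import pandas as pd

# Hypothetical stand-in for the data loaded from the question's csv.
df = pd.DataFrame({"label": [1, 1, 0, 1, 0], "text": ["a", "b", "c", "d", "e"]})

# mask is a boolean Series: True where label == 1, False elsewhere.
mask = df["label"] == 1

# ~mask flips every value, so the two frames are complementary.
df_train = df[mask]
df_test = df[~mask]

print(mask.tolist())  # [True, True, False, True, False]
```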
You just need to filter the data frame.
import pandas as pd

d = {'label': [1, 1, 1, 1, 0, 0, 0, 0], 'text': ["a", "b", "c", "d", "e", "f", "g", "h"]}
df = pd.DataFrame(data=d)
df.head()
label text
0 1 a
1 1 b
2 1 c
3 1 d
4 0 e
You can filter on the row values with the code below; it keeps only the rows where label equals 1.
traindf = df[df["label"] == 1]
traindf
label text
0 1 a
1 1 b
2 1 c
3 1 d
testdf = df[df["label"] == 0]
testdf
label text
4 0 e
5 0 f
6 0 g
7 0 h
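As an alternative to writing one filter per label, `groupby` splits the frame into one sub-frame per distinct label value in a single pass. A minimal sketch on the same toy data; this is a swapped-in technique, not part of the answer above:

```python
import pandas as pd

d = {"label": [1, 1, 1, 1, 0, 0, 0, 0], "text": ["a", "b", "c", "d", "e", "f", "g", "h"]}
df = pd.DataFrame(data=d)

# Iterating a groupby yields (label value, sub-frame) pairs;
# collect them into a dict keyed by label.
groups = {label: group for label, group in df.groupby("label")}

traindf = groups[1]
testdf = groups[0]

print(traindf["text"].tolist())  # ['a', 'b', 'c', 'd']
print(testdf["text"].tolist())   # ['e', 'f', 'g', 'h']
```

This scales naturally if more than two labels appear later, since every label gets its own entry in `groups`.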