How can I split the data into train and test datasets based on its labels? The labels are 1 and 0, and I want to use all rows labeled 1 as the training set and all rows labeled 0 as the test set. The CSV file looks like this:
1 Pixar classic is one of the best kids' movies of all time.
1 Apesar de representar um imenso avanço tecnológico, a força do filme reside no carisma de seus personagens e no charme de sua história.
1 When Woody perks up in the opening scene, it's not only the toy cowboy who comes alive - we're watching the rebirth of an art form.
0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.
1 Introduced not one but two indelible characters to the pop culture pantheon: cowboy rag-doll Woody (Tom Hanks) and plastic space ranger Buzz Lightyear (Tim Allen). [Blu-ray]
1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story
0 All the effects in the world can't disguise the thin plot.
1 Though some of the animation seems dated compared to later Pixar efforts and not nearly as detailed, what's here is done impeccably well.
Normally you would not want to do that, but the following solution can work. I tried it on a very small DataFrame, and it seems to do the job.
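import pandas as pd
from sklearn.model_selection import train_test_split
# Small example frame with two labels, 'S' and 'P'
Df = pd.DataFrame()
Df['label'] = ['S', 'S', 'S', 'P', 'P', 'S', 'P', 'S']
Df['value'] = [1, 2, 3, 4, 5, 6, 7, 8]
Df
# Separate the rows by label
X = Df[Df.label == 'S']
Y = Df[Df.label == 'P']
# train_test_split returns two pieces of its input (about 70% first, 30% second),
# so xtrain and ytrain are both 'S' rows, and xtest and ytest are both 'P' rows
xtrain, ytrain = train_test_split(X, test_size=0.3, random_state=25, shuffle=True)
xtest, ytest = train_test_split(Y, test_size=0.3, random_state=25, shuffle=True)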
I got the following results:
xtrain
label value
5 S 6
2 S 3
7 S 8
xtest
label value
6 P 7
3 P 4
ytest
label value
4 P 5
ytrain
label value
0 S 1
1 S 2
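A note on the variable names in this answer: train_test_split returns the larger "train" piece first and the held-out piece second, so the x/y prefixes do not mean features and targets here. A minimal sketch of the same split with more descriptive names (s_main, s_holdout, p_main and p_holdout are hypothetical names chosen for illustration, not part of the answer):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same toy data as in the answer above
df = pd.DataFrame({
    "label": ["S", "S", "S", "P", "P", "S", "P", "S"],
    "value": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Separate the rows by label
s_rows = df[df["label"] == "S"]
p_rows = df[df["label"] == "P"]

# Each call splits one label group into a main part and a 30% hold-out part
s_main, s_holdout = train_test_split(s_rows, test_size=0.3, random_state=25, shuffle=True)
p_main, p_holdout = train_test_split(p_rows, test_size=0.3, random_state=25, shuffle=True)

print(len(s_main), len(s_holdout), len(p_main), len(p_holdout))
```

If the goal is only "all of one label for training, all of the other for testing", the two filtered frames (s_rows and p_rows here) already are that split, and the train_test_split calls are optional.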
Try this:
mask = df['label']==1
df_train = df[mask]
df_test = df[~mask]
You just need to filter the DataFrame.
import pandas as pd
d = {'label': [1, 1, 1, 1, 0, 0, 0, 0], 'text': ["a", "b", "c", "d", "e", "f", "g", "h"]}
df = pd.DataFrame(data=d)
df.head()
label text
0 1 a
1 1 b
2 1 c
3 1 d
4 0 e
You can filter rows by value using the code below; this keeps the rows where label equals 1.
traindf = df[df["label"] == 1]
traindf
label text
0 1 a
1 1 b
2 1 c
3 1 d
testdf = df[df["label"] == 0]
testdf
label text
4 0 e
5 0 f
6 0 g
7 0 h
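To connect this back to the question's file, here is a rough sketch that assumes each line is the label, a single space, and then the review text (the question does not state the delimiter, so that layout is an assumption). An in-memory buffer with a few lines copied from the question stands in for the real file, and the lines are split by hand because the reviews themselves contain commas.

```python
import io
import pandas as pd

# Stand-in for the real file; assumed layout: "<label> <text>" per line
raw = io.StringIO(
    "1 Pixar classic is one of the best kids' movies of all time.\n"
    "0 The humans are wooden, the computer-animals have that floating, jerky gait of animated fauna.\n"
    "1 it is easy to see how virtually everything that is good in animation right now has some small seed in Toy Story\n"
    "0 All the effects in the world can't disguise the thin plot.\n"
)

# Split each non-empty line once, at the first space: [label, text]
rows = [line.rstrip("\n").split(" ", 1) for line in raw if line.strip()]
df = pd.DataFrame(rows, columns=["label", "text"])
df["label"] = df["label"].astype(int)

# Rows labeled 1 go to the training frame, rows labeled 0 to the test frame
mask = df["label"] == 1
df_train = df[mask].reset_index(drop=True)
df_test = df[~mask].reset_index(drop=True)

print(df_train.shape, df_test.shape)
```

With a real file, replacing the buffer with open("your_file.txt", encoding="utf-8") should behave the same way; the file name is a placeholder.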