I just learned pandas, and basically I want to select some rows of a DataFrame based on the IDs stored in another DataFrame. Let me show you the code:
import pandas as pd
from sklearn.model_selection import train_test_split
f_data="data.tsv"
all_data = pd.read_csv(f_data,delimiter='\t',encoding='utf-8',header=None)
x_data = all_data[[0,1,3]]
y_data = all_data[[2]]
# Split train and test sets
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data,test_size=0.1)
all_data has 12 columns in total. I use three of the columns in x_data and one of them in y_data. Once I create x_train and x_test, I would like to write these instances into tsv files, but while doing that I want to write all 12 columns stored in all_data. To be able to do that, I need to match the instances in x_train and x_test with all_data. How can I do that?
EDIT
Here is how my data looks:
all_data
0 1 2 3 ... 8 9 10 11
0 35 Auch in Großbritannien, wo 19 Atomreaktoren in... Ausstieg -1.0 ... Sunday Times Sunday Times NaN 1
# continues like that
x_train
0 1 3
939 2074 Die CSU verlangt von der schwarz-gelben Koalit... 1.0
So, what I want to do is get the rows with indices 939, 710, 288, 854, 433 from all_data and write them into a file.
The index of the split data corresponds to the original, and can be used to look up the original rows (assuming the index is unique):
all_data.loc[x_train.index]
all_data.loc[x_test.index]
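Putting it together, here is a minimal sketch of the full flow: split, look up the full rows by index, then write every column to tsv with DataFrame.to_csv. The small four-row DataFrame and the file names train.tsv / test.tsv are made up for illustration; the real data has 12 columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for all_data (the real data has 12 columns, header=None style)
all_data = pd.DataFrame({0: [10, 20, 30, 40],
                         1: ['a', 'b', 'c', 'd'],
                         2: [-1.0, 1.0, -1.0, 1.0],
                         3: [0.5, 0.6, 0.7, 0.8]})

x_data = all_data[[0, 1, 3]]
y_data = all_data[[2]]

# train_test_split keeps each row's original index label
x_train, x_test, y_train, y_test = train_test_split(
    x_data, y_data, test_size=0.5, random_state=0)

# Use those index labels to pull the complete rows back out of all_data
train_full = all_data.loc[x_train.index]
test_full = all_data.loc[x_test.index]

# Write all columns, tab-separated, mirroring the header=None input format
train_full.to_csv('train.tsv', sep='\t', header=False, index=False)
test_full.to_csv('test.tsv', sep='\t', header=False, index=False)
```

Note that this relies on the index being unique; if you ever reset or duplicate the index before splitting, the lookup would no longer identify the original rows.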