
How to get a subset of a DataFrame based on another DataFrame in pandas (Python)

I have just started learning pandas, and I want to select some rows of a DataFrame based on the ids stored in another DataFrame. Here is the code:

import pandas as pd
from sklearn.model_selection import train_test_split

f_data="data.tsv"
all_data = pd.read_csv(f_data,delimiter='\t',encoding='utf-8',header=None)
x_data = all_data[[0,1,3]]
y_data = all_data[[2]]

# Split train and test sets
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data,test_size=0.1)

all_data has 12 columns in total. I use 3 of the columns in x_data and 1 of them in y_data.

Once I have created x_train and x_test, I would like to write these instances to TSV files, but with all 12 columns stored in all_data. To do that, I need to match the instances in x_train and x_test with the rows of all_data. How can I do that?

EDIT

Here is what my data looks like:

all_data

        0                                                  1                              2    3   ...                                                8                      9     10    11
0       35  Auch in Großbritannien, wo 19 Atomreaktoren in...                       Ausstieg -1.0  ...                                      Sunday Times           Sunday Times   NaN     1

# continues like that

x_train

         0                                                  1    3
939   2074  Die CSU verlangt von der schwarz-gelben Koalit...  1.0

So what I want to do is get the rows with index labels 939, 710, 288, 854, 433 from all_data and write them to a file.

The split DataFrames keep the index labels of the original, so their index can be used to look up the full rows in the original data (assuming the index is unique):

all_data.loc[x_train.index]
all_data.loc[x_test.index]
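To make the lookup concrete, here is a minimal sketch under some assumptions: the small DataFrame is made-up sample data standing in for data.tsv, and `DataFrame.sample` plus `drop` stand in for `train_test_split` (both keep the original index labels, which is the only property the lookup relies on). The variable names `train_full`, `test_full` and `train_tsv` are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for data.tsv: 5 rows, 4 integer-labelled columns.
all_data = pd.DataFrame({
    0: [35, 2074, 11, 58, 902],
    1: ["text a", "text b", "text c", "text d", "text e"],
    2: ["Ausstieg", "Ausstieg", "Laufzeit", "Laufzeit", "Ausstieg"],
    3: [-1.0, 1.0, 0.0, 1.0, -1.0],
})

x_data = all_data[[0, 1, 3]]

# Stand-in for train_test_split: the split pieces keep the original index labels.
x_test = x_data.sample(n=1, random_state=0)
x_train = x_data.drop(x_test.index)

# Look up the full rows (all columns) by those preserved index labels.
train_full = all_data.loc[x_train.index]
test_full = all_data.loc[x_test.index]

# Serialize as TSV without header or index, mirroring header=None on read.
# Passing a path such as "train.tsv" instead of the buffer writes a file.
train_buf = io.StringIO()
train_full.to_csv(train_buf, sep="\t", header=False, index=False)
train_tsv = train_buf.getvalue()
```

If the index of all_data were not unique, `.loc` would return every row sharing a label, so the result could contain more rows than the split; calling `all_data.reset_index(drop=True)` before splitting is one way to avoid that.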
