
Remove rows from a pandas DataFrame with an unsorted index

This is how my data looks:

print(len(y_train),len(index_1))
index_1 = pd.DataFrame(data=index_1)
print("y_train: ")
print(y_train)
print("index_1: ")
print(index_1)

Output:

1348 555
y_train: 
1677    1
1519    0
1114    0
690     1
1012    1
       ..
1893    1
1844    0
1027    1
1649    1
1789    1
Name: Team 1 Win, Length: 1348, dtype: int64
index_1: 
        0
0       0
1       2
2       6
3       7
4       8
..    ...
550  1335
551  1341
552  1342
553  1344
554  1346

I want to remove a number of rows from y_train (which, as the output above shows, is actually a pandas Series). The values in index_1 are the positions of the rows I want to remove. The problem is that y_train's index is not in order: when the first value in index_1 is 0, I want to remove the first row of y_train (i.e. the row labelled 1677), not the row labelled 0. This is my attempt:

y_train_short = y_train.drop(index_1)

This is what I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-57-49f2cce7bac0> in <module>
     22 print(index_1)
     23 print(index_1)
---> 24 y_train_short = y_train.drop(index_1)
     25 
     26 

~/miniconda3/lib/python3.7/site-packages/pandas/core/series.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4137             level=level,
   4138             inplace=inplace,
-> 4139             errors=errors,
   4140         )
   4141 

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3934         for axis, labels in axes.items():
   3935             if labels is not None:
-> 3936                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3937 
   3938         if inplace:

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3968                 new_axis = axis.drop(labels, level=level, errors=errors)
   3969             else:
-> 3970                 new_axis = axis.drop(labels, errors=errors)
   3971             result = self.reindex(**{axis_name: new_axis})
   3972 

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5016         if mask.any():
   5017             if errors != "ignore":
-> 5018                 raise KeyError(f"{labels[mask]} not found in axis")
   5019             indexer = indexer[~mask]
   5020         return self.delete(indexer)

KeyError: '[0] not found in axis'

Apart from the fact that label 0 doesn't exist in y_train, I expect that even if it did, drop would remove the row labelled 0 rather than the first row, which is not what I want. So how do I remove the right rows from y_train?
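The error comes from the fact that drop looks rows up by index label, not by position. A small sketch to illustrate, assuming a toy Series built from the first five labels and values printed above (the data is a stand-in, not the real y_train):

```python
import pandas as pd

# Toy stand-in for y_train: first five labels/values from the printed output
y_train = pd.Series([1, 0, 0, 1, 1],
                    index=[1677, 1519, 1114, 690, 1012],
                    name="Team 1 Win")

# drop() matches *labels*; there is no label 0, so this raises KeyError
try:
    y_train.drop([0])
except KeyError as exc:
    print("KeyError:", exc)

# Position 0 holds the row labelled 1677
print(y_train.index[0])  # 1677
print(y_train.iloc[0])   # 1
```

So the positions in index_1 have to be translated into labels (or used with a positional indexer such as iloc) before rows can be removed.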

Note that y_train.iloc[index_1[0]] retrieves the rows of y_train at the integer positions listed in column 0 of index_1.

Running y_train.iloc[index_1[0]].index gives the index labels of those rows.

So to drop these rows, you can run:

y_train.drop(y_train.iloc[index_1[0]].index, inplace=True)
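A worked sketch of this positions-to-labels translation, assuming hypothetical toy data shaped like the question's (an unsorted y_train and a one-column index_1 DataFrame of positions). It returns a new Series instead of using inplace=True:

```python
import pandas as pd

# Hypothetical toy data: unsorted labels in y_train, and index_1 as a
# one-column DataFrame (column 0) holding the *positions* to remove
y_train = pd.Series([1, 0, 0, 1, 1],
                    index=[1677, 1519, 1114, 690, 1012],
                    name="Team 1 Win")
index_1 = pd.DataFrame(data=[0, 2, 4])

# Translate positions -> labels, then drop those labels
labels_to_drop = y_train.iloc[index_1[0]].index
y_train_short = y_train.drop(labels_to_drop)

print(labels_to_drop.tolist())       # [1677, 1114, 1012]
print(y_train_short.index.tolist())  # [1519, 690]
```

One caveat: if y_train contains duplicate labels, drop removes every row carrying a matched label, so it can remove more rows than index_1 lists.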

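Alternatively, reset the index so that each label equals its position, then filter with isin (note that this discards the original labels):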

# renumber the index 0..n-1 so each label equals its position
y_train = y_train.reset_index(drop=True)

# keep only the rows whose label (now equal to the position) is not in index_1
y_train = y_train[~y_train.index.isin(index_1[0])].copy()
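If the original labels need to survive, one option (a swapped-in technique, not taken from either answer above) is to build a boolean mask over positions with NumPy and select with iloc. A minimal sketch, assuming the same hypothetical toy data as in the question's shape:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the question's y_train and index_1
y_train = pd.Series([1, 0, 0, 1, 1],
                    index=[1677, 1519, 1114, 690, 1012],
                    name="Team 1 Win")
index_1 = pd.DataFrame(data=[0, 2, 4])

# True for every position that is NOT listed in index_1
keep = ~np.isin(np.arange(len(y_train)), index_1[0].to_numpy())

# iloc with a boolean array selects by position and keeps the original labels
y_train_short = y_train.iloc[keep]

print(y_train_short.index.tolist())  # [1519, 690]
```

Because the mask is purely positional, this also behaves correctly when y_train has duplicate labels.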


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM