
Delete rows from pandas DataFrame with non-unique index

I am looking for a way to delete rows in a pandas DataFrame when the index is not guaranteed to be unique.

So, I want to drop items 0 and 4 from my DataFrame df. This would be the typical code you would use to do that:

df.drop(df.index[[0, 4]])

If each index label is unique, this works fine. However, if rows 0, 1, and 2 all share the same index label, this code drops rows 0, 1, 2, and 4 instead of just 0 and 4, because drop matches rows by label rather than by position.

My DataFrame is set up this way for good reasons, so I don't want to restructure my data, which looks approximately like this:

        age
site             
mc03    0.39
mc03    0.348
mc03    0.348
mc03    0.42
mc04    0.78
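
For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample data above and drops by the labels found at positions 0 and 4. The construction of df is an assumption based on the printed table; the real frame may have more columns.

```python
import pandas as pd

# Rebuild the sample frame shown above; "site" is the non-unique index.
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# Positions 0 and 4 resolve to the labels "mc03" and "mc04" ...
labels = df.index[[0, 4]]

# ... and drop() removes every row carrying one of those labels.
result = df.drop(labels)
print(len(result))  # 0: all five rows are gone, not just two
```

With this particular data every row has one of the two labels, so the result is empty.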

I tried:

del df.iloc[0]

but this fails with:

AttributeError: __delitem__

Any other suggestions for how to accomplish this task?

Update:

I found two ways to do it, but neither is particularly elegant.

to_drop = [0, 4]
df = df.iloc[sorted(set(range(len(df))) - set(to_drop))]
# or:
df = df.iloc[[i for i in range(len(df)) if i not in to_drop]]

Maybe this is as good as it's going to get, though?

This is not very elegant either, but here is an alternative:

df = df.reset_index().drop([0, 4]).set_index("site")

It temporarily moves the index into a regular column, which gives the frame a default integer index, then drops the rows and sets the original index back. The idea comes from another answer.
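
The round trip can be sketched step by step as below. The sample frame is reconstructed from the question, and the sketch assumes the index is named "site"; with an unnamed index, reset_index() creates a column called "index" instead, and that name would have to be passed to set_index.

```python
import pandas as pd

# Sample frame reconstructed from the question (an assumption about its shape).
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

# After reset_index() the row labels are 0..4, so labels equal positions.
flat = df.reset_index()
trimmed = flat.drop([0, 4])

# Restore "site" as the index; the duplicate labels come back unchanged.
result = trimmed.set_index("site")
print(result)
```

Only the first and last rows are removed, and the remaining three rows keep their duplicate "mc03" labels.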

Alternative solution using NumPy (assumes numpy is imported as np):

In [252]: mask = np.ones(len(df)).astype(bool)

In [253]: mask[[0,4]] = False

In [254]: mask
Out[254]: array([False,  True,  True,  True, False], dtype=bool)

In [255]: df[mask]
Out[255]:
        age
mc03  0.348
mc03  0.348
mc03  0.420
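
The same mask idea can be wrapped in a small helper if it is needed in more than one place. This is only a sketch: drop_positions is a hypothetical name, not a pandas function, and the sample frame is reconstructed from the question.

```python
import numpy as np
import pandas as pd


def drop_positions(frame, positions):
    """Return `frame` without the rows at the given integer positions."""
    keep = np.ones(len(frame), dtype=bool)  # start by keeping every row
    keep[list(positions)] = False           # switch off the rows to remove
    return frame[keep]                      # boolean indexing ignores labels


# Sample frame reconstructed from the question (an assumption about its shape).
df = pd.DataFrame(
    {"age": [0.39, 0.348, 0.348, 0.42, 0.78]},
    index=pd.Index(["mc03", "mc03", "mc03", "mc03", "mc04"], name="site"),
)

result = drop_positions(df, [0, 4])
print(result)
```

Because the mask is positional, duplicate index labels never come into play, and the original df is left untouched.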
