Python Pandas: How can I drop rows using df.drop and df.loc?

Suppose I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0]
    }
)

So far, I've been removing rows based on conditions using the following:

df2 = df[df.income != 0]

And using multiple conditions like so:

df3 = df[(df['income'] != 0) & (df['net worth'] > 100000)]

Question: Is this the preferred way to drop rows? If not, what is? Is it possible to do this via df.drop and df.loc? What would the syntax be?

.loc selects the subset of rows you want to keep, whereas .drop removes the rows you specify. drop needs the row labels (index values) of the rows to remove, not a boolean condition.
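As a minimal sketch of that difference, using the dataframe from the question: the same boolean mask can be passed directly to .loc to keep rows, or inverted and converted to index labels for .drop. The variable names keep, kept_loc and kept_drop are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0]
    }
)

# Boolean mask that is True for the rows to keep
keep = (df['income'] != 0) & (df['net worth'] > 100000)

# .loc accepts the mask directly and returns the rows where it is True
kept_loc = df.loc[keep]

# .drop wants labels, so invert the mask and look up the matching index labels
kept_drop = df.drop(index=df.index[~keep])

print(kept_loc.equals(kept_drop))
```

Both approaches return a new dataframe and leave df unchanged.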

The equivalent of your last filter with drop is:

>>> df.drop(df[~((df['income'] != 0) & (df['net worth'] > 100000))].index)

    user  income  net worth
0   Adam   50000     250000
2  Cindy  100000    2000000

# Or, a bit more neatly, with the negation applied to each condition by hand:
>>> df.drop(df[(df['income'] == 0) | (df['net worth'] <= 100000)].index)

    user  income  net worth
0   Adam   50000     250000
2  Cindy  100000    2000000
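The two drop calls above select the same rows because of De Morgan's law: not (A and B) is the same as (not A) or (not B). A short sketch to check this on the example data follows; the names negated, expanded and dropped_labels are just for illustration. Note that this equivalence assumes the columns contain no NaN values, since comparisons such as > and <= are both False for NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': ['Adam', 'Barry', 'Cindy', 'Dirk', 'Ella'],
        'income': [50000, 0, 100000, 30000, 0],
        'net worth': [250000, 1000000, 2000000, 50000, 0]
    }
)

# First form: negate the whole "keep" condition with ~
negated = ~((df['income'] != 0) & (df['net worth'] > 100000))

# Second form: negate each comparison and swap & for |
expanded = (df['income'] == 0) | (df['net worth'] <= 100000)

# Both masks flag the same rows for removal on this data
same_mask = bool((negated == expanded).all())

# Index labels that would be handed to df.drop
dropped_labels = df[expanded].index.tolist()

print(same_mask, dropped_labels)
```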

Which syntax do you prefer?
