
How to pull the first instance when a column satisfies a certain condition in pandas?

I'm trying to pull the first instance where an account balance equals or drops below 0. In the example below, I would like to create a column that flags only the row where X and Y move from a positive number to 0 or below, i.e. for X that would be 2017-1-4 (row 4) and for Y 2018-2-3 (row 8).

import pandas as pd

df = pd.DataFrame()
df['Account'] = ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y']
df['Balance'] = [100, 90, 80, 0, 0, 900, 90, -1]
df['Date'] = pd.to_datetime(['2017-1-1', '2017-1-2', '2017-1-3', '2017-1-4', '2017-1-5',
                             '2018-2-1', '2018-2-2', '2018-2-3'])
print(df)

Thanks!

Edit: I think the answer I was probably looking for is something like this:

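# group_keys=False keeps df's original row index (pandas >= 2.0 otherwise prepends 'Account')
x = df.groupby('Account', group_keys=False)['Balance']\
       .apply(lambda s: (s <= 0) & (0 < s.shift()))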

This returns True for the row where the balance goes to 0 or below, by comparing each balance with the previous one in the same account. However, when I try to get the date information, it gives me a number that I don't understand:

import numpy as np

y = np.where(x, df['Date'], pd.NaT)

array([NaT, NaT, NaT, 1483488000000000000, NaT, NaT, NaT, 1517616000000000000], dtype=object)

How do I resolve this? I'm still quite new to Python and pandas, so this might be something obvious!

One possible solution is to use df.values, which returns the DataFrame as a NumPy array (of dtype object here, because the columns have mixed types). You can then loop over the rows and return the Date of the first row whose Account matches (X or Y) and whose Balance is <= 0:

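def zero_bal(a, df=df):
    # Return the Date of the first row for account `a` whose Balance is <= 0
    for each in df.values:
        # each is one row: [Account, Balance, Date]
        if each[0] == a and each[1] <= 0:
            return each[2]
    return None  # the account never reaches 0 or below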

X, Y = zero_bal('X'), zero_bal('Y')

In the code above, the "each" in "for each in df.values:" would be something like:

['X', 80, Timestamp('2017-01-03 00:00:00')]

You can then use indices each[0], each[1] and each[2] to select the Account, Balance and Date respectively and check whether they are what you are looking for.
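If the positional indices feel fragile, the same loop can be sketched with DataFrame.itertuples(index=False), which yields namedtuples so each field is read by column name. This is a swapped-in variant, not the answer's original code; it also passes df explicitly instead of binding it as a default argument. Like the original, it returns the first row with Balance <= 0, which matches the "positive to 0 or below" transition here because both accounts start positive.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Account': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Balance': [100, 90, 80, 0, 0, 900, 90, -1],
    'Date': pd.to_datetime(['2017-1-1', '2017-1-2', '2017-1-3', '2017-1-4', '2017-1-5',
                            '2018-2-1', '2018-2-2', '2018-2-3']),
})


def zero_bal(a, df):
    """Return the Date of the first row for account `a` with Balance <= 0, else None."""
    # itertuples(index=False) yields namedtuples with fields Account, Balance, Date
    for row in df.itertuples(index=False):
        if row.Account == a and row.Balance <= 0:
            return row.Date
    return None


X, Y = zero_bal('X', df), zero_bal('Y', df)
print(X, Y)
```

Calling it with an account that is not in the data (for example a hypothetical 'Z') returns None.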

You could apply the boolean mask directly to your DataFrame instead of going through np.where:

x = df.groupby('Account', group_keys=False)['Balance'].apply(lambda s: (s <= 0) & (0 < s.shift()))

df[x] or df.loc[x, 'column_name_that_you_need']
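A self-contained sketch of the mask approach on the question's sample data. It swaps in SeriesGroupBy.transform for apply, since transform always returns a result aligned with df's own index (a choice made for robustness across pandas versions, not the answer's original code). It also uses Series.where instead of np.where, so the Date column stays datetime64 with NaT in the other rows. The integers from np.where in the question are the dates as nanoseconds since 1970-01-01, which pd.to_datetime can decode.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Account': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Balance': [100, 90, 80, 0, 0, 900, 90, -1],
    'Date': pd.to_datetime(['2017-1-1', '2017-1-2', '2017-1-3', '2017-1-4', '2017-1-5',
                            '2018-2-1', '2018-2-2', '2018-2-3']),
})

# True only where the balance is <= 0 and the previous balance in the same
# account was > 0 (the first row of each account compares against NaN -> False)
x = df.groupby('Account')['Balance'].transform(
    lambda s: (s <= 0) & (s.shift() > 0)
)

# Rows where each account first crosses to 0 or below
crossings = df.loc[x, ['Account', 'Date']]

# Date where the mask is True, NaT elsewhere; the dtype stays datetime64
first_zero_date = df['Date'].where(x)

# Decoding one of the integers printed by np.where in the question
decoded = pd.to_datetime(1483488000000000000)

print(crossings)
print(first_zero_date)
print(decoded)
```

first_zero_date can be assigned straight back as a new column (e.g. df['FirstZeroDate'], a hypothetical name), which gives the column the question asked for.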

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.