How to pull the first instance when a column satisfies a certain condition in pandas?
I'm trying to pull the first instance where an account balance equals or drops below 0. In the example below, I would like to create a column that is filled only on the row where X and Y move from a positive number to 0 or below, i.e. for X that would be 2017-1-4 in row 4 (index 3), and for Y 2018-2-3 in row 8 (index 7).
import pandas as pd

df = pd.DataFrame()
df['Account'] = ['X','X','X','X','X','Y','Y','Y']
df['Balance'] = [100,90,80,0,0,900,90,-1]
# Convert the whole list of date strings in a single call
df['Date'] = pd.to_datetime(['2017-01-01','2017-01-02','2017-01-03','2017-01-04','2017-01-05','2018-02-01','2018-02-02','2018-02-03'])
print(df)
Thanks!
Edit: I think the answer I was probably looking for is something like this:
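# group_keys=False keeps x on df's original row index, so it lines up with df on pandas >= 2.0 too
x = df.groupby('Account', group_keys=False)['Balance']\
    .apply(lambda x: (x<=0) & (0<x.shift()))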
This flags the rows where the balance goes to 0 or below, by comparing each balance with the previous one in the same account. However, when I try to get the date information, it gives me a number that I don't understand:
import numpy as np
y = np.where(x, df['Date'], pd.NaT)
array([NaT, NaT, NaT, 1483488000000000000, NaT, NaT, NaT, 1517616000000000000], dtype=object)
How do I resolve this? I'm still quite new to Python and pandas, so it might be something quite obvious!
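The large numbers are most likely the dates expressed as nanoseconds since 1970-01-01 (1483488000000000000 ns is 2017-01-04 00:00). pd.NaT is a Python object, so np.where builds an object array, and NumPy turns datetime64[ns] values into plain integers when casting them to object. A minimal sketch that avoids this uses Series.where, which keeps the datetime64 dtype and fills the non-matching rows with NaT. It builds the mask with groupby(...).shift() instead of apply (an equivalent formulation swapped in here), and the new column name ZeroDate is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Account': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Balance': [100, 90, 80, 0, 0, 900, 90, -1],
    'Date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                            '2017-01-05', '2018-02-01', '2018-02-02', '2018-02-03']),
})

# Previous balance within the same account (NaN on each account's first row)
prev_balance = df.groupby('Account')['Balance'].shift()

# True only where the balance crosses from positive to zero-or-below
mask = (df['Balance'] <= 0) & (prev_balance > 0)

# Series.where keeps the datetime64 dtype and puts NaT wherever mask is False
df['ZeroDate'] = df['Date'].where(mask)

# An integer like the ones np.where produced is read back as nanoseconds
converted = pd.to_datetime(1483488000000000000)

print(converted)
print(df)
```

Because the result stays a datetime column, it can be filtered or compared like the original Date column.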
One possible solution is to use df.values, which returns the DataFrame as a NumPy array. You can then loop over each row of the DataFrame, check whether the Account is the one you want (X or Y) and Balance <= 0, and return the Date if so:
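def zero_bal(a, df=df):
    # Return the Date of the first row for account `a` whose Balance is <= 0
    # (rows are read in order; returns None if no such row exists)
    for each in df.values:
        if each[0] == a and each[1] <= 0:
            return each[2]

X, Y = zero_bal('X'), zero_bal('Y')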
In the code above, the "each" in "for each in df.values:" would be something like:
['X', 80, Timestamp('2017-01-03 00:00:00')]
You can then use the indices each[0], each[1] and each[2] to select the Account, Balance and Date respectively, and check whether they are what you are looking for.
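The loop above returns, for each account, the Date of the first row whose Balance is 0 or below. A vectorized sketch of that same rule (a swapped-in alternative, not the answerer's code) filters the rows first and then takes the first Date per account with groupby(...).first(); it assumes the rows are already in date order within each account, and first_zero is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'Account': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Balance': [100, 90, 80, 0, 0, 900, 90, -1],
    'Date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                            '2017-01-05', '2018-02-01', '2018-02-02', '2018-02-03']),
})

# Keep only rows at or below zero, then take the first remaining Date per account
first_zero = df[df['Balance'] <= 0].groupby('Account')['Date'].first()

X, Y = first_zero['X'], first_zero['Y']
print(X, Y)
```

One difference from the loop: an account that never reaches 0 is simply absent from first_zero (so indexing it raises KeyError), whereas zero_bal returns None.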
You could apply the boolean mask directly to your DataFrame, as follows:
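# group_keys=False keeps the mask on df's row index, so df[x] can align it on pandas >= 2.0
x = df.groupby('Account', group_keys=False)['Balance'].apply(lambda x: (x<=0) & (0<x.shift()))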
df[x]
or df[x]['column_name_that_you_need']
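The mask flags every positive-to-non-positive crossing, so an account that recovers and dips again would appear more than once. A minimal sketch that keeps only the first crossing per account and returns an Account-to-Date lookup, assuming the row order is chronological within each account; group_keys=False is passed because, since pandas 2.0, apply otherwise prepends the group key to the result's index, which can stop df[x] from aligning. The names crossings and first_crossing are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Account': ['X', 'X', 'X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Balance': [100, 90, 80, 0, 0, 900, 90, -1],
    'Date': pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
                            '2017-01-05', '2018-02-01', '2018-02-02', '2018-02-03']),
})

# Same mask as in the answer; group_keys=False keeps df's original row index
x = df.groupby('Account', group_keys=False)['Balance'].apply(
    lambda s: (s <= 0) & (0 < s.shift())
)

# All flagged rows, then only the first one per account
crossings = df[x]
first_crossing = crossings.drop_duplicates('Account').set_index('Account')['Date']

print(first_crossing)
```

drop_duplicates keeps the first occurrence of each Account, which is the earliest crossing when rows are in date order.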