Trim last rows of a pandas dataframe based on a condition

Let's assume a dataframe like this:

idx  x  y
0    a  3
1    b  2
2    c  0
3    d  2
4    e  5

How can I trim the bottom rows based on a condition, so that every row after the last one matching the condition is removed?

For example, with the condition y == 0, the output would be:

idx  x  y
0    a  3
1    b  2
2    c  0

The condition can match many times, but the last match is the one that triggers the cut.

You could do the following. Called as np.where(df.y == 0), np.where returns a tuple, so we take its first element to get the array of positions where the condition holds. The last occurrence is the last element of that array. Finally, we add 1 so that the slice includes the row of the last occurrence.

df_cond = df.iloc[:np.where(df.y == 0)[0][-1]+1, :]
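
As a minimal sketch of the same idea, the slice can be wrapped in a small helper. The name trim_after_last_match, the sample values (two rows with y == 0), and the choice to return the frame unchanged when nothing matches are all assumptions made for illustration. Without such a guard, indexing [-1] on an empty result raises an IndexError.

```python
import numpy as np
import pandas as pd

# Sample frame with two rows matching the condition (y == 0).
df = pd.DataFrame({"x": list("abcde"), "y": [3, 0, 2, 0, 5]})


def trim_after_last_match(frame, mask):
    """Keep rows up to and including the last position where mask is True."""
    positions = np.where(np.asarray(mask))[0]  # integer positions of matches
    if positions.size == 0:
        # Assumption: with no match there is nothing to cut, so keep all rows.
        return frame
    return frame.iloc[: positions[-1] + 1]


df_cond = trim_after_last_match(df, df["y"] == 0)
print(df_cond)
```

Because np.where yields positions rather than labels, this version should not depend on what the dataframe index looks like.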

Or you could do:

df_cond  = df[ :df.y.eq(0).cumsum().idxmax()+1 ]
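
To see why the cumsum version lands on the last match, it may help to look at the intermediate values. The sketch below uses an assumed sample frame with two matching rows and the default 0..n-1 index. The running count of matches only reaches its maximum at the last match, and idxmax returns the first label where that maximum appears.

```python
import pandas as pd

# Assumed sample data: the condition y == 0 holds at rows 1 and 3.
df = pd.DataFrame({"x": list("abcde"), "y": [3, 0, 2, 0, 5]})

matches = df["y"].eq(0)        # False, True, False, True, False
running = matches.cumsum()     # 0, 1, 1, 2, 2
last_label = running.idxmax()  # first label where the count peaks

# An integer slice on a dataframe is positional, so this relies on the
# default RangeIndex, where labels and positions coincide.
df_cond = df[: last_label + 1]
print(df_cond)
```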

Method 1:

Using index.max & iloc:

  • index.max to get the last row where the condition y==0 holds
  • iloc to slice the dataframe at the index found with df['y'].eq(0)
idx = df.query('y.eq(0)').index.max()+1 
# idx = df.query('y==0').index.max()+1 -- if pandas < 0.25 

df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0
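
One caveat: index.max() returns an index label, while iloc slices by position, so the two only line up when the dataframe has the default 0..n-1 index. The following sketch shows a label-based variant for other indexes. The frame with labels 10..50 is hypothetical, and the approach assumes the index is unique and sorted in ascending order.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
df = pd.DataFrame(
    {"x": list("abcde"), "y": [3, 2, 0, 2, 5]},
    index=[10, 20, 30, 40, 50],
)

# Label of the last matching row (the largest label, given a sorted index).
last_label = df[df["y"].eq(0)].index.max()

# .loc slices by label and includes the end label, so no +1 is needed.
df_cond = df.loc[:last_label]
print(df_cond)
```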

Method 2:

Using np.where

idx = np.where(df['y'].eq(0), df.index, 0).max()+1
df.iloc[:idx]

Output

   x  y
0  a  3
1  b  2
2  c  0

I would do something like this:

df.iloc[:df['y'].eq(0).idxmax()+1]

Just look for the largest index where your condition is true.

EDIT

The above code won't work, because idxmax() only returns the first index where the value is true. We can do the following to trick it:

df.iloc[:df['y'].eq(0).sort_index(ascending = False).idxmax()+1]

Flip the index, so that the last index is the first one idxmax picks up.
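
A minimal sketch of the same trick, using positional reversal with .iloc[::-1] instead of sort_index, is shown below. The sample data and the helper name last_match_label are assumptions. The helper guards against the case where nothing matches, because idxmax on an all-False mask still returns a label (the first one it scans), which would silently keep the whole frame.

```python
import pandas as pd

# Assumed sample data: the condition y == 0 holds at rows 1 and 3.
df = pd.DataFrame({"x": list("abcde"), "y": [3, 0, 2, 0, 5]})


def last_match_label(mask):
    """Return the label of the last True in mask, or None if nothing matches."""
    if not mask.any():
        return None
    # Reverse the mask so the last True becomes the first one idxmax sees.
    return mask.iloc[::-1].idxmax()


last_label = last_match_label(df["y"].eq(0))
# As in the answer above, label + 1 is used as a position, which assumes
# the default 0..n-1 index.
df_cond = df.iloc[: last_label + 1]
print(df_cond)
```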

Set up your dataframe:

import pandas as pd

data = [
    ['a', 3],
    ['b', 2],
    ['c', 0],
    ['d', 2],
    ['e', 5],
]
# Turn the default index into an explicit idx column, sorted ascending.
df = pd.DataFrame(data, columns=['x', 'y']).reset_index().rename(columns={'index': 'idx'}).sort_values('idx')

Then find your cutoff (assuming the idx column is already sorted):

cutoff = df[df['y'] == 0].idx.max()

The df['y'] == 0 is your condition. Then get the max idx that meets that condition and save it as the cutoff. Using max rather than min matters when the condition matches more than once, since the question asks to cut after the last match.

Finally, create a new dataframe using your cutoff:

df_new = df[df.idx <= cutoff].copy()

Output:

df_new

   idx  x   y
0   0   a   3
1   1   b   2
2   2   c   0
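
With the sample data, only one row has y == 0, so the smallest and largest matching idx are the same. The sketch below uses an assumed frame with two matching rows to show the difference. Since the question wants the cut triggered by the last match, the largest matching idx is the one to filter on.

```python
import pandas as pd

# Assumed sample data: y == 0 at idx 1 and idx 3.
df = pd.DataFrame({"idx": range(5), "x": list("abcde"), "y": [3, 0, 2, 0, 5]})

matching_idx = df.loc[df["y"] == 0, "idx"]
first_cut = matching_idx.min()  # idx of the first match
last_cut = matching_idx.max()   # idx of the last match

# Keep every row up to and including the last match.
df_new = df[df["idx"] <= last_cut].copy()
print(df_new)
```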
