Extracting next rows in Pandas Dataframe based on column values

Suppose I have the following DataFrame:

  final  raw  act    wc             Start            Finish
   abc  xyz   30    M5  17-01-2022 06:00  14-07-2031 02:36
   abc  xyz   40    F4  17-01-2022 06:00  14-07-2031 02:36
   abc  xyz   50    F6  17-01-2022 06:00  14-07-2031 02:36
   abc  xyz   60    F8  17-01-2022 06:00  14-07-2031 02:36
   abc  pqr   40  M14S  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   50   M12  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   60  M14S  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   20    F3  17-01-2022 06:00  14-07-2031 02:36
   abc  pqr   40    F4  17-01-2022 06:00  14-07-2031 02:36
   abc  pqr   50    F6  17-01-2022 06:00  14-07-2031 02:36

I would like to take two rows from here. One is:

 abc  xyz   50  F6  17-01-2022 06:00  14-07-2031 02:36

The other is:

abc  pqr   50    F6  17-01-2022 06:00  14-07-2031 02:36

The logic is that, for each raw, among the rows where wc is either F3 or F4, find the one with the maximum act and pick up the row that follows it. For xyz, only F4 is present, so I take the row after it. For pqr, both F3 and F4 are present, and the maximum act among them is 40, so I take the row after that one.

I did it using shift():

dft = dfUno.loc[dfUno['wc'].shift().eq('F4')]

But I would like to do this in a more generic way, maybe by extracting with iterrows(). My code only works for F4. I also want to extract the rows of the DataFrame up to and including those F3/F4 rows.

Expected outcome for this:

 final  raw  act    wc             Start            Finish
   abc  xyz   30    M5  17-01-2022 06:00  14-07-2031 02:36
   abc  xyz   40    F4  17-01-2022 06:00  14-07-2031 02:36
   abc  pqr   40  M14S  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   50   M12  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   60  M14S  17-01-2022 06:00  18-01-2026 17:21
   abc  pqr   20    F3  17-01-2022 06:00  14-07-2031 02:36
   abc  pqr   40    F4  17-01-2022 06:00  14-07-2031 02:36

Please suggest how to do it.

You can first subset the frame to only the rows where wc equals "F3" or "F4". Then group by the raw column to find, for each group, the index label of the row with the maximum act. Finally, index the original frame with those labels:

>>> df.loc[df[df.wc.isin(["F3", "F4"])].groupby("raw", sort=False).act.idxmax()]

  final  raw  act  wc             Start            Finish
1   abc  xyz   40  F4  17-01-2022 06:00  14-07-2031 02:36
8   abc  pqr   40  F4  17-01-2022 06:00  14-07-2031 02:36

That is, step by step:

>>> df.wc.isin(["F3", "F4"]) 
0    False
1     True
2    False
3    False
4    False
5    False
6    False
7     True
8     True
9    False
Name: wc, dtype: bool

>>> subset = df[df.wc.isin(["F3", "F4"])]
>>> subset
  final  raw  act  wc             Start            Finish
1   abc  xyz   40  F4  17-01-2022 06:00  14-07-2031 02:36
7   abc  pqr   20  F3  17-01-2022 06:00  14-07-2031 02:36
8   abc  pqr   40  F4  17-01-2022 06:00  14-07-2031 02:36

>>> idxmax_act_per_raw = subset.groupby("raw", sort=False).act.idxmax()
>>> idxmax_act_per_raw 
raw
xyz    1
pqr    8
Name: act, dtype: int64

>>> df.loc[idxmax_act_per_raw]
  final  raw  act  wc             Start            Finish
1   abc  xyz   40  F4  17-01-2022 06:00  14-07-2031 02:36
8   abc  pqr   40  F4  17-01-2022 06:00  14-07-2031 02:36

(Passing sort=False to groupby keeps the raw groups in their order of appearance. Otherwise the "pqr" row would come first in the result, since "pqr" < "xyz".)
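The answer above stops at the max-act F3/F4 row per raw, while the question also asks for the row after it and for the rows leading up to it. Here is a minimal sketch of one way to get both without iterrows(). The frame is rebuilt from the question's sample, and the names targets, anchor, is_anchor, next_rows and up_to_anchor are made up for illustration. It assumes that "next row" means the next row within the same raw group and that the index labels are unique.

```python
import pandas as pd

# Sample frame rebuilt from the question (Start/Finish kept as plain strings).
df = pd.DataFrame({
    "final": ["abc"] * 10,
    "raw": ["xyz"] * 4 + ["pqr"] * 6,
    "act": [30, 40, 50, 60, 40, 50, 60, 20, 40, 50],
    "wc": ["M5", "F4", "F6", "F8", "M14S", "M12", "M14S", "F3", "F4", "F6"],
    "Start": ["17-01-2022 06:00"] * 10,
    "Finish": ["14-07-2031 02:36"] * 4
              + ["18-01-2026 17:21"] * 3
              + ["14-07-2031 02:36"] * 3,
})

targets = ["F3", "F4"]  # illustrative: any list of wc values can go here

# Index label of the max-act target row per raw, exactly as in the answer.
anchor = df[df["wc"].isin(targets)].groupby("raw", sort=False)["act"].idxmax()

# Boolean mask over the original frame marking those anchor rows.
is_anchor = pd.Series(df.index.isin(anchor), index=df.index)

# Shift the mask down by one row within each raw group: this marks the row
# that follows each anchor, and never crosses into another raw group.
follows_anchor = (
    is_anchor.groupby(df["raw"]).shift(fill_value=False).astype(bool)
)
next_rows = df[follows_anchor]
print(next_rows)  # rows with index 2 and 9 (both act 50, wc F6)

# Rows up to and including the anchor within each raw group: keep a row if
# no anchor has been seen yet in its group, or if it is the anchor itself.
anchors_seen = is_anchor.astype(int).groupby(df["raw"]).cumsum()
up_to_anchor = df[(anchors_seen == 0) | is_anchor]
print(up_to_anchor)  # rows with index 0, 1, 4, 5, 6, 7, 8
```

On the sample data, next_rows holds the two F6 rows the question asks for, and up_to_anchor matches the expected outcome listed in the question. Note that a raw group with no F3/F4 row at all would be kept whole in up_to_anchor under this sketch, so filter such groups out first if that is not wanted.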
