Extracting next rows in Pandas Dataframe based on column values
Suppose I have the following DataFrame:
final raw act wc Start Finish
abc xyz 30 M5 17-01-2022 06:00 14-07-2031 02:36
abc xyz 40 F4 17-01-2022 06:00 14-07-2031 02:36
abc xyz 50 F6 17-01-2022 06:00 14-07-2031 02:36
abc xyz 60 F8 17-01-2022 06:00 14-07-2031 02:36
abc pqr 40 M14S 17-01-2022 06:00 18-01-2026 17:21
abc pqr 50 M12 17-01-2022 06:00 18-01-2026 17:21
abc pqr 60 M14S 17-01-2022 06:00 18-01-2026 17:21
abc pqr 20 F3 17-01-2022 06:00 14-07-2031 02:36
abc pqr 40 F4 17-01-2022 06:00 14-07-2031 02:36
abc pqr 50 F6 17-01-2022 06:00 14-07-2031 02:36
I would like to take two rows from here. One is
abc xyz 50 F6 17-01-2022 06:00 14-07-2031 02:36
and the other is
abc pqr 50 F6 17-01-2022 06:00 14-07-2031 02:36
The logic is: for each raw, find the row where wc is either F3 or F4 and act is the maximum, then pick the row that follows it. For xyz, only F4 is present, so I take the row after it. For pqr, both F3 and F4 are present, but the maximum act is 40 (the F4 row), so I take the row after that one.
I did it using shift():
dft = dfUno.loc[dfUno['wc'].shift().eq('F4')]
But I would like to do it in a more generic way, maybe by extracting the rows with iterrows(). My code only works for F4. I also want to extract the part of the dataframe above (and including) the selected F3/F4 row.
Expected outcome for this:
final raw act wc Start Finish
abc xyz 30 M5 17-01-2022 06:00 14-07-2031 02:36
abc xyz 40 F4 17-01-2022 06:00 14-07-2031 02:36
abc pqr 40 M14S 17-01-2022 06:00 18-01-2026 17:21
abc pqr 50 M12 17-01-2022 06:00 18-01-2026 17:21
abc pqr 60 M14S 17-01-2022 06:00 18-01-2026 17:21
abc pqr 20 F3 17-01-2022 06:00 14-07-2031 02:36
abc pqr 40 F4 17-01-2022 06:00 14-07-2031 02:36
Please suggest how to do this.
You can first subset the frame to only have rows with wc equal to "F3" or "F4". Then group by the raw column to see which index per group gives the maximum act. Then index the original frame with them:
>>> df.loc[df[df.wc.isin(["F3", "F4"])].groupby("raw", sort=False).act.idxmax()]
final raw act wc Start Finish
1 abc xyz 40 F4 17-01-2022 06:00 14-07-2031 02:36
8 abc pqr 40 F4 17-01-2022 06:00 14-07-2031 02:36
That is, step by step:
>>> df.wc.isin(["F3", "F4"])
0 False
1 True
2 False
3 False
4 False
5 False
6 False
7 True
8 True
9 False
Name: wc, dtype: bool
>>> subset = df[df.wc.isin(["F3", "F4"])]
>>> subset
final raw act wc Start Finish
1 abc xyz 40 F4 17-01-2022 06:00 14-07-2031 02:36
7 abc pqr 20 F3 17-01-2022 06:00 14-07-2031 02:36
8 abc pqr 40 F4 17-01-2022 06:00 14-07-2031 02:36
>>> idxmax_act_per_raw = subset.groupby("raw", sort=False).act.idxmax()
>>> idxmax_act_per_raw
raw
xyz 1
pqr 8
Name: act, dtype: int64
>>> df.loc[idxmax_act_per_raw]
final raw act wc Start Finish
1 abc xyz 40 F4 17-01-2022 06:00 14-07-2031 02:36
8 abc pqr 40 F4 17-01-2022 06:00 14-07-2031 02:36
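The steps above locate the F3/F4 row with the largest act per raw, while the question asks for the row that follows it. One possible way to take that extra step is sketched below. It assumes a reduced copy of the question's frame (the final, Start and Finish columns are left out), and the names targets, is_max, follows_max and next_rows are illustrative, not part of the original answer.

```python
import pandas as pd

# Reduced copy of the question's frame (final/Start/Finish omitted for brevity).
df = pd.DataFrame({
    "raw": ["xyz"] * 4 + ["pqr"] * 6,
    "act": [30, 40, 50, 60, 40, 50, 60, 20, 40, 50],
    "wc": ["M5", "F4", "F6", "F8", "M14S", "M12", "M14S", "F3", "F4", "F6"],
})

targets = ["F3", "F4"]  # hypothetical name: the wc values of interest

# Index label of the max-act target row per raw group, as in the answer.
idx = df[df["wc"].isin(targets)].groupby("raw", sort=False)["act"].idxmax()

# Flag those rows, then shift the flag down by one row *within* each raw
# group, so the last row of one group never selects the first row of the next.
is_max = pd.Series(df.index.isin(idx), index=df.index)
follows_max = (
    is_max.groupby(df["raw"], sort=False).shift(fill_value=False).astype(bool)
)

next_rows = df[follows_max]
print(next_rows)
```

On this data the selected rows are the two F6 rows (index 2 for xyz and index 9 for pqr). Changing targets is enough to search for other wc values, which avoids hard-coding "F4" as in the shift-based attempt.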
(Passing sort=False to groupby ensures raw isn't sorted whilst grouping; otherwise we'd get the row with "pqr" first, since "pqr" < "xyz".)
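The question's expected outcome keeps, for each raw, every row up to and including the selected F3/F4 row. A minimal sketch of one way to build that slice without iterrows() follows. It again assumes a reduced copy of the question's frame, and the names targets, hits, seen_before and up_to_max are hypothetical helpers introduced here.

```python
import pandas as pd

# Reduced copy of the question's frame (final/Start/Finish omitted for brevity).
df = pd.DataFrame({
    "raw": ["xyz"] * 4 + ["pqr"] * 6,
    "act": [30, 40, 50, 60, 40, 50, 60, 20, 40, 50],
    "wc": ["M5", "F4", "F6", "F8", "M14S", "M12", "M14S", "F3", "F4", "F6"],
})

targets = ["F3", "F4"]  # hypothetical name: the wc values of interest

# Index label of the max-act target row per raw group.
idx = df[df["wc"].isin(targets)].groupby("raw", sort=False)["act"].idxmax()

# 1 on each selected row, 0 elsewhere.
hits = pd.Series(df.index.isin(idx), index=df.index).astype(int)

# Number of selected rows strictly above each row within its raw group:
# 0 means the row sits at or before the selected row.
seen_before = hits.groupby(df["raw"], sort=False).cumsum() - hits

# Keep those rows, but only for groups that actually contain a target row.
up_to_max = df[(seen_before == 0) & df["raw"].isin(idx.index)]
print(up_to_max)
```

On this data the result holds index 0 to 1 for xyz and index 4 to 8 for pqr, which corresponds to the seven rows listed under the expected outcome.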