Extract rows in a data frame based on condition of a column in R

Suppose I have a data frame like this:

port_id           report_dt       market_val
--------          ---------       ----------
100               1200            300
100               1200            500
100               1200            270

100               1300            320
100               1300            490
100               1300            310

101               1200            440
101               1200            320

102               1300            420
102               1300            425

Each row is a stock in the portfolio specified by port_id for a given reporting date. A port_id can be reported once or more than once. For example, port_id = 100 is reported twice: the first three rows are the 3 stocks in port_id = 100 for the date 1200, and the next three rows are the 3 stocks in port_id = 100 for the date 1300. However, port_id = 101 and 102 are reported only once.

I want to keep all the MOST RECENTLY REPORTED stocks for each port_id, which should look like:

port_id           report_dt       market_val
--------          ---------       ----------
100               1300            320
100               1300            490
100               1300            310

101               1200            440
101               1200            320

102               1300            420
102               1300            425

Please tell me how I can do that. Thanks.

Here's an approach:

df[df$report_dt == max(df$report_dt), ]
#  port_id report_dt market_val
#4     100      1300        320
#5     100      1300        490
#6     100      1300        310

Update: based on your edit, here's a way:

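# Split by port_id, then keep only the rows with the latest report_dt in each piece
splt <- lapply(split(df, df$port_id), function(x) x[x$report_dt == max(x$report_dt),])
# Bind the pieces back into one data frame and reset the row names
newdf <- do.call(rbind, splt)
rownames(newdf) <- NULL
newdf
#   port_id report_dt market_val
# 1     100      1300        320
# 2     100      1300        490
# 3     100      1300        310
# 4     101      1200        440
# 5     101      1200        320
# 6     102      1300        420
# 7     102      1300        425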

Note: I hate that I had to split, apply, and combine so literally, but the usual split-apply-combine (SAC) functions weren't working for me. I'd love to optimize this if anyone has ideas.
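One possible way to shorten the split/apply/combine steps above in base R is `ave()`, which returns the per-group maximum aligned with the original rows. This is only a sketch: the data frame below is rebuilt by hand from the table in the question, and the name `latest` is made up for illustration.

```r
# Sample data rebuilt from the question's table (typed in by hand)
df <- data.frame(
  port_id    = c(100, 100, 100, 100, 100, 100, 101, 101, 102, 102),
  report_dt  = c(1200, 1200, 1200, 1300, 1300, 1300, 1200, 1200, 1300, 1300),
  market_val = c(300, 500, 270, 320, 490, 310, 440, 320, 420, 425)
)

# ave() computes max(report_dt) within each port_id and repeats it on every
# row of that group, so the comparison below is done row by row
latest_dt <- ave(df$report_dt, df$port_id, FUN = max)

# Keep only the rows whose report_dt equals their group's latest date
latest <- df[df$report_dt == latest_dt, ]
rownames(latest) <- NULL
latest
```

Note that `FUN = max` has to be passed by name; otherwise `ave()` treats it as another grouping variable.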

Here is my dplyr approach.

library(dplyr)
# Keeps only the rows for port_id 100 on report date 1300 (both values are hard-coded)
filter(df, port_id == 100, report_dt == 1300)
#   port_id report_dt market_val
# 1     100      1300        320
# 2     100      1300        490
# 3     100      1300        310
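The filter above hard-codes one port_id and one date. To cover every port_id as the question asks, a grouped filter is one option. This is a minimal sketch rather than the answerer's own code: it assumes dplyr is installed, rebuilds the sample data by hand from the question, and uses the made-up name `latest`.

```r
library(dplyr)

# Sample data rebuilt from the question's table (typed in by hand)
df <- data.frame(
  port_id    = c(100, 100, 100, 100, 100, 100, 101, 101, 102, 102),
  report_dt  = c(1200, 1200, 1200, 1300, 1300, 1300, 1200, 1200, 1300, 1300),
  market_val = c(300, 500, 270, 320, 490, 310, 440, 320, 420, 425)
)

latest <- df %>%
  group_by(port_id) %>%                       # one group per portfolio
  filter(report_dt == max(report_dt)) %>%     # max() is evaluated within each group
  ungroup()                                   # drop the grouping before further use

latest
```

Because `filter()` runs inside each group, `max(report_dt)` is the latest date for that port_id only, so port_id 101 keeps its 1200 rows while 100 and 102 keep their 1300 rows.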
