Adding rows and finding the latest date for a specific column value in R

Question

I have data like this:

Date        CIFNO   POS             POS CITY    NO OF TXNS  TXN.AMOUNT
1/5/2015    12000   Billdesk.com_   CRET         6           8,681.0
3/21/2014   12000   MTNL-BILLDESK   MUMBAI       1           556.0
2/1/2015    13000   bookmyshow      CRET         1           1,134.8
10/15/2013  12000   LUCKY           LUCKNOW      1           5,150.0
9/23/2015   13000   BASE PVT        KOLKATA      1           3,505.0

I need to sum the number of transactions and the transaction amount for each CIF. I also need to keep the latest POS transaction date for each CIF, along with the POS and POS city recorded on that date. Essentially, I need output like this:

Date        CIFNO   POS             POS CITY    NO OF TXNS  TXN.AMOUNT
1/5/2015    12000   Billdesk.com_   CRET        8           14,387.00
9/23/2015   13000   BASE PVT        KOLKATA     2           4,639.8

This sums the number of transactions and the transaction amount for CIF 12000 and CIF 13000. It also takes the latest POS transaction date for each CIF (1/5/2015 for CIF 12000 and 9/23/2015 for CIF 13000) and returns the POS and POS CITY recorded on that date. Could anyone please help me with this? Thanks a lot in advance.
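For anyone who wants to reproduce this, the sample above could be entered as a data frame roughly as follows. The object name `df` and the dotted column names (`POS.CITY`, `NO.OF.TXNS`) are assumptions, chosen to match what the answers use; the original post does not show how the data was loaded.

```r
# Hypothetical reconstruction of the sample data. The name `df` and the
# dotted column names are assumptions, not taken from the original post.
df <- data.frame(
  Date       = c("1/5/2015", "3/21/2014", "2/1/2015", "10/15/2013", "9/23/2015"),
  CIFNO      = c(12000, 12000, 13000, 12000, 13000),
  POS        = c("Billdesk.com_", "MTNL-BILLDESK", "bookmyshow", "LUCKY", "BASE PVT"),
  POS.CITY   = c("CRET", "MUMBAI", "CRET", "LUCKNOW", "KOLKATA"),
  NO.OF.TXNS = c(6, 1, 1, 1, 1),
  # Amounts are kept as text with thousands separators, as in the question
  TXN.AMOUNT = c("8,681.0", "556.0", "1,134.8", "5,150.0", "3,505.0"),
  stringsAsFactors = FALSE
)
str(df)
```

Date and TXN.AMOUNT are deliberately left as character columns here, since both need converting before any sorting or summing.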

Answer 1

You can use data.table. The following syntax gives you the result you need:

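library(data.table)
# Convert df to a data.table by reference
setDT(df)
# Parse the m/d/Y strings into Date objects
df[, Date := as.Date(strptime(as.character(Date),"%m/%d/%Y"))]
# Strip the thousands separators and convert the amounts to numeric
df[, TXN.AMOUNT := as.numeric(gsub(",","", TXN.AMOUNT))]

# Sort newest first, so element [1L] of each group is the latest transaction
res <- df[order(-Date), .(Date=Date[1L],
                          POS=POS[1L],
                          POS.CITY=POS.CITY[1L],
                          NO.OF.TXNS=sum(NO.OF.TXNS),
                          TXN.AMOUNT=sum(TXN.AMOUNT)),
                         by = CIFNO]
# Move Date in front of CIFNO to match the requested column order
setcolorder(res, c(2:1, 3:6))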

We get the following result:

res
##          Date CIFNO           POS POS.CITY NO.OF.TXNS TXN.AMOUNT
## 1: 2015-09-23 13000      BASE.PVT  KOLKATA          2     4639.8
## 2: 2015-01-05 12000 Billdesk.com_     CRET          8    14387.0

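Or a more robust solution, which sums every column whose name contains "TXN" and takes the last value (by date) of all the other columns: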

cols <- grep("TXN", names(df), value = TRUE)
df[order(Date), c(lapply(.SD[, cols, with = FALSE], sum),
                  lapply(.SD[, setdiff(names(.SD), cols), with = FALSE], last)), 
     by = CIFNO]
##    CIFNO NO.OF.TXNS TXN.AMOUNT       Date           POS POS.CITY
## 1: 12000          8    14387.0 2015-01-05 Billdesk.com_     CRET
## 2: 13000          2     4639.8 2015-09-23      BASE PVT  KOLKATA

Answer 2

Using dplyr you could do:

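library(dplyr)
# Strip the thousands separators and convert the amounts to numeric
data$TXN.AMOUNT <- as.numeric(gsub(",", "", data$TXN.AMOUNT))
# Parse the m/d/Y strings into Date objects
data$Date <- as.Date(strptime(as.character(data$Date), "%m/%d/%Y"))

# Sort by date so that last() picks the most recent transaction in each group
data %>%
  group_by(CIFNO) %>%
  arrange(Date) %>%
  summarise(Date = last(Date),
            POS = last(POS),
            POS.CITY = last(POS.CITY),
            TXN.AMOUNT = sum(TXN.AMOUNT),
            NO.OF.TXNS = sum(NO.OF.TXNS))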

You get:

  CIFNO       Date           POS POS.CITY TXN.AMOUNT NO.OF.TXNS
1 12000 2015-01-05 Billdesk.com_     CRET    14387.0          8
2 13000 2015-09-23      BASE_PVT  KOLKATA     4639.8          2
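As a cross-check on the numbers above, the same idea (sum the TXN columns per CIFNO, keep the row with the latest date) can be sketched in base R without extra packages. This is only an illustrative alternative, not part of the original answer; the name `txn` and the hand-entered sample values are assumptions based on the question's table.

```r
# Sample data re-entered by hand from the question (assumed names and types)
txn <- data.frame(
  Date       = as.Date(c("1/5/2015", "3/21/2014", "2/1/2015", "10/15/2013", "9/23/2015"),
                       format = "%m/%d/%Y"),
  CIFNO      = c(12000, 12000, 13000, 12000, 13000),
  POS        = c("Billdesk.com_", "MTNL-BILLDESK", "bookmyshow", "LUCKY", "BASE PVT"),
  POS.CITY   = c("CRET", "MUMBAI", "CRET", "LUCKNOW", "KOLKATA"),
  NO.OF.TXNS = c(6, 1, 1, 1, 1),
  TXN.AMOUNT = c(8681.0, 556.0, 1134.8, 5150.0, 3505.0),
  stringsAsFactors = FALSE
)

# Sum both numeric columns within each CIFNO
totals <- aggregate(cbind(NO.OF.TXNS, TXN.AMOUNT) ~ CIFNO, data = txn, FUN = sum)

# Sort by date, then keep the last (most recent) row seen for each CIFNO
txn <- txn[order(txn$Date), ]
latest <- txn[!duplicated(txn$CIFNO, fromLast = TRUE),
              c("Date", "CIFNO", "POS", "POS.CITY")]

# Join the latest-row details with the per-CIFNO totals
result <- merge(latest, totals, by = "CIFNO")
print(result)
```

With this sample, `result` has one row per CIFNO, with the same dates and totals as the dplyr output above.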

Answer 1
2 2015-07-23 10:16:58

Answer 2
1 Accepted 2015-07-23 10:21:23
