
Replacing aggregate results for specific rows with original rows in R

I am using the aggregate function to aggregate results for a subset of my data set. I want the aggregated results to replace the original rows they were computed from (the reference rows for the aggregate). How can I do that? Here is some sample data:

 Day  hour    Case   Time
 Sat  7       2    35
 Sun  8       8    125
 Sun  9       10   145
 Mon  10      15   18
 Mon  11      17   167
 Mon  12      20   220
 Mon  13      25   135
 Mon  14      14   167

I used the following line of code to aggregate the Case and Time values for "Sat" and "Sun":

aggregate(cbind(Case,Time)~Day,data=subset(TestData,Day == 'Sat' |Day == 'Sun' ),sum)

which works perfectly. However, I don't know how to replace rows 2, 3 and 4 of my sample data with the aggregate result I get. I want the final result to look like this:

 Day  hour    Case   Time
 Sat  7       2    35
 Sun  8       18   270
 Mon  10      15   18
 Mon  11      17   167
 Mon  12      20   220
 Mon  13      25   135
 Mon  14      14   167  

Thanks
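The sketch below is for anyone reproducing the question. It rebuilds the sample data as TestData, using the same values as the table above, and shows that the aggregate call on its own returns only one row per aggregated day. That is why its result cannot simply be dropped back into the original rows. It uses %in% in place of the chained == conditions, which is equivalent here.

```r
# Sample data from the question
TestData <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)

# Sum Case and Time per Day, for the weekend rows only
r1 <- aggregate(cbind(Case, Time) ~ Day,
                data = subset(TestData, Day %in% c("Sat", "Sun")), FUN = sum)
r1
#   Day Case Time
# 1 Sat    2   35
# 2 Sun   18  270
```

Three weekend rows go in and two summary rows come out, so the two Sun rows have to be collapsed as well as overwritten.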

We can use data.table to do this. First, store the names of the columns to be summed in 'nm1'. Then:

- Convert the 'data.frame' to a 'data.table' (setDT(df1)).
- Use the 'i' part with a logical condition (Day %in% c('Sat', 'Sun')) to restrict the operation to the weekend rows, and group by 'Day'.
- Select the columns to sum with .SDcols.
- Loop (lapply) through the Subset of Data.table (.SD), applying sum.
- Assign (:=) the output to the 'nm1' columns for the rows specified in 'i'.

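library(data.table)
nm1 <- c('Case', 'Time')   # columns to sum
# convert df1 to a data.table in place, then, for the Sat/Sun rows only,
# overwrite Case and Time with their per-Day sums
setDT(df1)[Day %in% c('Sat', 'Sun'), (nm1) := lapply(.SD, sum),
                        by = Day, .SDcols = nm1]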
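If you would rather not add a dependency, the same grouped in-place update can be sketched in base R with ave(). As with the := call above, every weekend row receives its day's totals. This is a swapped-in base-R technique, not part of the data.table answer. The name idx is illustrative only.

```r
df1 <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)
nm1 <- c("Case", "Time")
idx <- df1$Day %in% c("Sat", "Sun")   # rows to aggregate (illustrative name)

# For each selected column, replace every selected row by its Day-group sum
df1[idx, nm1] <- lapply(df1[idx, nm1],
                        function(x) ave(x, df1$Day[idx], FUN = sum))
df1[1:3, ]
#   Day hour Case Time
# 1 Sat    7    2   35
# 2 Sun    8   18  270
# 3 Sun    9   18  270
```

Like the data.table version, this leaves both Sun rows in place with identical totals, so a deduplication step is still needed afterwards.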

After this update, both 'Sun' rows carry the summed values. If we need only the unique rows, we can use data.table's unique() with the by argument:

unique(df1, by=c('Case', 'Time'))
#   Day hour Case Time
#1: Sat    7    2   35
#2: Sun    8   18  270
#3: Mon   10   15   18
#4: Mon   11   17  167
#5: Mon   12   20  220
#6: Mon   13   25  135
#7: Mon   14   14  167
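One thing to watch: unique(df1, by = c('Case', 'Time')) deduplicates across the whole table. Two unrelated rows that happen to share the same Case and Time would therefore also be collapsed. A narrower rule drops only the repeated rows of the aggregated days and keeps each day's first occurrence. The sketch below assumes that is the intent and uses base duplicated() on Day. The name agg_days is illustrative.

```r
# State after the in-place update: both Sun rows carry the Sun totals
df1 <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 18L, 18L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 270L, 270L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)
agg_days <- c("Sat", "Sun")   # days that were aggregated (illustrative name)

# Drop a row only if it belongs to an aggregated day AND its Day was seen before
drop <- df1$Day %in% agg_days & duplicated(df1$Day)
res <- df1[!drop, ]
res
```

Rows for non-aggregated days are never touched, whatever their Case and Time values.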

Or, if we are using the OP's aggregate code, we can take a base-R route. Start again from the original data.frame 'df1', because setDT() and := above modify 'df1' in place. Then:

- Merge the result 'r1' with the original dataset.
- Overwrite Case and Time in the rows that matched, identified by a logical index of the non-NA values after the merge.
- Subset the columns.
- Remove the duplicated rows to get the output.

r1 <- aggregate(cbind(Case,Time)~Day,data=subset(df1,
                Day == 'Sat' |Day == 'Sun' ),sum)

r2 <- merge(df1, r1, by='Day', all.x=TRUE)
# rows that matched an aggregated Day have non-NA '.y' values
indx <- !is.na(r2$Case.y)
r2[indx, c('Case.x', 'Time.x')] <- r2[indx, c('Case.y', 'Time.y')]

We select only the columns that are needed:

r3 <- r2[1:4]

Then remove the rows that are duplicated in the 'Case.x' and 'Time.x' columns:

r3[!duplicated(r3[3:4]),]
#   Day hour Case.x Time.x
#1 Mon   10     15     18
#2 Mon   11     17    167
#3 Mon   12     20    220
#4 Mon   13     25    135
#5 Mon   14     14    167
#6 Sat    7      2     35
#7 Sun    8     18    270
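The merge() route returns the rows sorted by Day (Mon first) and leaves the .x suffixes on the column names. The sketch below runs the full pipeline from the original data, with indx defined as the rows whose .y columns are not NA. Two extra steps are assumptions about the desired result: renaming the columns back, and restoring the original row order. Ordering by hour only works because hour increases down this sample; other data may need a different key.

```r
df1 <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)
r1 <- aggregate(cbind(Case, Time) ~ Day,
                data = subset(df1, Day %in% c("Sat", "Sun")), FUN = sum)

r2 <- merge(df1, r1, by = "Day", all.x = TRUE)
indx <- !is.na(r2$Case.y)                 # rows that received an aggregate
r2[indx, c("Case.x", "Time.x")] <- r2[indx, c("Case.y", "Time.y")]

r3 <- r2[1:4]                             # Day, hour, Case.x, Time.x
r3 <- r3[!duplicated(r3[3:4]), ]          # one row per aggregated Day
names(r3) <- c("Day", "hour", "Case", "Time")
r3 <- r3[order(r3$hour), ]                # assumption: hour gives original order
rownames(r3) <- NULL
r3
```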

Data:

df1 <- structure(list(Day = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", 
"Mon", "Mon"), hour = 7:14, Case = c(2L, 8L, 10L, 15L, 17L, 20L, 
25L, 14L), Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L
)), .Names = c("Day", "hour", "Case", "Time"), class = "data.frame", 
row.names = c(NA, -8L))

Building on what you have:

ind<-with(TestData,Day == 'Sat' |Day == 'Sun')
s<-aggregate(.~Day,data=TestData[ind,],sum)
rbind(s,TestData[!ind,])
  Day hour Case Time
1 Sat    7    2   35
2 Sun   17   18  270
4 Mon   10   15   18
5 Mon   11   17  167
6 Mon   12   20  220
7 Mon   13   25  135
8 Mon   14   14  167

However, judging from the desired output in the question, you may want to run

# first hour of each aggregated Day, matched to the (sorted) rows of s
s$hour<-with(TestData[ind,],hour[match(s$Day,Day)])

before the rbind, so that 'hour' holds the first hour of each day instead of the sum of the hours:

  Day hour Case Time
1 Sat    7    2   35
2 Sun    8   18  270
4 Mon   10   15   18
5 Mon   11   17  167
6 Mon   12   20  220
7 Mon   13   25  135
8 Mon   14   14  167
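If you need this more than once, the steps can be folded into a small helper. The sketch below is a hypothetical collapse_days() that does not come from either answer. It keeps the first row of each selected day in its original position, so 'hour' stays the first hour, and overwrites the value columns with the per-day sums from base rowsum(). It assumes the grouping column is called Day, as in the question.

```r
# Hypothetical helper: collapse the rows of `days` into one row per day,
# summing `cols` and keeping the first row's other values and position.
collapse_days <- function(df, days, cols) {
  ind  <- df$Day %in% days
  keep <- !(ind & duplicated(df$Day))     # first row of each selected day
  out  <- df[keep, ]
  # one row per selected day, with row names equal to the Day values
  sums <- rowsum(df[ind, cols, drop = FALSE], df$Day[ind])
  sel  <- out$Day %in% days
  out[sel, cols] <- sums[out$Day[sel], , drop = FALSE]
  out
}

TestData <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)
res <- collapse_days(TestData, c("Sat", "Sun"), c("Case", "Time"))
res
```

Unlike the rbind approach, the collapsed rows stay where the first row of each day was, so no separate hour fix or re-sort is needed.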

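Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.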

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM