Replacing aggregate results for specific rows with original rows in R
I am using the aggregate function to aggregate results for a subset of my data set. I want the aggregated results to replace the original rows (the rows that were aggregated). How can I do that? Here is some sample data:
Day hour Case Time
Sat 7 2 35
Sun 8 8 125
Sun 9 10 145
Mon 10 15 18
Mon 11 17 167
Mon 12 20 220
Mon 13 25 135
Mon 14 14 167
I used the following line of code to aggregate the Case and Time values for "Sat" and "Sun":
aggregate(cbind(Case,Time)~Day,data=subset(TestData,Day == 'Sat' |Day == 'Sun' ),sum)
which works perfectly. However, I wonder how I can replace rows 2, 3 and 4 of my sample data with the aggregate result I get. I want the final result to look like this:
Day hour Case Time
Sat 7 2 35
Sun 8 18 270
Mon 10 15 18
Mon 11 17 167
Mon 12 20 220
Mon 13 25 135
Mon 14 14 167
Thanks
We can use data.table to do this. We select the columns that we need to sum ('nm1'). Convert the 'data.frame' to a 'data.table' (setDT(df1)), specify the 'i' part with the logical condition that excludes the other rows (Day %in% c('Sat', 'Sun')), group by 'Day', select the columns to sum with .SDcols, loop (lapply) through the Subset of Data.table (.SD), and assign (:=) the output to the columns in 'nm1' for the rows specified in 'i'.
library(data.table)
nm1 <- c('Case', 'Time')
setDT(df1)[Day %in% c('Sat', 'Sun'), (nm1) := lapply(.SD, sum),
Day, .SDcols=nm1]
If we need only the unique rows, we can use the unique method from data.table with the by option:
unique(df1, by=c('Case', 'Time'))
# Day hour Case Time
#1: Sat 7 2 35
#2: Sun 8 18 270
#3: Mon 10 15 18
#4: Mon 11 17 167
#5: Mon 12 20 220
#6: Mon 13 25 135
#7: Mon 14 14 167
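For readers without data.table, the same two steps (write the group sums back into the selected rows, then drop the repeated rows) can be sketched in base R. This is a swapped-in technique using ave() and duplicated(), not the code from the answer above; `sel` and `res` are illustrative names, and the data frame is rebuilt here so the block runs on its own.

```r
# Sample data from the question
df1 <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)

# Rows to aggregate
sel <- df1$Day %in% c("Sat", "Sun")

# Overwrite Case and Time in the selected rows with the per-Day sums
for (col in c("Case", "Time")) {
  df1[sel, col] <- ave(df1[sel, col], df1$Day[sel], FUN = sum)
}

# Keep the first row of each aggregated Day; leave all other rows alone
res <- df1[!(sel & duplicated(df1$Day)), ]
res
```

Dropping duplicates by Day within the selected rows only, rather than by the Case and Time values, avoids removing unrelated rows that happen to share the same Case and Time.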
Or, if we are using the OP's aggregate code, we can merge its result ('r1') with the original dataset ('df1'), replace the values in the rows picked out by a logical index derived from the 'NA' values after the merge, subset the columns, remove the duplicated rows and get the output.
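r1 <- aggregate(cbind(Case,Time)~Day,data=subset(df1,
Day == 'Sat' |Day == 'Sun' ),sum)
r2 <- merge(df1, r1, by='Day', all.x=TRUE)
# logical index of the rows that found a match in 'r1' (non-NA after the merge)
indx <- !is.na(r2$Case.y)
r2[indx, c('Case.x', 'Time.x')] <- r2[indx, c('Case.y', 'Time.y')]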
We select only the columns that are needed:
r3 <- r2[1:4]
Remove the rows that are duplicated in the 'Case' and 'Time' columns:
r3[!duplicated(r3[3:4]),]
# Day hour Case.x Time.x
#1 Mon 10 15 18
#2 Mon 11 17 167
#3 Mon 12 20 220
#4 Mon 13 25 135
#5 Mon 14 14 167
#6 Sat 7 2 35
#7 Sun 8 18 270
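Note that merge sorts the result by 'Day', so the rows come back as Mon, Sat, Sun rather than in the input order. A minimal sketch of one way to keep the original order, assuming a helper column `id` (hypothetical, not part of the original data) is acceptable; the data frame is rebuilt here so the block runs on its own.

```r
# Sample data from the question
df1 <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)
df1$id <- seq_len(nrow(df1))   # hypothetical helper: remembers the input order

# Per-Day sums for the weekend rows only
r1 <- aggregate(cbind(Case, Time) ~ Day,
                data = subset(df1, Day %in% c("Sat", "Sun")), FUN = sum)

# Left join, then restore the input order that merge() discarded
r2 <- merge(df1, r1, by = "Day", all.x = TRUE)
r2 <- r2[order(r2$id), ]

# Rows that matched 'r1' have non-NA sums; copy the sums over the originals
indx <- !is.na(r2$Case.y)
r2[indx, c("Case.x", "Time.x")] <- r2[indx, c("Case.y", "Time.y")]

# Keep one row per aggregated Day and tidy the column names
r3 <- r2[!(indx & duplicated(r2$Day)), c("Day", "hour", "Case.x", "Time.x")]
names(r3) <- c("Day", "hour", "Case", "Time")
rownames(r3) <- NULL
r3
```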
df1 <- structure(list(Day = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon",
"Mon", "Mon"), hour = 7:14, Case = c(2L, 8L, 10L, 15L, 17L, 20L,
25L, 14L), Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L
)), .Names = c("Day", "hour", "Case", "Time"), class = "data.frame",
row.names = c(NA, -8L))
Building on what you have:
ind<-with(TestData,Day == 'Sat' |Day == 'Sun')
s<-aggregate(.~Day,data=TestData[ind,],sum)
rbind(s,TestData[!ind,])
Day hour Case Time
1 Sat 7 2 35
2 Sun 17 18 270
4 Mon 10 15 18
5 Mon 11 17 167
6 Mon 12 20 220
7 Mon 13 25 135
8 Mon 14 14 167
However, judging from the desired output in the question, you may wish to run
s$hour<-with(TestData[ind,],hour[!duplicated(Day)])
before the rbind to get the first hour of each day instead of the sum of hours:
Day hour Case Time
1 Sat 7 2 35
2 Sun 8 18 270
4 Mon 10 15 18
5 Mon 11 17 167
6 Mon 12 20 220
7 Mon 13 25 135
8 Mon 14 14 167
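One caveat with the hour fix: hour[!duplicated(Day)] returns the first hours in order of appearance, whereas aggregate returns its groups in sorted order. The two line up here only because "Sat" both appears first and sorts before "Sun". A minimal sketch that does not depend on row order, assuming the TestData frame from the question (rebuilt here so the block runs on its own); `sums`, `first_hour` and `result` are illustrative names.

```r
# Sample data from the question
TestData <- data.frame(
  Day  = c("Sat", "Sun", "Sun", "Mon", "Mon", "Mon", "Mon", "Mon"),
  hour = 7:14,
  Case = c(2L, 8L, 10L, 15L, 17L, 20L, 25L, 14L),
  Time = c(35L, 125L, 145L, 18L, 167L, 220L, 135L, 167L),
  stringsAsFactors = FALSE
)

ind <- TestData$Day %in% c("Sat", "Sun")

# Sum Case and Time per Day, and take the first hour per Day, separately
sums       <- aggregate(cbind(Case, Time) ~ Day, data = TestData[ind, ], FUN = sum)
first_hour <- aggregate(hour ~ Day, data = TestData[ind, ], FUN = function(x) x[1])

# Join on Day so each first hour is paired with the sums of the same Day
s <- merge(first_hour, sums, by = "Day")

# Aggregated rows first, then the untouched rows
result <- rbind(s, TestData[!ind, ])
result
```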