sumif in ifelse condition R
I have a DT with multiple columns, and I need to give a condition in ifelse and do the calculations accordingly. I want it to compute count/sum(count) grouped by segment. Here is the DT:
Segment Count Flag
A 23 Y
B 45 N
A 56 N
B 212 Y
I want the fourth column to be the count per total count of the segment, based on the flag, so the output should look something like this. For flag N, it is the share of the count per segment. For flag Y, it is the revenue percentage calculation if the No (N) becomes Yes (Y), and in that case the revenue that could be earned. I am sorry that it is clumsy, but kindly ask me in the comments if you have any doubts.
Segment Count Flag Rev Value
A 23 Y 34 ((56/23)*34)/(34+69)
B 45 N 48 45/(45+212)
A 56 N 23 56/(56+23)
B 212 Y 67 ((45/212)*67)/(67+12)
A 65 Y 69 ...
B 10 Y 12 ...
Any help is appreciated. Thanks!
We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(DT)) and, grouped by 'Segment', create the 'Value' column by dividing 'Count' by the sum of 'Count'. Then update 'Value' for the rows where 'Flag' is 'N', this time dividing by the sum of 'Count' over only those rows in each segment.
library(data.table)
setDT(DT)[, Value := Count/sum(Count), Segment
][Flag == "N", Value := Count/sum(Count), Segment]
DT
# Segment Count Flag Value
#1: A 23 Y 0.18852459
#2: B 45 N 1.00000000
#3: A 56 N 1.00000000
#4: B 212 Y 0.78810409
#5: A 43 Y 0.35245902
#6: B 12 Y 0.04460967
Just checking against the OP's expected output for 'Value':
> 23/122
#[1] 0.1885246
> 212/269
#[1] 0.7881041
> 43/122
#[1] 0.352459
> 12/269
#[1] 0.04460967
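The two chained steps above can also be written as a single grouped assignment, which is closer to the "sumif inside ifelse" idea in the question title. This is only a minimal sketch: it assumes the same three-column data as in the example, and it uses base ifelse with a subset sum, sum(Count[Flag == "N"]), as the conditional sum.

```r
library(data.table)

# Same sample data as the DT used in the answer
DT <- data.table(
  Segment = c("A", "B", "A", "B", "A", "B"),
  Count   = c(23L, 45L, 56L, 212L, 43L, 12L),
  Flag    = c("Y", "N", "N", "Y", "Y", "Y")
)

# Per segment: "N" rows are divided by the sum of Count over the "N" rows only
# (the conditional sum); all other rows are divided by the segment total
DT[, Value := ifelse(Flag == "N",
                     Count / sum(Count[Flag == "N"]),
                     Count / sum(Count)),
   by = Segment]

print(DT)
```

Because both branches are evaluated per group, no second pass over the table is needed.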
Based on update no. 3 in the OP's post:
s1 <- setDT(DT1)[, .(rn = .I[Flag == "Y"], Value = (Rev[Flag=="Y"] *
(Count[Flag == "N"]/Count[Flag=="Y"]))/sum(Rev[Flag == "Y"])), Segment]
s2 <- DT1[, .(rn = .I[Flag == "N"], Value = Count[Flag == "N"]/(Count[Flag == "N"] +
Count[Flag=="Y"][1])), Segment]
DT1[, Value := rbind(s1, s2)[order(rn)]$Value]
DT1
# Segment Count Flag Rev Value
#1: A 23 Y 34 0.8037146
#2: B 45 N 48 0.1750973
#3: A 56 N 23 0.7088608
#4: B 212 Y 67 0.1800215
#5: A 65 Y 69 0.5771471
#6: B 10 Y 12 0.6835443
>((56/23)*34)/(34+69)
#[1] 0.8037146
> 45/(45+212)
#[1] 0.1750973
> 56/(56+23)
#[1] 0.7088608
> ((45/212)*67)/(67+12)
#[1] 0.1800215
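As a possible alternative to building s1 and s2 and binding them back together, the same values can be computed in one grouped assignment. This is a sketch under two assumptions taken from the code above rather than stated by the OP: each segment has exactly one "N" row, and the "N" rows use the first "Y" count of their segment in the denominator. The helper names n_count and y_first are introduced here for illustration only.

```r
library(data.table)

# Same sample data as DT1 in the answer
DT1 <- data.table(
  Segment = c("A", "B", "A", "B", "A", "B"),
  Count   = c(23L, 45L, 56L, 212L, 65L, 10L),
  Flag    = c("Y", "N", "N", "Y", "Y", "Y"),
  Rev     = c(34L, 48L, 23L, 67L, 69L, 12L)
)

DT1[, Value := {
  n_count <- Count[Flag == "N"][1]  # assumed: one "N" row per segment
  y_first <- Count[Flag == "Y"][1]  # assumed: first "Y" count is the reference
  ifelse(Flag == "Y",
         # "Y" rows: revenue scaled by N/Y count ratio, over total "Y" revenue
         (Rev * n_count / Count) / sum(Rev[Flag == "Y"]),
         # "N" rows: share of the count against the first "Y" count
         Count / (Count + y_first))
}, by = Segment]

print(DT1)
```

Since the assignment happens by reference inside each group, the row order is preserved and no row index (rn) is needed to realign the results.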
DT <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 43L, 12L), Flag = c("Y", "N", "N", "Y", "Y",
"Y")), .Names = c("Segment", "Count", "Flag"), row.names = c(NA,
-6L), class = "data.frame")
DT1 <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 65L, 10L), Flag = c("Y", "N", "N", "Y", "Y",
"Y"), Rev = c(34L, 48L, 23L, 67L, 69L, 12L)), .Names = c("Segment",
"Count", "Flag", "Rev"), class = "data.frame", row.names = c(NA,
-6L))
Alternatively, we could also have used the dplyr package for that.
Updating based on the suggestions provided by @Aramis7d - thanks!
library(data.table)
df <- fread("Segment Count Flag
A 23 Y
B 45 N
A 56 N
B 212 Y
A 43 Y
B 12 Y")
library(dplyr)
df %>%
group_by(Segment) %>%
mutate(Value = Count/sum(Count)) %>%
group_by(Segment, Flag) %>%
mutate(Value = if_else( Flag == "N", Count/sum(Count), Value))
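The pipeline above regroups by Segment and Flag for the second mutate and leaves the result grouped. A minimal sketch of a single-mutate variant follows, assuming the same data; it uses a subset sum inside if_else and calls ungroup() at the end so later steps do not inherit the grouping. The name res is only illustrative.

```r
library(dplyr)

# Same sample data as df in the answer, built without fread
df <- data.frame(
  Segment = c("A", "B", "A", "B", "A", "B"),
  Count   = c(23L, 45L, 56L, 212L, 43L, 12L),
  Flag    = c("Y", "N", "N", "Y", "Y", "Y"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Segment) %>%
  # "N" rows: share within the "N" rows of the segment; others: share of segment total
  mutate(Value = if_else(Flag == "N",
                         Count / sum(Count[Flag == "N"]),
                         Count / sum(Count))) %>%
  ungroup()

print(res)
```

Both branches return doubles, which matters because if_else is strict about the two branches having compatible types.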