I have a data.table (DT) with multiple columns, and I need to apply a condition with ifelse and do the calculation accordingly. I want to compute Count/sum(Count) grouped by Segment. Here is the DT:
Segment  Count  Flag
A        23     Y
B        45     N
A        56     N
B        212    Y
I want a new Value column holding the count as a share of the segment's total count, depending on the flag, so the output should look something like this. For flag N, it is the share of the count per segment. For flag Y, it is the revenue percentage that could be earned if the No (N) became Yes (Y). Sorry if this is clumsy; please ask in the comments if anything is unclear.
Segment  Count  Flag  Rev  Value
A        23     Y     34   ((56/23)*34)/(34+69)
B        45     N     48   45/(45+212)
A        56     N     23   56/(56+23)
B        212    Y     67   ((45/212)*67)/(67+12)
A        65     Y     69   ...
B        10     Y     12   ...
Any help is appreciated. Thanks!
We can do this with data.table. Convert the 'data.frame' to a 'data.table' (setDT(DT)), then, grouped by 'Segment', create the 'Value' column by dividing 'Count' by the sum of 'Count'. After that, update 'Value' where 'Flag' is 'N', recomputing the share over just those rows of each 'Segment'.
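library(data.table)
# Step 1: share of each row's Count within its Segment
# Step 2: for Flag "N" rows, recompute the share over the "N" rows of the Segment only
setDT(DT)[, Value := Count/sum(Count), by = Segment
          ][Flag == "N", Value := Count/sum(Count), by = Segment]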
DT
# Segment Count Flag Value
#1: A 23 Y 0.18852459
#2: B 45 N 1.00000000
#3: A 56 N 1.00000000
#4: B 212 Y 0.78810409
#5: A 43 Y 0.35245902
#6: B 12 Y 0.04460967
Checking against the OP's expected 'Value' output:
> 23/122
#[1] 0.1885246
> 212/269
#[1] 0.7881041
> 43/122
#[1] 0.352459
> 12/269
#[1] 0.04460967
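As a cross-check that does not depend on data.table, the same two-step logic can be sketched in base R with ave(). This is a swapped-in technique rather than the approach shown above; the intermediate names share_segment and share_segment_flag are made up for illustration, and the data are the six-row DT used in this answer.

```r
# Same six rows as the DT used in this answer
DT <- data.frame(Segment = c("A", "B", "A", "B", "A", "B"),
                 Count   = c(23L, 45L, 56L, 212L, 43L, 12L),
                 Flag    = c("Y", "N", "N", "Y", "Y", "Y"))

# Share of each row's Count within its Segment
share_segment <- DT$Count / ave(DT$Count, DT$Segment, FUN = sum)

# Share of each row's Count within its Segment and Flag combination
share_segment_flag <- DT$Count / ave(DT$Count, DT$Segment, DT$Flag, FUN = sum)

# Flag "N" rows take the Segment/Flag share, all other rows the Segment share
DT$Value <- ifelse(DT$Flag == "N", share_segment_flag, share_segment)
DT
```

Because each Segment here has a single "N" row, those rows end up with a Value of 1, which matches the data.table result.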
Based on update no. 3 in the OP's post:
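# Flag "Y" rows: Rev scaled by (N count / own count), divided by the total "Y" Rev of the Segment
s1 <- setDT(DT1)[, .(rn = .I[Flag == "Y"], Value = (Rev[Flag=="Y"] *
(Count[Flag == "N"]/Count[Flag=="Y"]))/sum(Rev[Flag == "Y"])), Segment]
# Flag "N" rows: own Count over (own Count + the first "Y" Count of the Segment)
s2 <- DT1[, .(rn = .I[Flag == "N"], Value = Count[Flag == "N"]/(Count[Flag == "N"] +
Count[Flag=="Y"][1])), Segment]
# Stack both pieces, restore the original row order, and assign
DT1[, Value := rbind(s1, s2)[order(rn)]$Value]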
DT1
# Segment Count Flag Rev Value
#1: A 23 Y 34 0.8037146
#2: B 45 N 48 0.1750973
#3: A 56 N 23 0.7088608
#4: B 212 Y 67 0.1800215
#5: A 65 Y 69 0.5771471
#6: B 10 Y 12 0.6835443
> ((56/23)*34)/(34+69)
#[1] 0.8037146
> 45/(45+212)
#[1] 0.1750973
> 56/(56+23)
#[1] 0.7088608
> ((45/212)*67)/(67+12)
#[1] 0.1800215
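If each Segment has exactly one Flag "N" row (an assumption that holds for DT1 but is not stated by the OP), the s1/s2 split could probably be collapsed into a single grouped assignment. The following is only a minimal sketch under that assumption; the helper names n_count, first_y and rev_y_total are illustrative and not part of the original answer.

```r
library(data.table)

# Same six rows as DT1 in this answer
DT1 <- data.table(Segment = c("A", "B", "A", "B", "A", "B"),
                  Count   = c(23L, 45L, 56L, 212L, 65L, 10L),
                  Flag    = c("Y", "N", "N", "Y", "Y", "Y"),
                  Rev     = c(34L, 48L, 23L, 67L, 69L, 12L))

DT1[, Value := {
  n_count     <- Count[Flag == "N"][1]   # assumes one "N" row per Segment
  first_y     <- Count[Flag == "Y"][1]   # first "Y" Count in the Segment
  rev_y_total <- sum(Rev[Flag == "Y"])   # total Rev over the "Y" rows
  ifelse(Flag == "Y",
         Rev * (n_count / Count) / rev_y_total,  # "Y" rows: scaled revenue share
         Count / (Count + first_y))              # "N" rows: count share
}, by = Segment]
DT1
```

Computing everything inside one by-group block avoids the row-index bookkeeping with .I and rbind, at the cost of relying on the one-"N"-row assumption.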
DT <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 43L, 12L), Flag = c("Y", "N", "N", "Y", "Y",
"Y")), .Names = c("Segment", "Count", "Flag"), row.names = c(NA,
-6L), class = "data.frame")
DT1 <- structure(list(Segment = c("A", "B", "A", "B", "A", "B"), Count = c(23L,
45L, 56L, 212L, 65L, 10L), Flag = c("Y", "N", "N", "Y", "Y",
"Y"), Rev = c(34L, 48L, 23L, 67L, 69L, 12L)), .Names = c("Segment",
"Count", "Flag", "Rev"), class = "data.frame", row.names = c(NA,
-6L))
Alternatively, we could also use the dplyr package for this:
Updating based on the suggestions provided by @Aramis7d - thanks!
library(data.table)
df <- fread("Segment Count Flag
A 23 Y
B 45 N
A 56 N
B 212 Y
A 43 Y
B 12 Y")
library(dplyr)
df %>%
  group_by(Segment) %>%
  mutate(Value = Count/sum(Count)) %>%
  group_by(Segment, Flag) %>%
  mutate(Value = if_else(Flag == "N", Count/sum(Count), Value))