My dataset consists of scores and total respondents for questions asked in a survey, over a number of fiscal years (FY13, FY14 & FY15) and in different regions.
My objective is to loop through the FY column and identify when each question was asked, for each region, and store this information in a new column.
This is what a reproducible sample looks like:
testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
Very.Satisfied=runif(16,min = 0, max=1),
Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
Satisfied=runif(16,min = 0, max=1),
Total.Satisfied=floor(runif(16,min=10,max=120)),
Dissatisfied=runif(16,min = 0, max=1),
Total.Dissatisfied=floor(runif(16,min=10,max=120)),
Very.Dissatisfied=runif(16,min = 0, max=1),
Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))
I start by creating an ID column, concatenating Region and QST:
library(tidyr)
testdf = testdf %>%
unite(ID,c('Region','QST'),sep = "",remove = F)
My Objective
1) For each unique ID, identify whether the given question was asked:
a) In one year only (either FY13, FY14 or FY15)
b) Over the past two years (FY15 & FY14 only)
c) Over the past three years (FY15 & FY14 & FY13)
d) In FY13 & FY15 only
My Attempt
For this problem, I tried to create a for loop, and for each unique ID, I first store the unique occurrences of each FY in which the question was asked in a vector v. Then, using an if conditional statement, I assign a comment to a newly created column called Tally based on these occurrences.
for (i in unique(testdf$ID))
{
v=unique(testdf$FY)
if(('FY15' %in% v) & ('FY14' %in% v)) {
testdf$Tally=='Asked Over The Past Two Years'
}
else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
testdf$Tally=='Asked Over The Past Three Years'
}
else if(('FY13' %in% v) & ('FY15' %in% v)) {
testdf$Tally=='Question Asked in FY13 & FY15 Only'
}
else { testdf$Tally=='Question Asked Once Only'
}
}
The loop seems to run without throwing an error message, but it doesn't seem to create the new Tally column.
Any help with this will be greatly appreciated.
In your code, the main problem is that inside the if-else clauses you're not doing an assignment (using '<-') but a comparison (using '=='). Here's a solution that I find more elegant, since it doesn't use a loop:
require(tidyverse)
testdf %>%
select(ID, FY) %>%
unique() %>%
mutate(is_true = 1) %>%
spread(key = FY, value = is_true, fill = 0) %>%
mutate(tally = case_when(
FY13 == 1 & FY14 == 1 & FY15 == 1 ~ 'Asked Over The Past Three Years',
FY14 == 1 & FY15 == 1 ~ 'Asked Over the Past Two Years',
FY13 == 1 & FY15 == 1 ~ 'Asked in FY13 & FY15 Only',
TRUE ~ 'Question Asked Once Only'
))
Output:
+------------------------------------------------------------+
| ID FY13 FY14 FY15 tally |
+------------------------------------------------------------+
| 1 AFRICAQ2 1 1 1 Asked Over The Past Three Years |
| 2 AFRICAQ5 0 1 1 Asked Over the Past Two Years |
| 3 AMERICAQ2 1 1 1 Asked Over The Past Three Years |
| 4 AMERICAQ5 1 1 1 Asked Over The Past Three Years |
| 5 ASIAQ2 1 1 1 Asked Over The Past Three Years |
| 6 ASIAQ5 1 0 1 Asked in FY13 & FY15 Only |
+------------------------------------------------------------+
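If you would rather keep the loop from the question, here is a minimal sketch of a repaired version. Besides assigning instead of comparing, it assumes two further changes: v is restricted to the rows of the current ID, and the three-year test comes first (otherwise the two-year branch would catch those cases too). The small data frame, including the EUROPEQ2 ID, is made up for illustration.

```r
# Made-up stand-in for the question's data: only the columns the loop needs.
testdf <- data.frame(
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  ID = c(rep("AFRICAQ2", 3), rep("AFRICAQ5", 2), rep("ASIAQ5", 2), "EUROPEQ2"),
  stringsAsFactors = FALSE
)

testdf$Tally <- NA_character_           # create the column before filling it

for (i in unique(testdf$ID)) {
  rows <- testdf$ID == i                # rows belonging to the current ID
  v <- unique(testdf$FY[rows])          # years in which this ID was asked

  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    label <- "Question Asked in FY13 & FY15 Only"
  } else {
    label <- "Question Asked Once Only"
  }

  testdf$Tally[rows] <- label           # assignment, not a comparison
}

print(testdf)
```

Note that the final else branch also catches any combination not listed in the question (for example FY13 & FY14 only), just like the original loop would have.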
No need for a loop:
library(tidyverse)
result <- testdf %>%
select(QST, Region, FY) %>%  # select by name, so an added ID column cannot shift the positions
mutate(Asked = 1) %>%
spread(FY, Asked)
> result
QST Region FY13 FY14 FY15
1 Q2 AFRICA 1 1 1
2 Q2 AMERICA 1 1 1
3 Q2 ASIA 1 1 1
4 Q5 AFRICA NA 1 1
5 Q5 AMERICA 1 1 1
6 Q5 ASIA 1 NA 1
Answers all four questions in one go.
If you really want a tally column, expand it like this:
result %>%
mutate(Tally = case_when(FY13 + FY14 + FY15 == 1 ~ "Only one year",
FY13 + FY14 + FY15 == 3 ~ "Past three years",
FY14 + FY15 == 2 ~ "Past two years",
FY13 + FY15 == 2 ~ "FY13 and FY15 only",
NA ~ NA_character_))
QST Region FY13 FY14 FY15 Tally
1 Q2 AFRICA 1 1 1 Past three years
2 Q2 AMERICA 1 1 1 Past three years
3 Q2 ASIA 1 1 1 Past three years
4 Q5 AFRICA NA 1 1 Past two years
5 Q5 AMERICA 1 1 1 Past three years
6 Q5 ASIA 1 NA 1 FY13 and FY15 only
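One caveat with the NA-based table above: a question asked in a single year has NA in the other two columns, so FY13 + FY14 + FY15 evaluates to NA and the "Only one year" branch can never match. The sample data has no such case, so it does not show up there. A minimal sketch of one way around it, assuming you are happy to see 0 instead of NA, is to pass fill = 0 to spread. The data frame below is made up, and the EUROPE region is hypothetical.

```r
library(dplyr)
library(tidyr)

# Made-up data: EUROPE/Q2 was asked in a single year only (hypothetical region).
asked <- data.frame(
  Region = c("AFRICA", "AFRICA", "ASIA", "ASIA", "EUROPE"),
  QST    = c("Q5", "Q5", "Q5", "Q5", "Q2"),
  FY     = c("FY14", "FY15", "FY13", "FY15", "FY13"),
  stringsAsFactors = FALSE
)

result <- asked %>%
  mutate(Asked = 1) %>%
  spread(FY, Asked, fill = 0) %>%       # 0 instead of NA for "not asked"
  mutate(Tally = case_when(
    FY13 + FY14 + FY15 == 1 ~ "Only one year",
    FY13 + FY14 + FY15 == 3 ~ "Past three years",
    FY14 + FY15 == 2        ~ "Past two years",
    FY13 + FY15 == 2        ~ "FY13 and FY15 only",
    TRUE                    ~ NA_character_   # e.g. FY13 & FY14 only
  ))

print(result)
```

With zeros in place of NA, every sum is defined, so the order of the branches is the only thing that decides which label a row gets.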
Consider ave for the grouped calculation by Region and QST, inside a nested ifelse for the conditional logic:
testdf <- within(testdf, {
FY13 <- ifelse(FY=='FY13', 1, 0)
FY14 <- ifelse(FY=='FY14', 1, 0)
FY15 <- ifelse(FY=='FY15', 1, 0)
Tally <- ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 1,
'Asked Only on One Year',
ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 3,
'Asked Over the Past Three Years',
ifelse(ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
'Asked Over the Past Two Years',
ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
'Asked On FY13 & FY15 Only',
NA
)
)
)
)
FY13 <- NULL; FY14 <- NULL; FY15 <- NULL
})
testdf[c("Region", "QST", "FY", "Tally")]
# Region QST FY Tally
# 1 AFRICA Q2 FY13 Asked Over the Past Three Years
# 2 AFRICA Q2 FY14 Asked Over the Past Three Years
# 3 AFRICA Q2 FY15 Asked Over the Past Three Years
# 4 AFRICA Q5 FY14 Asked Over the Past Two Years
# 5 AFRICA Q5 FY15 Asked Over the Past Two Years
# 6 ASIA Q2 FY13 Asked Over the Past Three Years
# 7 ASIA Q2 FY14 Asked Over the Past Three Years
# 8 ASIA Q2 FY15 Asked Over the Past Three Years
# 9 ASIA Q5 FY13 Asked On FY13 & FY15 Only
# 10 ASIA Q5 FY15 Asked On FY13 & FY15 Only
# 11 AMERICA Q2 FY13 Asked Over the Past Three Years
# 12 AMERICA Q2 FY14 Asked Over the Past Three Years
# 13 AMERICA Q2 FY15 Asked Over the Past Three Years
# 14 AMERICA Q5 FY13 Asked Over the Past Three Years
# 15 AMERICA Q5 FY14 Asked Over the Past Three Years
# 16 AMERICA Q5 FY15 Asked Over the Past Three Years
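The nested ifelse above calls ave many times with the same arguments. As a rough sketch of the same idea, each per-year flag can be computed once and reused. The helper name asked_in is my own invention, not part of base R, and the data frame only reproduces the FY, Region and QST columns from the question.

```r
# FY, Region and QST as in the question's sample data.
testdf <- data.frame(
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15",
         "FY13", "FY14", "FY15", "FY13", "FY15",
         "FY13", "FY14", "FY15", "FY13", "FY14", "FY15"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 5), rep("AMERICA", 6)),
  QST = c(rep("Q2", 3), rep("Q5", 2), rep("Q2", 3), rep("Q5", 2),
          rep("Q2", 3), rep("Q5", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 on every row of a Region/QST group that was asked in `year`.
asked_in <- function(df, year) {
  ave(as.integer(df$FY == year), df$Region, df$QST, FUN = max)
}

a13 <- asked_in(testdf, "FY13")
a14 <- asked_in(testdf, "FY14")
a15 <- asked_in(testdf, "FY15")
n_years <- a13 + a14 + a15              # number of distinct years per group

testdf$Tally <- ifelse(n_years == 1, "Asked Only on One Year",
                ifelse(n_years == 3, "Asked Over the Past Three Years",
                ifelse(a14 + a15 == 2, "Asked Over the Past Two Years",
                ifelse(a13 + a15 == 2, "Asked On FY13 & FY15 Only", NA))))

print(testdf)
```

The logic and the labels are unchanged; only the repeated ave calls are factored out.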
Here's a solution using your ID column. (We can build it somewhat more nicely with paste0, though: testdf$ID <- paste0(testdf$Region, "_", testdf$QST).)
We dcast your testdf using the reshape2 package.
library(reshape2)
tmp <- dcast(testdf, ID ~ FY,
value.var="QST", fun.aggregate=length)
Now we already know whether the question was asked in the different years. To answer the further questions, we'll do some maths.
tmp <- cbind(tmp,
past2=as.numeric(tmp[[3]] + tmp[[4]] == 2 & tmp[[2]] == 0),
past3=as.numeric(tmp[[2]] + tmp[[3]] + tmp[[4]] == 3),
y13_15=as.numeric(tmp[[2]] + tmp[[4]] == 2 & tmp[[3]] == 0))
Columns 5:7 now hold the desired Tally information as 0/1 flags, which we can paste together into one string per row,
tmp$Tally <- apply(tmp, 1, function(x) paste0(x[5:7], collapse=""))
translate into human language by factor levels,
tmp$Tally <- factor(tmp$Tally, labels=c('Question Asked Once Only',
'Question Asked in FY13 & FY15 Only',
'Asked Over The Past Three Years',
'Asked Over The Past Two Years'))
and merge with the original data frame to achieve the desired result.
> merge(testdf, tmp[c(1, 8)])
ID FY Region QST Tally
1 AFRICA_Q2 FY13 AFRICA Q2 Asked Over The Past Three Years
2 AFRICA_Q2 FY14 AFRICA Q2 Asked Over The Past Three Years
3 AFRICA_Q2 FY15 AFRICA Q2 Asked Over The Past Three Years
4 AFRICA_Q5 FY14 AFRICA Q5 Asked Over The Past Two Years
5 AFRICA_Q5 FY15 AFRICA Q5 Asked Over The Past Two Years
6 AMERICA_Q2 FY13 AMERICA Q2 Asked Over The Past Three Years
7 AMERICA_Q2 FY14 AMERICA Q2 Asked Over The Past Three Years
8 AMERICA_Q2 FY15 AMERICA Q2 Asked Over The Past Three Years
9 AMERICA_Q5 FY13 AMERICA Q5 Asked Over The Past Three Years
10 AMERICA_Q5 FY14 AMERICA Q5 Asked Over The Past Three Years
11 AMERICA_Q5 FY15 AMERICA Q5 Asked Over The Past Three Years
12 ANTH.CTRY_Q2 FY15 ANTH.CTRY Q2 Question Asked Once Only
13 ASIA_Q2 FY13 ASIA Q2 Asked Over The Past Three Years
14 ASIA_Q2 FY14 ASIA Q2 Asked Over The Past Three Years
15 ASIA_Q2 FY15 ASIA Q2 Asked Over The Past Three Years
16 ASIA_Q5 FY13 ASIA Q5 Question Asked in FY13 & FY15 Only
17 ASIA_Q5 FY15 ASIA Q5 Question Asked in FY13 & FY15 Only
Data
testdf <- structure(list(FY = c("FY13", "FY14", "FY15", "FY14", "FY15",
"FY13", "FY14", "FY15", "FY13", "FY15", "FY13", "FY14", "FY15",
"FY13", "FY14", "FY15", "FY15"), Region = c("AFRICA", "AFRICA",
"AFRICA", "AFRICA", "AFRICA", "ASIA", "ASIA", "ASIA", "ASIA",
"ASIA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", "AMERICA",
"AMERICA", "ANTH.CTRY"), QST = c("Q2", "Q2", "Q2", "Q5", "Q5",
"Q2", "Q2", "Q2", "Q5", "Q5", "Q2", "Q2", "Q2", "Q5", "Q5", "Q5",
"Q2")), row.names = c(NA, 17L), class = "data.frame")
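One thing to watch with the factor() relabelling in this answer: it needs all four 0/1 patterns to occur in the data, since four labels are supplied (presumably this is why the extra ANTH.CTRY row is in the data). A minimal sketch that avoids this dependency, under my own assumptions, is to map the patterns through a named lookup vector. It uses base R's table() in place of dcast so that it runs without extra packages, and the names asked, key and lookup are illustrative only.

```r
# The question's 16 rows, without the extra ANTH.CTRY row.
testdf <- data.frame(
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15",
         "FY13", "FY14", "FY15", "FY13", "FY15",
         "FY13", "FY14", "FY15", "FY13", "FY14", "FY15"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 5), rep("AMERICA", 6)),
  QST = c(rep("Q2", 3), rep("Q5", 2), rep("Q2", 3), rep("Q5", 2),
          rep("Q2", 3), rep("Q5", 3)),
  stringsAsFactors = FALSE
)
testdf$ID <- paste0(testdf$Region, "_", testdf$QST)

# Logical matrix: one row per ID, one column per FY, TRUE if asked.
asked <- unclass(table(testdf$ID, testdf$FY)) > 0

past2  <- asked[, "FY14"] & asked[, "FY15"] & !asked[, "FY13"]
past3  <- asked[, "FY13"] & asked[, "FY14"] & asked[, "FY15"]
y13_15 <- asked[, "FY13"] & asked[, "FY15"] & !asked[, "FY14"]

# Same three-digit pattern as in the answer, e.g. "010" for three years.
key <- paste0(as.integer(past2), as.integer(past3), as.integer(y13_15))

# Named lookup: works no matter which patterns actually occur.
lookup <- c("000" = "Question Asked Once Only",
            "001" = "Question Asked in FY13 & FY15 Only",
            "010" = "Asked Over The Past Three Years",
            "100" = "Asked Over The Past Two Years")

tally <- data.frame(ID = rownames(asked), Tally = unname(lookup[key]),
                    stringsAsFactors = FALSE)
result <- merge(testdf, tally, by = "ID")
print(result[c("ID", "FY", "Tally")])
```

Here no ID falls into the "000" pattern, and the labels still come out right because each pattern is matched by name rather than by sorted factor level.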