R: Loop over unique values in a dataframe column to create another one based on conditions
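My dataset contains scores and total respondents for questions asked in a survey across several fiscal years (FY13, FY14 and FY15) and across different regions.
My goal is to loop over the FY column and determine in which years each question was asked in each region, and to store this information in a new column.
Here is what a reproducible sample looks like: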
testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
Very.Satisfied=runif(16,min = 0, max=1),
Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
Satisfied=runif(16,min = 0, max=1),
Total.Satisfied=floor(runif(16,min=10,max=120)),
Dissatisfied=runif(16,min = 0, max=1),
Total.Dissatisfied=floor(runif(16,min=10,max=120)),
Very.Dissatisfied=runif(16,min = 0, max=1),
Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))
I first create an ID column by concatenating Region and QST:
library(tidyr)
testdf = testdf %>%
unite(ID,c('Region','QST'),sep = "",remove = F)
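My objective
1) For each unique ID, determine whether the question was asked:
a) in one year only (FY13, FY14 or FY15)
b) over the past two years (FY15 and FY14 only)
c) over the past three years (FY15, FY14 and FY13)
d) in FY13 and FY15 only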
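My attempt
For this problem I tried writing a for loop that, for each unique ID, first stores the unique fiscal years in which that ID appears in a vector v. Then, using if conditions, I assign a comment to a newly created column called Tally depending on which case applies.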
for (i in unique(testdf$ID))
{
v=unique(testdf$FY)
if(('FY15' %in% v) & ('FY14' %in% v)) {
testdf$Tally=='Asked Over The Past Two Years'
}
else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
testdf$Tally=='Asked Over The Past Three Years'
}
else if(('FY13' %in% v) & ('FY15' %in% v)) {
testdf$Tally=='Question Asked in FY13 & FY15 Only'
}
else { testdf$Tally=='Question Asked Once Only'
}
}
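The loop seems to run without throwing an error message, but the new Tally column does not seem to get created.
Any help would be greatly appreciated.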
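In your code, the main problem is that inside the if-else clauses you are not assigning (with "<-") but comparing (with "=="). I find the following a more elegant solution, since it does not use a loop: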
library(tidyverse)
testdf %>%
  select(ID, FY) %>%
  unique() %>%
  mutate(is_true = 1) %>%                         # marker: this ID was asked in this FY
  spread(key = FY, value = is_true, fill = 0) %>% # one 0/1 column per fiscal year
  mutate(tally = case_when(
    FY13 == 1 & FY14 == 1 & FY15 == 1 ~ 'Asked Over The Past Three Years',
    FY14 == 1 & FY15 == 1 ~ 'Asked Over the Past Two Years',
    FY13 == 1 & FY15 == 1 ~ 'Asked in FY13 & FY15 Only',
    TRUE ~ 'Question Asked Once Only'
  ))
Output:
         ID FY13 FY14 FY15                           tally
1  AFRICAQ2    1    1    1 Asked Over The Past Three Years
2  AFRICAQ5    0    1    1   Asked Over the Past Two Years
3 AMERICAQ2    1    1    1 Asked Over The Past Three Years
4 AMERICAQ5    1    1    1 Asked Over The Past Three Years
5    ASIAQ2    1    1    1 Asked Over The Past Three Years
6    ASIAQ5    1    0    1       Asked in FY13 & FY15 Only
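For comparison, the loop from the question can also be repaired directly. The following is only a sketch on a cut-down copy of the sample data (just FY and ID). Besides replacing == with <-, it assumes two further changes that are not spelled out above: v is taken from the rows of the current ID (the original unique(testdf$FY) looks at the whole data frame on every iteration), and the three-year case is tested before the two-year case, because otherwise the three-year branch can never be reached. The name label is introduced here purely for illustration.

```r
# Cut-down sample: only the two columns the loop needs.
testdf <- data.frame(
  FY = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15"),
  ID = c(rep("AFRICAQ2", 3), rep("AFRICAQ5", 2), rep("ASIAQ5", 2)),
  stringsAsFactors = FALSE
)

testdf$Tally <- NA_character_          # create the column before filling it

for (i in unique(testdf$ID)) {
  rows <- testdf$ID == i               # rows belonging to the current ID
  v <- unique(testdf$FY[rows])         # fiscal years for this ID only

  # Most specific case first; the final else also catches any
  # combination not listed in the question (e.g. FY13 and FY14 only).
  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    label <- "Question Asked in FY13 & FY15 Only"
  } else {
    label <- "Question Asked Once Only"
  }

  testdf$Tally[rows] <- label          # assignment (<-), not comparison (==)
}
```

After the loop, testdf$Tally holds one label per row, repeated for all rows that share an ID.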
No loop needed:
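library(tidyverse)
result <- testdf %>%
  select(QST, Region, FY) %>%   # select by name so an added ID column cannot shift positions
  mutate(Asked = 1) %>%
  spread(FY, Asked)             # years without data become NA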
> result
QST Region FY13 FY14 FY15
1 Q2 AFRICA 1 1 1
2 Q2 AMERICA 1 1 1
3 Q2 ASIA 1 1 1
4 Q5 AFRICA NA 1 1
5 Q5 AMERICA 1 1 1
6 Q5 ASIA 1 NA 1
This answers all four questions at a glance.
If you really want a tally column, extend it as follows:
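result %>%
  mutate(Tally = case_when(
    # spread() left NA for missing years, so FY13 + FY14 + FY15 would be NA
    # for a single-year question; count with na.rm = TRUE instead
    rowSums(cbind(FY13, FY14, FY15), na.rm = TRUE) == 1 ~ "Only one year",
    FY13 + FY14 + FY15 == 3 ~ "Past three years",
    FY14 + FY15 == 2 ~ "Past two years",
    FY13 + FY15 == 2 ~ "FY13 and FY15 only",
    TRUE ~ NA_character_))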
QST Region FY13 FY14 FY15 Tally
1 Q2 AFRICA 1 1 1 Past three years
2 Q2 AMERICA 1 1 1 Past three years
3 Q2 ASIA 1 1 1 Past three years
4 Q5 AFRICA NA 1 1 Past two years
5 Q5 AMERICA 1 1 1 Past three years
6 Q5 ASIA 1 NA 1 FY13 and FY15 only
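If the tally is wanted on every row of the original data frame rather than in the reshaped summary, a grouped mutate is one possible variant. This is a sketch under assumptions, not part of the answer above: it uses a cut-down copy of the sample data, the name tallied is made up for illustration, and any combination not listed in the question is left as NA.

```r
library(dplyr)

# Cut-down sample: only the columns needed for the tally.
testdf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 2)),
  QST    = c(rep("Q2", 3), rep("Q5", 4)),
  stringsAsFactors = FALSE
)

tallied <- testdf %>%
  group_by(Region, QST) %>%
  # Each condition is a single TRUE/FALSE per group; mutate() recycles
  # the resulting label to every row of that group.
  mutate(Tally = case_when(
    all(c("FY13", "FY14", "FY15") %in% FY) ~ "Past three years",
    all(c("FY14", "FY15") %in% FY)         ~ "Past two years",
    all(c("FY13", "FY15") %in% FY)         ~ "FY13 and FY15 only",
    n_distinct(FY) == 1                    ~ "Only one year",
    TRUE                                   ~ NA_character_
  )) %>%
  ungroup()
```

The row count is unchanged, so tallied can be used in place of testdf without a later merge.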
Consider using ave inside nested ifelse calls, so that the conditional logic is computed per group of Region and QST:
testdf <- within(testdf, {
FY13 <- ifelse(FY=='FY13', 1, 0)
FY14 <- ifelse(FY=='FY14', 1, 0)
FY15 <- ifelse(FY=='FY15', 1, 0)
Tally <- ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 1,
'Asked Only on One Year',
ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 3,
'Asked Over the Past Three Years',
ifelse(ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
'Asked Over the Past Two Years',
ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
'Asked On FY13 & FY15 Only',
NA
)
)
)
)
FY13 <- NULL; FY14 <- NULL; FY15 <- NULL
})
testdf[c("Region", "QST", "FY", "Tally")]
# Region QST FY Tally
# 1 AFRICA Q2 FY13 Asked Over the Past Three Years
# 2 AFRICA Q2 FY14 Asked Over the Past Three Years
# 3 AFRICA Q2 FY15 Asked Over the Past Three Years
# 4 AFRICA Q5 FY14 Asked Over the Past Two Years
# 5 AFRICA Q5 FY15 Asked Over the Past Two Years
# 6 ASIA Q2 FY13 Asked Over the Past Three Years
# 7 ASIA Q2 FY14 Asked Over the Past Three Years
# 8 ASIA Q2 FY15 Asked Over the Past Three Years
# 9 ASIA Q5 FY13 Asked On FY13 & FY15 Only
# 10 ASIA Q5 FY15 Asked On FY13 & FY15 Only
# 11 AMERICA Q2 FY13 Asked Over the Past Three Years
# 12 AMERICA Q2 FY14 Asked Over the Past Three Years
# 13 AMERICA Q2 FY15 Asked Over the Past Three Years
# 14 AMERICA Q5 FY13 Asked Over the Past Three Years
# 15 AMERICA Q5 FY14 Asked Over the Past Three Years
# 16 AMERICA Q5 FY15 Asked Over the Past Three Years
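To see why ave fits here, it may help to look at it in isolation. The toy data below is hypothetical (the names df, grp, x and grp_max do not come from the question); it only illustrates that ave applies FUN within each group and returns a vector as long as its input, which is what lets the nested ifelse above work row by row.

```r
# Toy data (hypothetical values, not from the question)
df <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(0, 1, 0, 0, 0)
)

# Every row receives the maximum of x within its own group.
df$grp_max <- ave(df$x, df$grp, FUN = max)
df$grp_max
# [1] 1 1 0 0 0
```

In the answer above, x plays the role of the 0/1 indicator for one fiscal year, so the group maximum is 1 exactly when that Region/QST pair has a row for that year.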
Here is a solution that uses your ID column. (We can build it a little better with paste0, though, using testdf$ID <- paste0(testdf$Region, "_", testdf$QST).)
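The answer does not say why the separator is an improvement. One plausible reason, stated here as an assumption, is that concatenating without a separator can map different pairs to the same ID and makes the parts hard to split apart again. A small sketch with made-up vectors region and qst:

```r
# Without a separator, different pairs can collide.
paste0("AB", "C") == paste0("A", "BC")             # TRUE: both are "ABC"
paste0("AB", "_", "C") == paste0("A", "_", "BC")   # FALSE

# Hypothetical inputs, mirroring the Region and QST columns.
region <- c("AFRICA", "ASIA")
qst    <- c("Q2", "Q5")
ids    <- paste0(region, "_", qst)

# With a separator the parts can be recovered again.
parts <- strsplit(ids, "_", fixed = TRUE)
```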
We dcast your testdf using the reshape2 package.
library(reshape2)
tmp <- dcast(testdf, ID ~ FY,
value.var="QST", fun.aggregate=length)
Now we already know in which years each question was asked. To answer the remaining questions, we do a little arithmetic.
tmp <- cbind(tmp,
             past2  = as.numeric(tmp$FY14 + tmp$FY15 == 2 & tmp$FY13 == 0),
             past3  = as.numeric(tmp$FY13 + tmp$FY14 + tmp$FY15 == 3),
             y13_15 = as.numeric(tmp$FY13 + tmp$FY15 == 2 & tmp$FY14 == 0))
The sequence of values in columns 5:7 contains the Tally information we want, which we can extract as one string per row:
tmp$Tally <- apply(tmp, 1, function(x) paste0(x[5:7], collapse=""))
We translate these codes into human-readable labels via the factor levels,
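tmp$Tally <- factor(tmp$Tally,
                    # explicit levels keep the labels aligned even if a code is absent
                    levels = c("000", "001", "010", "100"),
                    labels = c('Question Asked Once Only',
                               'Question Asked in FY13 & FY15 Only',
                               'Asked Over The Past Three Years',
                               'Asked Over The Past Two Years'))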
and merge with the original data frame to obtain the desired result.
> merge(testdf, tmp[c("ID", "Tally")])
ID FY Region QST Tally
1 AFRICA_Q2 FY13 AFRICA Q2 Asked Over The Past Three Years
2 AFRICA_Q2 FY14 AFRICA Q2 Asked Over The Past Three Years
3 AFRICA_Q2 FY15 AFRICA Q2 Asked Over The Past Three Years
4 AFRICA_Q5 FY14 AFRICA Q5 Asked Over The Past Two Years
5 AFRICA_Q5 FY15 AFRICA Q5 Asked Over The Past Two Years
6 AMERICA_Q2 FY13 AMERICA Q2 Asked Over The Past Three Years
7 AMERICA_Q2 FY14 AMERICA Q2 Asked Over The Past Three Years
8 AMERICA_Q2 FY15 AMERICA Q2 Asked Over The Past Three Years
9 AMERICA_Q5 FY13 AMERICA Q5 Asked Over The Past Three Years
10 AMERICA_Q5 FY14 AMERICA Q5 Asked Over The Past Three Years
11 AMERICA_Q5 FY15 AMERICA Q5 Asked Over The Past Three Years
12 ANTH.CTRY_Q2 FY15 ANTH.CTRY Q2 Question Asked Once Only
13 ASIA_Q2 FY13 ASIA Q2 Asked Over The Past Three Years
14 ASIA_Q2 FY14 ASIA Q2 Asked Over The Past Three Years
15 ASIA_Q2 FY15 ASIA Q2 Asked Over The Past Three Years
16 ASIA_Q5 FY13 ASIA Q5 Question Asked in FY13 & FY15 Only
17 ASIA_Q5 FY15 ASIA Q5 Question Asked in FY13 & FY15 Only
Data used in this answer:
testdf <- structure(list(FY = c("FY13", "FY14", "FY15", "FY14", "FY15",
"FY13", "FY14", "FY15", "FY13", "FY15", "FY13", "FY14", "FY15",
"FY13", "FY14", "FY15", "FY15"), Region = c("AFRICA", "AFRICA",
"AFRICA", "AFRICA", "AFRICA", "ASIA", "ASIA", "ASIA", "ASIA",
"ASIA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", "AMERICA",
"AMERICA", "ANTH.CTRY"), QST = c("Q2", "Q2", "Q2", "Q5", "Q5",
"Q2", "Q2", "Q2", "Q5", "Q5", "Q2", "Q2", "Q2", "Q5", "Q5", "Q5",
"Q2")), row.names = c(NA, 17L), class = "data.frame")