繁体   English   中英

R循环遍历数据框列中的唯一值以根据条件创建另一个

[英]R Loop over unique values in a dataframe column to create another one based on conditions

我的数据集包括在多个财政年度(2013财年,14财年和15财年)以及不同地区的调查中提出的问题的分数和总受访者。

我的目标是遍历FY列并确定何时针对每个区域提出每个问题。 并将此信息存储在新列中。

这是可重现的样本的样子-

testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
              Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
              QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
              Very.Satisfied=runif(16,min = 0, max=1),
              Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
              Satisfied=runif(16,min = 0, max=1),
              Total.Satisfied=floor(runif(16,min=10,max=120)),
              Dissatisfied=runif(16,min = 0, max=1),
              Total.Dissatisfied=floor(runif(16,min=10,max=120)),
              Very.Dissatisfied=runif(16,min = 0, max=1),
              Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))

我先通过将RegionQST串联来创建ID列

library(tidyr)
testdf = testdf %>%
unite(ID,c('Region','QST'),sep = "",remove = F)

我的目标

1)对于每个唯一ID ,请确定是否提出了以下问题:

a)仅一年(2013财年,14财年或15财年)

b)过去两年(仅2015财年和2014财年)

c)过去三年(2015财年,14财年和13财年)

d)仅在2013财年和2015财年

我的尝试

对于这个问题,我尝试创建一个for loop ,并针对每个唯一的ID ,首先将在每个向量中出现的唯一问题存储在向量v 然后,使用IF条件语句,我根据这些情况向新创建的名为Tally的列分配注释。

for (i in unique(testdf$ID))
{
v=unique(testdf$FY)

  if(('FY15' %in% v) & ('FY14' %in% v)) {
      testdf$Tally=='Asked Over The Past Two Years'
  } 
  else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
       testdf$Tally=='Asked Over The Past Three Years'
  }
  else if(('FY13' %in% v) & ('FY15' %in% v)) {
        testdf$Tally=='Question Asked in FY13 & FY15 Only'
  }
  else { testdf$Tally=='Question Asked Once Only' 
  }

}  

该循环似乎在运行时没有引发错误消息,但是似乎没有创建新的Tally列。

任何帮助,将不胜感激。

在您的代码中,主要问题是在if-else子句中,您不是在进行赋值(使用“ <-”),而是在进行比较,使用“ ==”。 我发现这是一个更优雅的解决方案,因为它没有使用循环:

require(tidyverse)

testdf %>%
  select(ID, FY) %>%
  unique() %>%
  mutate(is_true = 1) %>%
  spread(key = FY, value = is_true, fill = 0) %>%
  mutate(tally = case_when(
    FY13 == 1 & FY14 == 1 & FY15 == 1 ~ 'Asked Over The Past Three Years',
                FY14 == 1 & FY15 == 1 ~ 'Asked Over the Past Two Years',
    FY13 == 1 &             FY15 == 1 ~ 'Asked in FY12 & FY15 Only',
    TRUE ~ 'Question Asked Once Only'
  ))

输出:

+------------------------------------------------------------+
|          ID FY13 FY14 FY15                           tally |
+------------------------------------------------------------+
| 1  AFRICAQ2    1    1    1 Asked Over The Past Three Years |
| 2  AFRICAQ5    0    1    1   Asked Over the Past Two Years |
| 3 AMERICAQ2    1    1    1 Asked Over The Past Three Years |
| 4 AMERICAQ5    1    1    1 Asked Over The Past Three Years |
| 5    ASIAQ2    1    1    1 Asked Over The Past Three Years |
| 6    ASIAQ5    1    0    1       Asked in FY12 & FY15 Only |
+------------------------------------------------------------+

无需循环:

library(tidyverse)

result <- testdf %>%
    select(3, 2, 1) %>%
    mutate(Asked = 1) %>%
    spread(FY, Asked)

> result
  QST  Region FY13 FY14 FY15
1  Q2  AFRICA    1    1    1
2  Q2 AMERICA    1    1    1
3  Q2    ASIA    1    1    1
4  Q5  AFRICA   NA    1    1
5  Q5 AMERICA    1    1    1
6  Q5    ASIA    1   NA    1

一口气回答所有四个问题。

如果您真的想要一个提示栏,请按以下方式展开:

result %>%
    mutate(Tally = case_when(FY13 + FY14 + FY15 == 1 ~ "Only one year",
                             FY13 + FY14 + FY15 == 3 ~ "Past three years",
                             FY14 + FY15 == 2 ~ "Past two years",
                             FY13 + FY15 == 2 ~ "FY13 and FY15 only",
                             NA ~ NA_character_))

  QST  Region FY13 FY14 FY15              Tally
1  Q2  AFRICA    1    1    1   Past three years
2  Q2 AMERICA    1    1    1   Past three years
3  Q2    ASIA    1    1    1   Past three years
4  Q5  AFRICA   NA    1    1     Past two years
5  Q5 AMERICA    1    1    1   Past three years
6  Q5    ASIA    1   NA    1 FY13 and FY15 only

考虑使用ave在嵌套ifelseRegionQST进行分组计算以获得条件逻辑:

testdf <- within(testdf, {
                   FY13 <- ifelse(FY=='FY13', 1, 0)
                   FY14 <- ifelse(FY=='FY14', 1, 0)
                   FY15 <- ifelse(FY=='FY15', 1, 0)

                   Tally <- ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 1,
                                   'Asked Only on One Year',
                                   ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 3,
                                          'Asked Over the Past Three Years',
                                          ifelse(ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                 'Asked Over the Past Two Years',
                                                 ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                        'Asked On FY13 & FY15 Only',
                                                        NA
                                                        )
                                                 )
                                          )
                                   )

                   FY13 <- NULL; FY14 <- NULL; FY15 <- NULL
             })

testdf[c("ID", "FY", "Tally")]

#     Region QST   FY                           Tally
# 1   AFRICA  Q2 FY13 Asked Over the Past Three Years
# 2   AFRICA  Q2 FY14 Asked Over the Past Three Years
# 3   AFRICA  Q2 FY15 Asked Over the Past Three Years
# 4   AFRICA  Q5 FY14   Asked Over the Past Two Years
# 5   AFRICA  Q5 FY15   Asked Over the Past Two Years
# 6     ASIA  Q2 FY13 Asked Over the Past Three Years
# 7     ASIA  Q2 FY14 Asked Over the Past Three Years
# 8     ASIA  Q2 FY15 Asked Over the Past Three Years
# 9     ASIA  Q5 FY13       Asked On FY13 & FY15 Only
# 10    ASIA  Q5 FY15       Asked On FY13 & FY15 Only
# 11 AMERICA  Q2 FY13 Asked Over the Past Three Years
# 12 AMERICA  Q2 FY14 Asked Over the Past Three Years
# 13 AMERICA  Q2 FY15 Asked Over the Past Three Years
# 14 AMERICA  Q5 FY13 Asked Over the Past Three Years
# 15 AMERICA  Q5 FY14 Asked Over the Past Three Years
# 16 AMERICA  Q5 FY15 Asked Over the Past Three Years

有使用您的ID列的解决方案。 (使用paste0我们可以做得更好,尽管使用testdf$ID <- paste0(testdf$Region, "_", testdf$QST) 。)

我们dcasttestdf使用reshape2包。

library(reshape2)
tmp <- dcast(testdf, ID ~ FY, 
               value.var="QST", fun.aggregate=length)

现在我们已经知道在不同年份是否提出过该问题。 为了回答其他问题,我们将做一些数学运算。

tmp <- cbind(tmp, 
             past2=as.numeric(t2[3] + t2[4] == 2 & t2[2] == 0), 
             past3=as.numeric(t2[2] + t2[3] + t2[4] == 3),
             y13_15=as.numeric(t2[2] + t2[4] == 2 & t2[3] == 0))

5:7列中的序列包含我们可以挤奶的所需Tally信息

tmp$Tally <- apply(tmp, 1, function(x) paste0(x[5:7], collapse=""))

按要素水平翻译成人类语言,

tmp$Tally <- factor(tmp$Tally, labels=c('Question Asked Once Only',
                                        'Question Asked in FY13 & FY15 Only',
                                        'Asked Over The Past Three Years',
                                        'Asked Over The Past Two Years'))

并与原始数据帧合并以获得所需的结果。

结果

> merge(testdf, t3[c(1, 8)])
             ID   FY    Region QST                              Tally
1     AFRICA_Q2 FY13    AFRICA  Q2    Asked Over The Past Three Years
2     AFRICA_Q2 FY14    AFRICA  Q2    Asked Over The Past Three Years
3     AFRICA_Q2 FY15    AFRICA  Q2    Asked Over The Past Three Years
4     AFRICA_Q5 FY14    AFRICA  Q5      Asked Over The Past Two Years
5     AFRICA_Q5 FY15    AFRICA  Q5      Asked Over The Past Two Years
6    AMERICA_Q2 FY13   AMERICA  Q2    Asked Over The Past Three Years
7    AMERICA_Q2 FY14   AMERICA  Q2    Asked Over The Past Three Years
8    AMERICA_Q2 FY15   AMERICA  Q2    Asked Over The Past Three Years
9    AMERICA_Q5 FY13   AMERICA  Q5    Asked Over The Past Three Years
10   AMERICA_Q5 FY14   AMERICA  Q5    Asked Over The Past Three Years
11   AMERICA_Q5 FY15   AMERICA  Q5    Asked Over The Past Three Years
12 ANTH.CTRY_Q2 FY15 ANTH.CTRY  Q2           Question Asked Once Only
13      ASIA_Q2 FY13      ASIA  Q2    Asked Over The Past Three Years
14      ASIA_Q2 FY14      ASIA  Q2    Asked Over The Past Three Years
15      ASIA_Q2 FY15      ASIA  Q2    Asked Over The Past Three Years
16      ASIA_Q5 FY13      ASIA  Q5 Question Asked in FY13 & FY15 Only
17      ASIA_Q5 FY15      ASIA  Q5 Question Asked in FY13 & FY15 Only

数据

testdf <- structure(list(FY = c("FY13", "FY14", "FY15", "FY14", "FY15", 
"FY13", "FY14", "FY15", "FY13", "FY15", "FY13", "FY14", "FY15", 
"FY13", "FY14", "FY15", "FY15"), Region = c("AFRICA", "AFRICA", 
"AFRICA", "AFRICA", "AFRICA", "ASIA", "ASIA", "ASIA", "ASIA", 
"ASIA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", 
"AMERICA", "ANTH.CTRY"), QST = c("Q2", "Q2", "Q2", "Q5", "Q5", 
"Q2", "Q2", "Q2", "Q5", "Q5", "Q2", "Q2", "Q2", "Q5", "Q5", "Q5", 
"Q2")), row.names = c(NA, 17L), class = "data.frame")

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM