R Loop over unique values in a dataframe column to create another one based on conditions

My dataset consists of scores and total respondents for questions asked in a survey, over a number of fiscal years (FY13, FY14 & FY15) and in different regions.

My objective is to loop through the FY column and identify when each question was asked, for each region, and store this information in a new column.

This is what a reproducible sample looks like:

testdf=data.frame(FY=c("FY13","FY14","FY15","FY14","FY15","FY13","FY14","FY15","FY13","FY15","FY13","FY14","FY15","FY13","FY14","FY15"),
              Region=c(rep("AFRICA",5),rep("ASIA",5),rep("AMERICA",6)),
              QST=c(rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",2),rep("Q2",3),rep("Q5",3)),
              Very.Satisfied=runif(16,min = 0, max=1),
              Total.Very.Satisfied=floor(runif(16,min=10,max=120)),
              Satisfied=runif(16,min = 0, max=1),
              Total.Satisfied=floor(runif(16,min=10,max=120)),
              Dissatisfied=runif(16,min = 0, max=1),
              Total.Dissatisfied=floor(runif(16,min=10,max=120)),
              Very.Dissatisfied=runif(16,min = 0, max=1),
              Total.Very.Dissatisfied=floor(runif(16,min=10,max=120)))

I start by creating an ID column, concatenating Region & QST:

library(tidyr)
testdf = testdf %>%
  unite(ID, c('Region','QST'), sep = "", remove = FALSE)

My Objective

1) For each unique ID, identify whether the given question was asked:

a) Only in one year (either FY13, FY14 or FY15)

b) Over the Past Two Years (FY15 & FY14 only)

c) Over the Past Three Years (FY15 & FY14 & FY13)

d) In FY13 & FY15 Only

My Attempt

For this problem, I tried to create a for loop, and for each unique ID, I first store the unique occurrences of each FY in which the question was asked in a vector v. Then, using an if conditional statement, I assign a comment to a newly created column called Tally based on these occurrences.

for (i in unique(testdf$ID))
{
v=unique(testdf$FY)

  if(('FY15' %in% v) & ('FY14' %in% v)) {
      testdf$Tally=='Asked Over The Past Two Years'
  } 
  else if(('FY15' %in% v) & ('FY14' %in% v) & ('FY13' %in% v)) {
       testdf$Tally=='Asked Over The Past Three Years'
  }
  else if(('FY13' %in% v) & ('FY15' %in% v)) {
        testdf$Tally=='Question Asked in FY13 & FY15 Only'
  }
  else { testdf$Tally=='Question Asked Once Only' 
  }

}  

The loop seems to run without throwing an error message, but it doesn't seem to create the new Tally column.

Any help with this will be greatly appreciated.

In your code the main problem is that in the if-else clause you're not doing an assignment (using '<-') but a comparison, using '=='. Here's a solution that I find more elegant, since it's not using a loop:

library(tidyverse)

testdf %>%
  select(ID, FY) %>%
  unique() %>%                                   # one row per ID/year pair
  mutate(is_true = 1) %>%
  spread(key = FY, value = is_true, fill = 0) %>%  # one 0/1 column per year
  mutate(tally = case_when(
    FY13 == 1 & FY14 == 1 & FY15 == 1 ~ 'Asked Over The Past Three Years',
                FY14 == 1 & FY15 == 1 ~ 'Asked Over the Past Two Years',
    FY13 == 1 &             FY15 == 1 ~ 'Asked in FY13 & FY15 Only',
    TRUE ~ 'Question Asked Once Only'
  ))

Output:

+------------------------------------------------------------+
|          ID FY13 FY14 FY15                           tally |
+------------------------------------------------------------+
| 1  AFRICAQ2    1    1    1 Asked Over The Past Three Years |
| 2  AFRICAQ5    0    1    1   Asked Over the Past Two Years |
| 3 AMERICAQ2    1    1    1 Asked Over The Past Three Years |
| 4 AMERICAQ5    1    1    1 Asked Over The Past Three Years |
| 5    ASIAQ2    1    1    1 Asked Over The Past Three Years |
| 6    ASIAQ5    1    0    1       Asked in FY13 & FY15 Only |
+------------------------------------------------------------+
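
For comparison, here is a minimal sketch of how the original loop could be repaired, in case a loop is still preferred. It assumes three changes beyond swapping '==' for '<-': v is computed only from the rows of the current ID, the most specific condition (all three years) is tested first, and only the rows of the current ID are assigned. The small data frame is a hypothetical stand-in that keeps just the columns the loop needs.

```r
# Hypothetical stand-in for the question's data (only the columns used here).
testdf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 3)),
  QST    = c(rep("Q2", 3), rep("Q5", 2), rep("Q5", 2), "Q2"),
  stringsAsFactors = FALSE
)
testdf$ID <- paste0(testdf$Region, testdf$QST)
testdf$Tally <- NA_character_          # create the column before filling it

for (i in unique(testdf$ID)) {
  rows <- testdf$ID == i               # rows belonging to this ID
  v <- unique(testdf$FY[rows])         # years in which this ID appears

  # Test the most specific pattern first, otherwise "two years" would
  # also catch every ID that was asked in all three years.
  if (all(c("FY13", "FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Three Years"
  } else if (all(c("FY14", "FY15") %in% v)) {
    label <- "Asked Over The Past Two Years"
  } else if (all(c("FY13", "FY15") %in% v)) {
    label <- "Question Asked in FY13 & FY15 Only"
  } else {
    label <- "Question Asked Once Only"
  }

  testdf$Tally[rows] <- label          # assignment, not comparison
}
```

As in the question, the final else branch also catches any combination not listed explicitly (for example FY13 & FY14 only), so a separate branch would be needed if that case matters.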

No need for a loop:

library(tidyverse)

result <- testdf %>%
    select(QST, Region, FY) %>%   # select by name so an added ID column does not shift positions
    mutate(Asked = 1) %>%
    spread(FY, Asked)

> result
  QST  Region FY13 FY14 FY15
1  Q2  AFRICA    1    1    1
2  Q2 AMERICA    1    1    1
3  Q2    ASIA    1    1    1
4  Q5  AFRICA   NA    1    1
5  Q5 AMERICA    1    1    1
6  Q5    ASIA    1   NA    1

Answers all four questions in one go.

If you really want a tally column, expand it like this:

# Years not asked are NA here, so count non-missing years instead of summing
# (a plain sum would be NA and "Only one year" could never match).
result %>%
    mutate(Tally = case_when(rowSums(!is.na(cbind(FY13, FY14, FY15))) == 1 ~ "Only one year",
                             rowSums(!is.na(cbind(FY13, FY14, FY15))) == 3 ~ "Past three years",
                             !is.na(FY14) & !is.na(FY15) ~ "Past two years",
                             !is.na(FY13) & !is.na(FY15) ~ "FY13 and FY15 only",
                             TRUE ~ NA_character_))

  QST  Region FY13 FY14 FY15              Tally
1  Q2  AFRICA    1    1    1   Past three years
2  Q2 AMERICA    1    1    1   Past three years
3  Q2    ASIA    1    1    1   Past three years
4  Q5  AFRICA   NA    1    1     Past two years
5  Q5 AMERICA    1    1    1   Past three years
6  Q5    ASIA    1   NA    1 FY13 and FY15 only
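
If the Tally should end up on every row of the original data rather than on the reshaped summary, a grouped mutate is one way to get there without reshaping. This is only a sketch: the small data frame and the name tallied are hypothetical stand-ins, and the labels are borrowed from the code above.

```r
library(dplyr)

# Hypothetical stand-in for the question's data (only the columns used here).
testdf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 3)),
  QST    = c(rep("Q2", 3), rep("Q5", 2), rep("Q5", 2), "Q2"),
  stringsAsFactors = FALSE
)

tallied <- testdf %>%
  group_by(Region, QST) %>%
  # Within each group FY holds the years that question was asked in that region;
  # each condition is a single TRUE/FALSE that mutate() recycles over the group.
  mutate(Tally = case_when(
    all(c("FY13", "FY14", "FY15") %in% FY) ~ "Past three years",
    all(c("FY14", "FY15") %in% FY)         ~ "Past two years",
    all(c("FY13", "FY15") %in% FY)         ~ "FY13 and FY15 only",
    n_distinct(FY) == 1                    ~ "Only one year",
    TRUE                                   ~ NA_character_
  )) %>%
  ungroup()
```

Because the conditions are checked in order, the three-year case has to come before the two-year cases.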

Consider ave for the grouped calculation by Region and QST, combined with nested ifelse for the conditional logic:

testdf <- within(testdf, {
                   FY13 <- ifelse(FY=='FY13', 1, 0)
                   FY14 <- ifelse(FY=='FY14', 1, 0)
                   FY15 <- ifelse(FY=='FY15', 1, 0)

                   Tally <- ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 1,
                                   'Asked Only on One Year',
                                   ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 3,
                                          'Asked Over the Past Three Years',
                                          ifelse(ave(FY14, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                 'Asked Over the Past Two Years',
                                                 ifelse(ave(FY13, Region, QST, FUN=max) + ave(FY15, Region, QST, FUN=max) == 2,
                                                        'Asked On FY13 & FY15 Only',
                                                        NA
                                                        )
                                                 )
                                          )
                                   )

                   FY13 <- NULL; FY14 <- NULL; FY15 <- NULL
             })

testdf[c("Region", "QST", "FY", "Tally")]

#     Region QST   FY                           Tally
# 1   AFRICA  Q2 FY13 Asked Over the Past Three Years
# 2   AFRICA  Q2 FY14 Asked Over the Past Three Years
# 3   AFRICA  Q2 FY15 Asked Over the Past Three Years
# 4   AFRICA  Q5 FY14   Asked Over the Past Two Years
# 5   AFRICA  Q5 FY15   Asked Over the Past Two Years
# 6     ASIA  Q2 FY13 Asked Over the Past Three Years
# 7     ASIA  Q2 FY14 Asked Over the Past Three Years
# 8     ASIA  Q2 FY15 Asked Over the Past Three Years
# 9     ASIA  Q5 FY13       Asked On FY13 & FY15 Only
# 10    ASIA  Q5 FY15       Asked On FY13 & FY15 Only
# 11 AMERICA  Q2 FY13 Asked Over the Past Three Years
# 12 AMERICA  Q2 FY14 Asked Over the Past Three Years
# 13 AMERICA  Q2 FY15 Asked Over the Past Three Years
# 14 AMERICA  Q5 FY13 Asked Over the Past Three Years
# 15 AMERICA  Q5 FY14 Asked Over the Past Three Years
# 16 AMERICA  Q5 FY15 Asked Over the Past Three Years
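
The nested ifelse calls can get hard to read. A possible alternative, still in base R, is to let ave build one text key per Region/QST group that lists the years present, and then translate that key with a named lookup vector. This is a sketch under assumptions: year_key and labels are hypothetical names, the data frame is a small stand-in, and combinations missing from the lookup (for example FY13 and FY14 only) come out as NA.

```r
# Hypothetical stand-in for the question's data (only the columns used here).
testdf <- data.frame(
  FY     = c("FY13", "FY14", "FY15", "FY14", "FY15", "FY13", "FY15", "FY14"),
  Region = c(rep("AFRICA", 5), rep("ASIA", 3)),
  QST    = c(rep("Q2", 3), rep("Q5", 2), rep("Q5", 2), "Q2"),
  stringsAsFactors = FALSE
)

# One key per Region/QST group listing the years it occurs in, e.g. "FY14+FY15".
year_key <- ave(testdf$FY, testdf$Region, testdf$QST,
                FUN = function(x) rep(paste(sort(unique(x)), collapse = "+"),
                                      length(x)))

# Lookup from year pattern to label; unlisted patterns map to NA.
labels <- c(
  "FY13"           = "Asked Only on One Year",
  "FY14"           = "Asked Only on One Year",
  "FY15"           = "Asked Only on One Year",
  "FY14+FY15"      = "Asked Over the Past Two Years",
  "FY13+FY15"      = "Asked On FY13 & FY15 Only",
  "FY13+FY14+FY15" = "Asked Over the Past Three Years"
)

testdf$Tally <- unname(labels[year_key])
```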

Here's a solution using your ID column. (With paste0 we can build it somewhat more nicely: testdf$ID <- paste0(testdf$Region, "_", testdf$QST).)

We dcast your testdf using the reshape2 package.

library(reshape2)
tmp <- dcast(testdf, ID ~ FY, 
               value.var="QST", fun.aggregate=length)

Now we already know whether the question was asked in the different years. To answer the further questions, we'll do some maths.

# tmp has the columns ID, FY13, FY14, FY15 (one 0/1 count per year)
tmp <- cbind(tmp, 
             past2=as.numeric(tmp$FY14 + tmp$FY15 == 2 & tmp$FY13 == 0), 
             past3=as.numeric(tmp$FY13 + tmp$FY14 + tmp$FY15 == 3),
             y13_15=as.numeric(tmp$FY13 + tmp$FY15 == 2 & tmp$FY14 == 0))

Columns 5:7 now hold a 0/1 pattern for each ID that encodes the desired Tally information. We paste it into a single string,

tmp$Tally <- apply(tmp, 1, function(x) paste0(x[5:7], collapse=""))

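translate the patterns into readable labels via factor levels,

# Explicit levels keep the labels aligned even if a pattern is absent from the data
tmp$Tally <- factor(tmp$Tally, levels=c('000', '001', '010', '100'),
                    labels=c('Question Asked Once Only',
                             'Question Asked in FY13 & FY15 Only',
                             'Asked Over The Past Three Years',
                             'Asked Over The Past Two Years'))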

and merge with the original data frame to achieve the desired result.

Result

> merge(testdf, tmp[c(1, 8)])
             ID   FY    Region QST                              Tally
1     AFRICA_Q2 FY13    AFRICA  Q2    Asked Over The Past Three Years
2     AFRICA_Q2 FY14    AFRICA  Q2    Asked Over The Past Three Years
3     AFRICA_Q2 FY15    AFRICA  Q2    Asked Over The Past Three Years
4     AFRICA_Q5 FY14    AFRICA  Q5      Asked Over The Past Two Years
5     AFRICA_Q5 FY15    AFRICA  Q5      Asked Over The Past Two Years
6    AMERICA_Q2 FY13   AMERICA  Q2    Asked Over The Past Three Years
7    AMERICA_Q2 FY14   AMERICA  Q2    Asked Over The Past Three Years
8    AMERICA_Q2 FY15   AMERICA  Q2    Asked Over The Past Three Years
9    AMERICA_Q5 FY13   AMERICA  Q5    Asked Over The Past Three Years
10   AMERICA_Q5 FY14   AMERICA  Q5    Asked Over The Past Three Years
11   AMERICA_Q5 FY15   AMERICA  Q5    Asked Over The Past Three Years
12 ANTH.CTRY_Q2 FY15 ANTH.CTRY  Q2           Question Asked Once Only
13      ASIA_Q2 FY13      ASIA  Q2    Asked Over The Past Three Years
14      ASIA_Q2 FY14      ASIA  Q2    Asked Over The Past Three Years
15      ASIA_Q2 FY15      ASIA  Q2    Asked Over The Past Three Years
16      ASIA_Q5 FY13      ASIA  Q5 Question Asked in FY13 & FY15 Only
17      ASIA_Q5 FY15      ASIA  Q5 Question Asked in FY13 & FY15 Only

Data

testdf <- structure(list(FY = c("FY13", "FY14", "FY15", "FY14", "FY15", 
"FY13", "FY14", "FY15", "FY13", "FY15", "FY13", "FY14", "FY15", 
"FY13", "FY14", "FY15", "FY15"), Region = c("AFRICA", "AFRICA", 
"AFRICA", "AFRICA", "AFRICA", "ASIA", "ASIA", "ASIA", "ASIA", 
"ASIA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", "AMERICA", 
"AMERICA", "ANTH.CTRY"), QST = c("Q2", "Q2", "Q2", "Q5", "Q5", 
"Q2", "Q2", "Q2", "Q5", "Q5", "Q2", "Q2", "Q2", "Q5", "Q5", "Q5", 
"Q2")), row.names = c(NA, 17L), class = "data.frame")
