R (dplyr/tidyverse) | Using mutate_at to construct a series of new variables using if_else statements

I'm relatively new to this site and to the world of programming, so my apologies if this has already been asked.

Here's a modified version of a data frame I'm currently working with (truncated to make things easier to diagnose):

  COUNTRY          b_2010 c_2010 b_2011  c_2011   
1 Australia          50     62     67     56     
2 Austria            50     48     48     95      
3 Belgium            50     26     67     25      
4 Bulgaria           50     54     42     64      

Let's assume that I want to create a series of variables indicating that a country has a value equal to or greater than 50 for each existing variable in a given year.

I can do so by running something like this:

dataframe %>% mutate(d_2010 = if_else(b_2010 & c_2010 >= 50, "A", "B"),
                     d_2011 = if_else(b_2011 & c_2011 >= 50, "A", "B"))

This should produce the indicator variables I'm looking to construct, but the process will get awfully taxing if I have a lengthy time series. I'm sure there's a way to go about doing this more efficiently (using mutate_at or some other function), but I haven't been able to figure it out.

Can someone out there help me out?

Thanks!

As I read it, "each existing variable in a given year" means that every column is compared with 50 on its own (in b_2010 & c_2010 >= 50 only c_2010 is compared with 50). That would be something like this:

dataframe %>% mutate(d_2010 = if_else(b_2010 >= 50 & c_2010 >= 50, "A", "B"),
                     d_2011 = if_else(b_2011 >= 50 & c_2011 >= 50, "A", "B"))
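
To see why the two versions can give different results, here is a small base R check using the 2011 columns from the sample data. In R, >= binds more tightly than &, so b_2011 & c_2011 >= 50 is evaluated as b_2011 & (c_2011 >= 50), and the numeric b_2011 is coerced to logical (any non-zero value counts as TRUE). The names loose and strict are chosen only for this illustration.

```r
# 2011 columns copied from the sample data in the question
b_2011 <- c(67, 48, 67, 42)
c_2011 <- c(56, 95, 25, 64)

# Parsed as b_2011 & (c_2011 >= 50): b_2011 is coerced to logical,
# so every non-zero value counts as TRUE
loose <- b_2011 & c_2011 >= 50

# Each column is compared with 50 separately
strict <- b_2011 >= 50 & c_2011 >= 50

print(loose)
print(strict)
```

The two vectors disagree for Austria and Bulgaria, where b_2011 is below 50 but c_2011 is not.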

If this is the goal, then I would calculate the new variables in a first step and join them to the original data frame afterwards. Something like this:

library(dplyr)
library(tidyr)
library(stringr)

df <- dataframe %>% 
  gather(starts_with("b_"), starts_with("c_"), key = Key, value = Value) %>% 
  mutate(Year = paste0("d_", str_sub(Key, 3, 6))) %>% # create the name of the new variable
  group_by(COUNTRY, Year) %>% 
  summarise(d = if_else(all(Value >= 50), "A", "B")) %>% # "A" only if every value is >= 50
  ungroup() %>% 
  spread(Year, d)

# join both by country
dataframe <- dataframe %>% 
  left_join(df, by = "COUNTRY")
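
The same reshape, summarise and join idea can also be written with pivot_longer() and pivot_wider(), which tidyr now recommends over gather() and spread(). The following is only a sketch, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later; the names flags, result, variable and year are chosen for illustration, and the data frame is rebuilt from the sample in the question so that the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
dataframe <- data.frame(
  COUNTRY = c("Australia", "Austria", "Belgium", "Bulgaria"),
  b_2010 = c(50, 50, 50, 50),
  c_2010 = c(62, 48, 26, 54),
  b_2011 = c(67, 48, 67, 42),
  c_2011 = c(56, 95, 25, 64),
  stringsAsFactors = FALSE
)

flags <- dataframe %>%
  # Split names such as "b_2010" into a variable part and a year part
  pivot_longer(-COUNTRY, names_to = c("variable", "year"), names_sep = "_") %>%
  group_by(COUNTRY, year) %>%
  # "A" only when every variable for that country and year is at least 50
  summarise(d = if_else(all(value >= 50), "A", "B"), .groups = "drop") %>%
  # One d_<year> column per year
  pivot_wider(names_from = year, values_from = d, names_prefix = "d_")

# Attach the indicator columns to the original data
result <- left_join(dataframe, flags, by = "COUNTRY")
print(result)
```

Because the year is taken from the column names, a longer time series or additional variables with the same naming pattern (for example e_2010) should not need any extra code.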
