R mutate variable to variable values from another observation, using a loop, an ifelse condition and subset (dplyr)

See my reproducible example and desired output below.

I want to create a new variable that combines variable values from other observations (rows), which I want to identify inside a loop using subset. The condition of the subset should be defined by the loop variable. In Ex.1, subset(df, country == i) does not work, but doing it manually in Ex.2 with subset(df, country == 'US') works. I thought country == i and country == 'US' should be pretty much the same.

library(dplyr)  # needed for %>% and mutate() below

# create a df
country <- c('US', 'US', 'China', 'China')
Trump_virus <- c('Y', 'N', 'Y', 'N')
cases <- c(1000, 2000, 4, 6)
df <- data.frame(country, Trump_virus, cases)
#################################################### Ex.1
for (i in df$country) {
 print(i)
 df <- df %>%
  mutate(cases_corected = ifelse(
   Trump_virus == 'Y'
   ,subset(df, Trump_virus == 'N' & country ==  i)$cases*1000
   ,'killer_virus'
  ))}
##
df$cases_corected
#################################################### Ex.2
for (i in df$country) {
 print(i)
 df <- df %>%
  mutate(cases_corected = ifelse(
   Trump_virus == 'Y'
   ,subset(df, Trump_virus == 'N' & country ==  'US')$cases*1000
   ,'killer_virus'
  ))}
##
df$cases_corected
################################################### Desired output
> df$cases_corected
[1] "2e+06"       
[2] "killer_virus"
[3] "6000"        
[4] "killer_virus"
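
A likely reason Ex.1 looks broken: country == i itself works, but every pass of the loop rewrites the whole cases_corected column with the single value computed for that i, so only the last pass (i = 'China') survives. The base-R sketch below replays one pass per country without the loop; corrected_for and per_pass are hypothetical names used only for illustration.

```r
df <- data.frame(
  country     = c("US", "US", "China", "China"),
  Trump_virus = c("Y", "N", "Y", "N"),
  cases       = c(1000, 2000, 4, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the value the loop body computes for one country i
corrected_for <- function(i) {
  subset(df, Trump_virus == "N" & country == i)$cases * 1000
}

# One pass per country. corrected_for(i) is a single number, and ifelse()
# recycles it over ALL rows with Trump_virus == "Y", whatever their country.
countries <- unique(df$country)
per_pass <- lapply(countries, function(i) {
  ifelse(df$Trump_virus == "Y", corrected_for(i), "killer_virus")
})
names(per_pass) <- countries

per_pass$US     # what the column holds after a pass with i = "US"
per_pass$China  # what the column holds after the final pass
```

In each pass, rows 1 and 3 receive the same value, which is why the hard-coded 'US' in Ex.2 and the looped i in Ex.1 each give one country's number to both 'Y' rows.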

Here is a solution with dplyr, updated to match the change in the desired output.

# Start from the df as first created (before the loops above added cases_corected)

# Normalise country names so that variants such as "China" and "china" match
df <- df %>%
  mutate(country = toupper(country))

# Build a lookup table from the Trump_virus == "N" rows only
df1 <- df %>%
  dplyr::filter(Trump_virus == "N") %>%
  dplyr::mutate(ID = "Y",
                cases_corected = cases * 1e3) %>%
  dplyr::select(-c(cases, Trump_virus))

# Join the lookup onto the "Y" rows; rows without a match get 'killer_virus'
df <- df %>%
  left_join(df1, by = c("country" = "country", "Trump_virus" = "ID")) %>%
  mutate(cases_corected = ifelse(is.na(cases_corected), 'killer_virus', cases_corected))

df

  country Trump_virus cases cases_corected
1      US           Y  1000          2e+06
2      US           N  2000   killer_virus
3   CHINA           Y     4           6000
4   CHINA           N     6   killer_virus
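
As a possible alternative to the join, the same column can be sketched with group_by(), which makes each country look up its own 'N' row instead of looping. This is only a sketch under one assumption: each country has exactly one row with Trump_virus == 'N' (the [1] simply takes the first if there are more). The name result is hypothetical.

```r
library(dplyr)

df <- data.frame(
  country     = c("US", "US", "China", "China"),
  Trump_virus = c("Y", "N", "Y", "N"),
  cases       = c(1000, 2000, 4, 6),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(country) %>%
  mutate(cases_corected = ifelse(
    Trump_virus == "Y",
    # Assumption: one "N" row per country; take its cases and scale by 1000
    as.character(cases[Trump_virus == "N"][1] * 1000),
    "killer_virus"
  )) %>%
  ungroup()

result
```

Inside a grouped mutate(), cases and Trump_virus refer only to the rows of the current country, so no loop variable or subset() call is needed.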
