R: mutate (indicator variable) using conditions from multiple datasets
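I have two datasets for two different years (2008 and 2009). The idea is to identify new molecules by looking at their Sales_Units and Dollar_Value. If a molecule had no sales units or dollar value in 2008 but has positive sales units and dollar value in 2009, I want to identify it as a new molecule. I want to generate an indicator variable named New_Molecule that takes the value 1 when the molecule is new and 0 otherwise. What would be a good way to do this?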
######YEAR 2008 data##########
Year <- c("2008", "2008", "2008", "2008","2008", "2008", "2008", "2008")
Country <- c("US", "US","US", "US", "Canada", "Canada","Canada", "Canada")
Molecule <- c("A", "B", "C", "D","A", "B", "C", "D")
Dollar_Value <- c(0, 0, 100, 200, 75, 0, 0 ,0)
Sales_Units <- c(0, 0, 20, 40, 5, 0, 0, 0)
df_2008 <- data.frame(Year,Country, Molecule, Dollar_Value,Sales_Units)
######YEAR 2009 data##########
Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
df_2009 <- data.frame(Year, Country, Molecule, Dollar_Value,Sales_Units)
######Want to generate This##########
Year <- c("2009", "2009", "2009", "2009","2009","2009", "2009", "2009", "2009","2009")
Country <- c("US", "US","US", "US","US", "Canada", "Canada","Canada", "Canada","Canada")
Molecule <- c("A", "B", "C", "D", "E","A", "B", "C", "D", "E")
Dollar_Value <- c(500, 0, 100, 200,0, 75, 0, 0 ,99,0)
Sales_Units <- c(60, 0, 20, 40,0,5, 0, 0, 27,0)
New_Molecule <- c(1, 0, 0, 0,0,0,0,0,1,0)
df_2009_NewColumn <- data.frame(Year, Country, Molecule, Dollar_Value, Sales_Units, New_Molecule)
What I tried: first I grouped the datasets by Year, Country, and Molecule, and then used mutate.
df_2008 <- group_by(df_2008,Year,Country,Molecule)
df_2009 <- group_by(df_2009,Year,Country,Molecule)
withnew <- mutate(df_2009, New_Molecule = case_when(df_2008$Dollar_Value ==0 & df_2008$Sales_Units ==0 & df_2009$Dollar_Value >0 & df_2009$Sales_Units >0 ~1,
TRUE~0))
But this gives an error message:
Error: Column `New_Molecule` must be length 1 (the group size), not 10
In addition: Warning message:
In df_2008$Dollar_Value == 0 & df_2008$Sales_Units == 0 & df_2009$Dollar_Value >:
  longer object length is not a multiple of shorter object length
Then I tried just mutate, but it did not generate the indicator variable I need.
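This is easier if you use right_join to combine the data into a wide format. That way all the variables you need sit in the same row, and you can compare them with ifelse: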
library(dplyr)

# Keep every 2009 row and attach the matching 2008 values side by side
right_join(df_2008, df_2009,
           by = c("Country", "Molecule"),
           suffix = c("_2008", "_2009")) %>%
  group_by(Country, Molecule) %>%
  # Flag molecules with no 2008 sales but positive 2009 sales
  mutate(New_Molecule = ifelse(Dollar_Value_2008 == 0 &
                                 Sales_Units_2008 == 0 &
                                 Dollar_Value_2009 > 0 &
                                 Sales_Units_2009 > 0, 1, 0)) %>%
  ungroup() %>%
  # Return to the original 2009 column names
  transmute(Year = Year_2009, Country = Country, Molecule = Molecule,
            Dollar_Value = Dollar_Value_2009, Sales_Units = Sales_Units_2009,
            New_Molecule = New_Molecule)
#> # A tibble: 10 x 6
#>    Year  Country Molecule Dollar_Value Sales_Units New_Molecule
#>    <fct> <fct>   <chr>           <dbl>       <dbl>        <dbl>
#>  1 2009  US      A                 500          60            1
#>  2 2009  US      B                   0           0            0
#>  3 2009  US      C                 100          20            0
#>  4 2009  US      D                 200          40            0
#>  5 2009  US      E                   0           0            0
#>  6 2009  Canada  A                  75           5            0
#>  7 2009  Canada  B                   0           0            0
#>  8 2009  Canada  C                   0           0            0
#>  9 2009  Canada  D                  99          27            1
#> 10 2009  Canada  E                   0           0            0
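One caveat about the join above: molecule E has no row in the 2008 data, so its 2008 columns are NA after the join. With the sample data the flag still comes out as 0, because E's 2009 values are zero, but a molecule that is missing from 2008 and has positive 2009 sales would get NA rather than 1. The following is a minimal sketch of one way to handle that, not the original answer's method. It assumes that a molecule with no 2008 row had zero sales in 2008, which may or may not hold for your data. The names prev and df_2009_flagged are made up for illustration, and the sample data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so this block is self-contained
df_2008 <- data.frame(
  Year = "2008",
  Country = rep(c("US", "Canada"), each = 4),
  Molecule = rep(c("A", "B", "C", "D"), times = 2),
  Dollar_Value = c(0, 0, 100, 200, 75, 0, 0, 0),
  Sales_Units = c(0, 0, 20, 40, 5, 0, 0, 0),
  stringsAsFactors = FALSE
)
df_2009 <- data.frame(
  Year = "2009",
  Country = rep(c("US", "Canada"), each = 5),
  Molecule = rep(c("A", "B", "C", "D", "E"), times = 2),
  Dollar_Value = c(500, 0, 100, 200, 0, 75, 0, 0, 99, 0),
  Sales_Units = c(60, 0, 20, 40, 0, 5, 0, 0, 27, 0),
  stringsAsFactors = FALSE
)

# 2008 figures only, renamed so they do not clash with the 2009 columns
prev <- df_2008 %>%
  select(Country, Molecule,
         Dollar_Value_2008 = Dollar_Value,
         Sales_Units_2008 = Sales_Units)

df_2009_flagged <- df_2009 %>%
  # left_join keeps every 2009 row, in its original order
  left_join(prev, by = c("Country", "Molecule")) %>%
  mutate(
    # ASSUMPTION: no 2008 row means zero sales in 2008
    Dollar_Value_2008 = coalesce(Dollar_Value_2008, 0),
    Sales_Units_2008 = coalesce(Sales_Units_2008, 0),
    New_Molecule = case_when(
      Dollar_Value_2008 == 0 & Sales_Units_2008 == 0 &
        Dollar_Value > 0 & Sales_Units > 0 ~ 1,
      TRUE ~ 0
    )
  ) %>%
  # Drop the helper columns to get back the 2009 layout plus the flag
  select(-Dollar_Value_2008, -Sales_Units_2008)

df_2009_flagged
```

If a molecule that is absent from 2008 should not count as new, drop the two coalesce lines: case_when treats an NA condition as not matched, so those rows fall through to TRUE ~ 0.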