How to Create a New Variable Based on a List on Vector
In R, I have:
a) a list giving the region (Northeast, South, North Central, West) that each state belongs to
regions <- list(
west = c("WA", "OR", "CA", "NV", "AZ", "ID", "MT", "WY",
"CO", "NM", "UT"),
south = c("TX", "OK", "AR", "LA", "MS", "AL", "TN", "KY",
"GA", "FL", "SC", "NC", "VA", "WV"),
midwest = c("KS", "NE", "SD", "ND", "MN", "MO", "IA", "IL",
"IN", "MI", "WI", "OH"),
northeast = c("ME", "NH", "NY", "MA", "RI", "VT", "PA",
"NJ", "CT", "DE", "MD", "DC")
)
b) a data frame containing states and deaths
#A tibble:
state Deaths
<chr> <int>
1 AL 29549
2 AK 741
3 AR 50127
4 NJ 15142
5 CA 175213
6 IA 1647
...
I want to create a new variable that matches each state to its region, and then summarise the deaths. What is the best way to do this?
We can stack the list into a two-column data.frame and then join it to the data
library(dplyr)
stack(regions) %>%
left_join(df1, ., by = c("state" = "values")) %>%
rename(region = 'ind')
-output
state Deaths region
1 AL 29549 south
2 AK 741 <NA>
3 AR 50127 south
4 NJ 15142 northeast
5 CA 175213 west
6 IA 1647 midwest
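To make the join above easier to follow, here is a minimal sketch of what stack() returns for a named list. The list regions_small is a hypothetical two-region subset used only for illustration; it is not part of the original question.

```r
# Hypothetical two-region list, for illustration only
regions_small <- list(
  west  = c("WA", "OR"),
  south = c("TX", "OK", "AR")
)

# stack() turns a named list of vectors into a two-column data.frame:
#   values - the elements of each vector (here, state codes)
#   ind    - a factor with the name of the list element each value came from
lookup <- stack(regions_small)
print(lookup)
#   values   ind
# 1     WA  west
# 2     OR  west
# 3     TX south
# 4     OK south
# 5     AR south
```

This is why the join uses by = c("state" = "values") and then renames ind to region. States that are missing from the list, such as AK in the sample data, get NA after the left join.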
If df1 has duplicate rows, we can group by state and region and total the deaths with summarise
stack(regions) %>%
left_join(df1, ., by = c("state" = "values")) %>%
group_by(state, region = ind) %>%
summarise(Deaths = sum(Deaths, na.rm = TRUE), .groups = 'drop')
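If "summarise the deaths" in the question means one total per region (an assumption, since the question does not say so explicitly), a base R sketch without dplyr could look like the following. The names regions_demo, df_demo and state_to_region are hypothetical, and regions_demo is a shortened version of the list that covers only the sample rows.

```r
# Shortened, hypothetical region list covering only the sample states
regions_demo <- list(
  west      = c("CA"),
  south     = c("AL", "AR"),
  midwest   = c("IA"),
  northeast = c("NJ")
)

# Sample data copied from the question
df_demo <- data.frame(
  state  = c("AL", "AK", "AR", "NJ", "CA", "IA"),
  Deaths = c(29549L, 741L, 50127L, 15142L, 175213L, 1647L)
)

# Named lookup vector: names are state codes, values are region names
state_to_region <- setNames(
  rep(names(regions_demo), lengths(regions_demo)),
  unlist(regions_demo, use.names = FALSE)
)

# States that are not in the list (AK here) become NA
df_demo$region <- unname(state_to_region[df_demo$state])

# One total per region; the formula interface drops rows with an NA region
totals <- aggregate(Deaths ~ region, data = df_demo, FUN = sum)
print(totals)
```

The formula interface of aggregate() silently drops the AK row because its region is NA. If unmatched states should be kept in the result, they need an explicit label first, for example "other".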
df1 <- structure(list(state = c("AL", "AK", "AR", "NJ", "CA", "IA"),
Deaths = c(29549L, 741L, 50127L, 15142L, 175213L, 1647L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))