R: How to create dummy variable for multiple values?

I have a dataset with multiple countries and I want to create a dummy variable for continents.

My dataset looks like this at the moment:

+---------------+-----------+-----+-----+-----+
|    Country    |  Period   |  X  |  Y  |  Z  |
+---------------+-----------+-----+-----+-----+
| Argentina     | 1991-1995 | ... | ... | ... |
| Argentina     | 1996-2000 | ... | ... | ... |
| Bolivia       | 1991-1995 | ... | ... | ... |
| Bolivia       | 1996-2000 | ... | ... | ... |
| Brazil        | 1991-1995 | ... | ... | ... |
| Brazil        | 1996-2000 | ... | ... | ... |
| Canada        | 1991-1995 | ... | ... | ... |
| Canada        | 1996-2000 | ... | ... | ... |
| United States | 1991-1995 | ... | ... | ... |
| United States | 1996-2000 | ... | ... | ... |
+---------------+-----------+-----+-----+-----+

My desired output is the following:

+---------------+-----------+-----+-----+-----+---------+---------+
|    Country    |  Period   |  X  |  Y  |  Z  | dummySA | dummyNA |
+---------------+-----------+-----+-----+-----+---------+---------+
| Argentina     | 1991-1995 | ... | ... | ... |       1 |       0 |
| Argentina     | 1996-2000 | ... | ... | ... |       1 |       0 |
| Bolivia       | 1991-1995 | ... | ... | ... |       1 |       0 |
| Bolivia       | 1996-2000 | ... | ... | ... |       1 |       0 |
| Brazil        | 1991-1995 | ... | ... | ... |       1 |       0 |
| Brazil        | 1996-2000 | ... | ... | ... |       1 |       0 |
| Canada        | 1991-1995 | ... | ... | ... |       0 |       1 |
| Canada        | 1996-2000 | ... | ... | ... |       0 |       1 |
| United States | 1991-1995 | ... | ... | ... |       0 |       1 |
| United States | 1996-2000 | ... | ... | ... |       0 |       1 |
+---------------+-----------+-----+-----+-----+---------+---------+

So, I want to have a dummy for all countries in South America and a dummy for all countries in North America. I know how to create a dummy for a single country or year but not for multiple values.

If there are only a handful of countries, create the dummy columns with %in%:

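library(dplyr)
df1 %>%
    # 1 if Country is one of the listed South American countries, else 0
    mutate(dummySA = as.integer(Country %in% 
        c("Argentina", "Bolivia", "Brazil")), 
        # the data contain only two continents, so NA is the complement of SA
        dummyNA = as.integer(!dummySA))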
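The same %in% test also works in base R without dplyr. This is a minimal sketch that assumes a hypothetical toy df1 containing only the Country and Period columns from the question (X, Y and Z are left out because they play no part in building the dummies):

```r
# Hypothetical toy version of df1 (X, Y, Z omitted)
df1 <- data.frame(
  Country = rep(c("Argentina", "Bolivia", "Brazil", "Canada", "United States"), each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5),
  stringsAsFactors = FALSE
)

# Countries treated as South America in this example
south_america <- c("Argentina", "Bolivia", "Brazil")

# %in% returns TRUE/FALSE per row; as.integer() converts that to 1/0
df1$dummySA <- as.integer(df1$Country %in% south_america)
# Only two continents appear in the data, so North America is the complement
df1$dummyNA <- 1L - df1$dummySA

df1
```

Like the dplyr version, this relies on there being exactly two continents. With more regions, each dummy needs its own %in% vector.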

Otherwise, create a key/value dataset that maps each 'Country' to its geographic area, join it to the data, and create the dummy columns with spread:

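library(dplyr)
library(tidyr)
# keyvaldat: one row per country, with columns Country and value (the area, e.g. "SA" or "NA")
df1 %>% 
   left_join(keyvaldat, by = "Country") %>%
   mutate(n = 1) %>%
   spread(value, n, fill = 0)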
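In current tidyr, spread() is superseded by pivot_wider(). Below is a minimal, self-contained sketch of the join-then-widen approach. The keyvaldat lookup table and the toy df1 are hypothetical stand-ins for the real data. It also assumes tidyr 1.1.0 or later, where values_fill accepts a single value. names_prefix = "dummy" produces the dummySA and dummyNA column names from the question.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Hypothetical toy version of df1 (X, Y, Z omitted)
df1 <- data.frame(
  Country = rep(c("Argentina", "Bolivia", "Brazil", "Canada", "United States"), each = 2),
  Period  = rep(c("1991-1995", "1996-2000"), times = 5),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per country with its area code
# ("NA" here is the string for North America, not a missing value)
keyvaldat <- data.frame(
  Country = c("Argentina", "Bolivia", "Brazil", "Canada", "United States"),
  value   = c("SA", "SA", "SA", "NA", "NA"),
  stringsAsFactors = FALSE
)

df_dummies <- df1 %>%
  left_join(keyvaldat, by = "Country") %>%   # attach each country's area
  mutate(n = 1L) %>%                         # indicator value to spread out
  pivot_wider(names_from = value, values_from = n,
              names_prefix = "dummy",        # gives dummySA / dummyNA
              values_fill = 0L)              # 0 where the country is not in that area

df_dummies
```

The advantage of the lookup table is that adding countries or continents only means adding rows to keyvaldat. The pipeline itself does not change, and one dummy column is created per distinct value.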
