Here is the dput() output of the data frame I currently have:
structure(list(id = c(1, 1, 2, 4, 4), country = c("USA", "Japan", "Germany", "Germany", "USA"), USA = c(0, 0, 0, 0, 0), Germany = c(0, 0, 0, 0, 0), Japan = c(0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, -5L))
I want to edit this data frame to get the result below, and then apply the same approach to a dataset with 100k+ observations. Specifically, I want to use the information in df$country, which gives a country assigned to a particular ID (e.g., id == 1 and country == "Japan"), to set the column with the matching name (e.g., the column named "Japan") to 1 for every row with that ID. Note that IDs are not unique!
This is what I'd like to end up with:
structure(list(id = c(1, 1, 2, 4, 4), country = c("USA", "Japan", "Germany", "Germany", "USA"), USA = c(1, 1, 0, 1, 1), Germany = c(0, 0, 1, 1, 1), Japan = c(1, 1, 0, 0, 0)), class = "data.frame", row.names = c(NA, -5L))
The following code gives a close result:
df[levels(factor(df$country))] = model.matrix(~country - 1, df)
But it ends up giving me the following structure, which is not what I want:
structure(list(id = c(1, 1, 2, 4, 4), country = c("USA", "Japan",
"Germany", "Germany", "USA"), USA = c(1, 0, 0, 0, 1), Germany = c(0,
0, 1, 1, 0), Japan = c(0, 1, 0, 0, 0)), row.names = c(NA, -5L
), class = "data.frame")
How can I edit the above command in order to yield my desired result? I cannot use pivot because, in actuality, I'm working with many datasets that have different values in the "country" column that, once pivoted, will yield datasets with non-uniform columns/structures, which will impede data analysis later on.
Thank you for any help!
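Perhaps this helps:
library(dplyr)

df %>%
  # per row: 1 if the row's country matches the column name, else 0
  mutate(across(USA:Japan, ~ +(country == cur_column()))) %>%
  group_by(id) %>%
  # per id: carry any 1 over to all rows sharing that id
  mutate(across(USA:Japan, max)) %>%
  ungroup()
Output: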
# A tibble: 5 × 5
     id country   USA Germany Japan
  <dbl> <chr>   <int>   <int> <int>
1     1 USA         1       0     1
2     1 Japan       1       0     1
3     2 Germany     0       1     0
4     4 Germany     1       1     0
5     4 USA         1       1     0
Or modify the model.matrix output as follows:
m1 <- model.matrix(~country - 1, df)
m1[] <- ave(c(m1), df$id[row(m1)], col(m1), FUN = max)
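The two lines above only update m1; they do not touch df. A minimal sketch of the full round trip, assuming the example data from the question and reusing the asker's own df[levels(factor(df$country))] assignment to write the matrix back (the columns of m1 come out in sorted level order, so that indexing lines up by name):

```r
# Example data from the question
df <- data.frame(
  id      = c(1, 1, 2, 4, 4),
  country = c("USA", "Japan", "Germany", "Germany", "USA"),
  USA = 0, Germany = 0, Japan = 0
)

# Row-level indicators; columns are countryGermany, countryJapan, countryUSA
m1 <- model.matrix(~ country - 1, df)

# Within each id, replace every column by its group-wise maximum
m1[] <- ave(c(m1), df$id[row(m1)], col(m1), FUN = max)

# Write back into the columns named after the sorted country levels
df[levels(factor(df$country))] <- m1
```

After this, df should match the desired structure shown in the question.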
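You can use base R:
# loop over the distinct ids
for (j in unique(df$id)) {
  # rows belonging to this id
  y <- which(j == df$id)
  # set the columns named after this id's countries to 1 in all of its rows
  df[y, match(df$country[y], colnames(df))] <- 1
}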
  id country USA Germany Japan
1  1     USA   1       0     1
2  1   Japan   1       0     1
3  2 Germany   0       1     0
4  4 Germany   1       1     0
5  4     USA   1       1     0
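For 100k+ rows, a loop-free variant may be worth trying. The following is only a sketch, not a benchmarked solution: it builds an id-by-country presence table once and looks each row up in it. The vector `countries` is an assumption (a fixed list of country columns you would choose yourself); fixing the factor levels this way keeps the same columns across datasets even when a given dataset lacks some countries. Countries not listed in `countries` are silently dropped by table().

```r
# Example data from the question
df <- data.frame(
  id      = c(1, 1, 2, 4, 4),
  country = c("USA", "Japan", "Germany", "Germany", "USA"),
  USA = 0, Germany = 0, Japan = 0
)

# Assumed: the full, fixed set of country columns shared by all datasets
countries <- c("USA", "Germany", "Japan")

# id x country presence matrix (TRUE if the id has that country anywhere)
tab <- unclass(table(df$id, factor(df$country, levels = countries))) > 0

# Look up each row's id in the matrix and turn TRUE/FALSE into 1/0
ind <- +tab[as.character(df$id), , drop = FALSE]
rownames(ind) <- NULL

# Overwrite the indicator columns in one step
df[countries] <- ind
```

Note that the table has one cell per id and country combination, so with very many distinct ids and countries the grouped dplyr approach may use less memory.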
Are you looking for a solution like this, in combination with your closed question "CRAN R - Assign the value '1' to many dummy variables at once"?
The solution provided by @akrun answers the question here, but you may be looking for something like this:
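library(dplyr)
library(tidyr)  # fill() comes from tidyr

df %>%
  group_by(id) %>%
  # 1 where the row's country matches the column name, NA otherwise
  mutate(across(-country, ~case_when(country == cur_column() ~ 1))) %>%
  # spread each 1 to the other rows of the same id
  fill(-country, .direction = "updown") %>%
  # remaining NAs mean the id never had that country
  mutate(across(-country, ~ifelse(is.na(.), 0, .))) %>%
  ungroup()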
     id country   USA Germany Japan
  <dbl> <chr>   <dbl>   <dbl> <dbl>
1     1 USA         1       0     1
2     1 Japan       1       0     1
3     2 Germany     0       1     0
4     4 Germany     1       1     0
5     4 USA         1       1     0