
Collapsing dummy columns in R

I have a tibble with multiple rows per person. The rows for a given person are identical EXCEPT in the final several columns (below, "won" and "lost"), which contain 1/0 dummy variables. The values of the dummies vary across the rows.

Example dataframe:

df <- data.frame(
  name = c("Anne", "Anne", "Anne", "Joe", "Joe", "Joe", "Kyle", "Kyle", "Kyle", "Tom", "Tom", "Tom"),
  age  = c("13", "13", "13", "15", "15", "15", "12", "12", "12", "14", "14", "14"),
  won  = c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  lost = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0)
)

I would like to collapse the rows such that there is only one row for each person. In my collapsed dataframe, I would like the values of "won" and "lost" (the dummy columns) to be "1" for a person if that person had ANY "1"s in that column in the original dataset. Otherwise, I would like the value to be "0."

Collapsed dataframe:

df_collapsed <- data.frame(
  name = c("Anne", "Joe", "Kyle", "Tom"),
  age  = c("13", "15", "12", "14"),
  won  = c(1, 1, 1, 0),
  lost = c(1, 1, 1, 1)
)

Please let me know if you have any ideas. I can't do this manually (as in the example) because my actual dataset is much larger. I have been thinking through this problem for some time but am unable to figure out how to collapse the dataframe accordingly.

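We can group by the identifying columns and take the max of each remaining column. Because the dummies are 0/1, the maximum within a group is 1 exactly when the group contains at least one 1: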

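library(dplyr)
df %>%
   # one group per person; age is constant within a person
   group_by(name, age) %>%
   # max of each remaining (dummy) column, then drop the grouping
   summarise(across(everything(), max), .groups = 'drop')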
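
If the real data has other non-grouping columns, or if you want the "any 1" rule spelled out, the same idea can be written with any() restricted to the dummy columns. This is only a sketch under assumptions: it uses the example data from the question, the name df_any is made up for illustration, and the dummy columns are assumed to be exactly won and lost.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  name = rep(c("Anne", "Joe", "Kyle", "Tom"), each = 3),
  age  = rep(c("13", "15", "12", "14"), each = 3),
  won  = c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  lost = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0)
)

# df_any is a hypothetical name; only the listed dummy columns are summarised
df_any <- df %>%
  group_by(name, age) %>%
  # 1 if the person has any 1 in the column, otherwise 0
  summarise(across(c(won, lost), ~ as.integer(any(.x == 1))), .groups = "drop")

print(df_any)
```

If the dummies can contain NA, any(.x == 1, na.rm = TRUE) may be closer to what you want. Without it, a group containing only 0s and NAs yields NA.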

Or in base R:

# "." stands for every column not named on the right-hand side (here: won, lost)
aggregate(. ~ name + age, df, max)
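
As a quick sanity check of the base R version, the sketch below rebuilds the example data, collapses it, and asserts one row per person. The name collapsed and the re-sorting step are additions for illustration. The rows that aggregate returns may not come back in the input order, so they are sorted by name before inspection.

```r
# Example data from the question
df <- data.frame(
  name = rep(c("Anne", "Joe", "Kyle", "Tom"), each = 3),
  age  = rep(c("13", "15", "12", "14"), each = 3),
  won  = c(1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0),
  lost = c(0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0)
)

# Collapse to one row per (name, age), taking the max of each dummy
collapsed <- aggregate(cbind(won, lost) ~ name + age, data = df, FUN = max)

# Sort by name so the rows are easy to compare with the input
collapsed <- collapsed[order(collapsed$name), ]
rownames(collapsed) <- NULL

# Exactly one row per person, and every dummy is still 0 or 1
stopifnot(nrow(collapsed) == length(unique(df$name)))
stopifnot(all(unlist(collapsed[c("won", "lost")]) %in% c(0, 1)))

print(collapsed)
```

Naming the dummy columns with cbind(won, lost) instead of "." keeps any other columns in a larger dataset out of the aggregation.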
