
R tidyverse manipulating dataframe

data:

name_id     name_desc   is_mand   count
howard101   howards id        1   123
howard101   howards id        0     4
rando12     random pers       1   500
peter54     peters name       1    10
peter54     peters name       0    14
danny66     dannys acc        0    20

I have the data shown above. A name_id can be mandatory (is_mand = 1) or not (is_mand = 0). If a name_id has both a mandatory and a non-mandatory row, I want to sum the counts and label the result as mandatory (is_mand = 1). How can I do this?

intended output:

name_id     name_desc   is_mand   count
howard101   howards id        1   127
rando12     random pers       1   500
peter54     peters name       1    24
danny66     dannys acc        0    20

I'm thinking I can group by name_id and, when a name_id has more than one row, label it as mandatory and sum the count?

Are you trying to summarise count by mandatory and non-mandatory values for each name_id?

If so, you would use the summarise() function:

df_summary <- df %>% group_by(name_id, name_desc, is_mand) %>% summarise(count = sum(count, na.rm = TRUE))

Or if you just want to filter by is_mand you could use:

df_filtered <- df[df$is_mand == 1,]

You could also combine the two operations with the filter function:

df_summary <- df %>% group_by(name_id, name_desc, is_mand) %>% summarise(count = sum(count, na.rm = TRUE)) %>% filter(is_mand == 1)

Is that roughly what you were asking for?

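This can be accomplished with group_by() and summarise() from dplyr:

library(dplyr)

df %>%
  group_by(name_id, name_desc) %>%
  # sum(is_mand) is 0 or 1 here because each name_id has at most one mandatory row
  summarise(is_mand = sum(is_mand),
            count = sum(count))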

  name_id   name_desc   is_mand count
  <chr>     <chr>         <dbl> <dbl>
1 danny66   dannys acc        0    20
2 howard101 howards id        1   127
3 peter54   peters name       1    24
4 rando12   random pers       1   500
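
One caveat: sum(is_mand) only yields 0 or 1 while each name_id has at most one mandatory row, as in the sample data. Below is a minimal sketch of a more defensive variant that uses max() instead. It assumes dplyr 1.0 or later (for the .groups argument), and the second mandatory howard101 row is hypothetical, added only to show the difference:

```r
library(dplyr)

# Sample data from the question, plus one hypothetical extra mandatory
# row for howard101 (not in the original data).
df <- data.frame(
  name_id   = c("howard101", "howard101", "howard101", "rando12",
                "peter54", "peter54", "danny66"),
  name_desc = c("howards id", "howards id", "howards id", "random pers",
                "peters name", "peters name", "dannys acc"),
  is_mand   = c(1, 1, 0, 1, 1, 0, 0),
  count     = c(123, 1, 4, 500, 10, 14, 20)
)

result <- df %>%
  group_by(name_id, name_desc) %>%
  summarise(
    is_mand = max(is_mand),  # 1 if any row in the group is mandatory, else 0
    count   = sum(count),    # total count across the group's rows
    .groups = "drop"         # return an ungrouped tibble
  )

print(result)
```

With sum(is_mand), howard101 would come out as 2 on this data; max(is_mand) keeps it at 1.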

Another option uses ifelse() to flag a name_id as mandatory when any of its rows has is_mand equal to 1:

df %>%
  group_by(name_id, name_desc) %>%
  summarize(is_mand = ifelse(any(is_mand == 1), 1, 0),
            count = ifelse(any(is_mand == 1), sum(count), count))

Data

df <- structure(list(name_id = c("howard101", "howard101", "rando12", 
"peter54", "peter54", "danny66"), name_desc = c("howards id", 
"howards id", "random pers", "peters name", "peters name", "dannys acc"
), is_mand = c(1, 0, 1, 1, 0, 0), count = c(123, 4, 500, 10, 
14, 20)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", 
"data.frame"))
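
A note on the ifelse() version: the test any(is_mand == 1) has length one, so ifelse() returns a single value. For a non-mandatory name_id with several rows, the count branch would therefore keep only the first row's count. The sample data has no such group, so it is unaffected. The sketch below sums unconditionally instead. It assumes dplyr 1.0 or later, and the second danny66 row is hypothetical, added only to exercise that case:

```r
library(dplyr)

# Sample data from the question, plus one hypothetical extra
# non-mandatory row for danny66 (not in the original data).
df <- data.frame(
  name_id   = c("howard101", "howard101", "rando12", "peter54",
                "peter54", "danny66", "danny66"),
  name_desc = c("howards id", "howards id", "random pers", "peters name",
                "peters name", "dannys acc", "dannys acc"),
  is_mand   = c(1, 0, 1, 1, 0, 0, 0),
  count     = c(123, 4, 500, 10, 14, 20, 5)
)

result <- df %>%
  group_by(name_id, name_desc) %>%
  summarise(
    is_mand = as.integer(any(is_mand == 1)),  # TRUE/FALSE converted to 1/0
    count   = sum(count),                     # always sum the whole group
    .groups = "drop"
  )

print(result)
```

Here danny66 ends up with is_mand = 0 and count = 25, where the ifelse() branch would have kept only the first value, 20.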
