Adding new, combined values to existing dataframe in R

This is an approximation of the original dataframe. In the original, there are many more columns than are shown here.

id  init_cont  family  description  value
1   K          S       impacteach   1
1   K          S       impactover   3
1   K          S       read         2
2   I          S       impacteach   2
2   I          S       impactover   4
2   I          S       read         1
3   K          D       impacteach   3
3   K          D       impactover   5
3   K          D       read         3

I want to combine the values for impacteach and impactover to generate an average value that is just called impact. I would like the final table to look like the following:

id  init_cont  family  description  value
1   K          S       impact       2
1   K          S       read         2
2   I          S       impact       3
2   I          S       read         1
3   K          D       impact       4
3   K          D       read         3

I have not been able to figure out how to generate this table. However, I have been able to create a dataframe that looks like this:

id  description  value
1   impact       2
1   read         2
2   impact       3
2   read         1
3   impact       4
3   read         3

What is the best way for me to take these new values and add them to the original dataframe? I also need to remove the original values (like impacteach and impactover) in the original dataframe. I would prefer to modify the original dataframe as opposed to creating an entirely new dataframe because the original dataframe has many columns.

In case it is useful, this is a summary of the code I used to create the shorter dataframe with impact as a combination of impacteach and impactover:

df %>%
  mutate(newdescription = case_when(description %in% c("impacteach", "impactover") ~ "impact", TRUE ~ description)) %>%
  group_by(id, newdescription) %>%
  summarise(value = mean(as.numeric(value)))

What if you changed the description column first so that it could be included in the grouping:

df %>% 
    mutate(description = substr(description, 1, 6)) %>%
    group_by(id, init_cont, family, description) %>% 
    summarise(value = mean(value))

# A tibble: 6 x 5
# Groups:   id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact         2.
# 2     1 K         S      read           2.
# 3     2 I         S      impact         3.
# 4     2 I         S      read           1.
# 5     3 K         D      impact         4.
# 6     3 K         D      read           3.

You just need to modify your group_by statement. Try group_by(id, init_cont, family, newdescription)

Because your id seems to be mapped to init_cont and family already, adding in these values won't change your summarization result. Then you have all the columns you want with no extra work.
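A minimal sketch of that suggestion. It assumes the toy data from the question (rebuilt here with tibble::tribble), that the second label is impactover as in the example table, and dplyr 1.0.0 or later for the .groups argument. The final rename is an addition for tidiness and is not part of the original code:

```r
library(dplyr)

# Toy data copied from the example table in the question
df <- tibble::tribble(
  ~id, ~init_cont, ~family, ~description, ~value,
  1, "K", "S", "impacteach", 1,
  1, "K", "S", "impactover", 3,
  1, "K", "S", "read",       2,
  2, "I", "S", "impacteach", 2,
  2, "I", "S", "impactover", 4,
  2, "I", "S", "read",       1,
  3, "K", "D", "impacteach", 3,
  3, "K", "D", "impactover", 5,
  3, "K", "D", "read",       3
)

result <- df %>%
  # collapse the two impact labels into one
  mutate(newdescription = case_when(
    description %in% c("impacteach", "impactover") ~ "impact",
    TRUE ~ description
  )) %>%
  # init_cont and family are constant within each id, so adding them
  # to the grouping keeps them in the output without changing the means
  group_by(id, init_cont, family, newdescription) %>%
  summarise(value = mean(as.numeric(value)), .groups = "drop") %>%
  rename(description = newdescription)
```

This only works when the extra columns really are constant within each id; a column that varies within an id would split the groups.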

If you have a lot of columns, you could try something like the code below. Essentially, left_join your summarised data onto your original data, using the . placeholder so that no intermediate dataframe has to be stored. Once joined (by id and description, which we modified in place), you'll have two value columns, suffixed with .x and .y. Drop the original one (value.x), then use distinct to get rid of the duplicate 'impact' rows.

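df %>% 
  mutate(description = case_when(description %in% c("impacteach", "impactover") ~ "impact", TRUE ~ description)) %>%
  # pass the piped data (.) explicitly, both as x and as the input to the summary
  left_join(.,
            summarise(group_by(., id, description),
                      value = mean(as.numeric(value))),
            by = c('id', 'description')) %>%
  select(-value.x) %>%          # drop the original, un-averaged values
  rename(value = value.y) %>%   # keep the averages under the original name
  distinct()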
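As a possible alternative to the join when the dataframe is wide, one could group by every column except value, so that nothing has to be listed by hand. This is only a sketch: it assumes dplyr 1.0.0 or later (for across and .groups), the toy data from the question, and that every column other than value is constant within each id/description group.

```r
library(dplyr)

# Toy data copied from the example table in the question
df <- tibble::tribble(
  ~id, ~init_cont, ~family, ~description, ~value,
  1, "K", "S", "impacteach", 1,
  1, "K", "S", "impactover", 3,
  1, "K", "S", "read",       2,
  2, "I", "S", "impacteach", 2,
  2, "I", "S", "impactover", 4,
  2, "I", "S", "read",       1,
  3, "K", "D", "impacteach", 3,
  3, "K", "D", "impactover", 5,
  3, "K", "D", "read",       3
)

result <- df %>%
  # collapse the two impact labels into one, in place
  mutate(description = case_when(
    description %in% c("impacteach", "impactover") ~ "impact",
    TRUE ~ description
  )) %>%
  # group by all columns other than value, however many there are
  group_by(across(-value)) %>%
  summarise(value = mean(as.numeric(value)), .groups = "drop")
```

If any other column differs between the impacteach and impactover rows of the same id, those rows land in separate groups and are not averaged together, so it is worth checking the row count of the result.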

gsub can be used to replace any description that starts with impact with just impact, and then group_by from the dplyr package will help in summarising the value.

df %>% group_by(id, init_cont, family, 
        description = gsub("^(impact).*","\\1", description)) %>%
  summarise(value = mean(value))

# A tibble: 6 x 5
# Groups: id, init_cont, family [?]
#      id init_cont family description value
#   <int> <chr>     <chr>  <chr>       <dbl>
# 1     1 K         S      impact       2.00
# 2     1 K         S      read         2.00
# 3     2 I         S      impact       3.00
# 4     2 I         S      read         1.00
# 5     3 K         D      impact       4.00
# 6     3 K         D      read         3.00
