Create a new column based on the sum of values from another column, with dplyr

I'd like to create a new variable called POPULATION that holds the sum of the values of the variable P1, grouped by the variable CODASC. It seemed easy at first, but I'm struggling with it. Since I have to do this for many variables and several datasets, I really need a quick way of doing it. If anyone can help me, I would really appreciate it! Many thanks, Ilaria

My data frame looks like this:

  PROCOM      SEZ2011   SEZ CODASC    P1   P47   P62  P131    E1    E3  ST15   A46
   <int>        <dbl> <int>  <int> <int> <int> <int> <int> <int> <int> <int> <int>
1  48017 480000000000 60001      4   251    25     9    20    70    40    19    20
2  48017 480000000000 60002      3    15     1     0     1     4     4     0     3
3  48017 480000000000 60003      2    20     7     2     1     1     1     1     1
4  48017 480000000000 60004      3   253    21     4    10    63    40    49    22
5  48017 480000000000 60005      5     3     0     1     0     1     1     0     2
6  48017 480000000000 60006      1   161    19     7     5    27    17    26    13

And my code looks like this:

df <- df %>%
  group_by(CODASC) %>%
  mutate(POPULATION = sum(P1, na.rm = TRUE))

To apply sum() within each group across multiple variables, you could do something like this:

library(dplyr)

df %>% 
  group_by(CODASC) %>% 
  mutate(across(P1:last_col(), sum, .names = "{.col}_sum")) %>% 
  ungroup()
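
As a quick check of what this produces, here is a minimal sketch on a cut-down copy of the question's data. Only the CODASC, P1 and P47 columns from the six rows shown are kept, and the lambda with na.rm = TRUE is an addition here (carried over from the question's own code) so that missing values do not turn a whole group sum into NA.

```r
library(dplyr)

# Toy data: CODASC, P1 and P47 copied from the six rows in the question
df <- data.frame(
  CODASC = c(4, 3, 2, 3, 5, 1),
  P1     = c(251, 15, 20, 253, 3, 161),
  P47    = c(25, 1, 7, 21, 0, 19)
)

res <- df %>%
  group_by(CODASC) %>%
  # One new "<name>_sum" column per selected column, holding the group total
  mutate(across(P1:last_col(), ~ sum(.x, na.rm = TRUE), .names = "{.col}_sum")) %>%
  ungroup()

res
```

The two rows with CODASC equal to 3 both get P1_sum = 268 (15 + 253) and P47_sum = 22 (1 + 21); every other row is alone in its group, so its sums equal its own values. The number of rows is unchanged because mutate() adds columns rather than collapsing groups.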

To apply this to multiple data frames (provided they share the grouping variable and the columns to sum), put them in a list and iterate over it with the purrr library:

library(purrr)
library(dplyr)

l <- list(df, df, df)

map(l, ~ .x %>% 
      group_by(CODASC) %>% 
      mutate(across(P1:last_col(), sum, .names = "{.col}_sum")) %>% 
      ungroup())

Your code looks like it already does what you want. Are you just looking for a way to extend it to multiple columns?

It looks like your first four columns are identifiers. If you want to summarise all the remaining columns, you can do something like:

df <- df %>%
  group_by(PROCOM, SEZ2011, SEZ, CODASC) %>%
  summarise_all(sum) ## or whatever function you want here

See https://dplyr.tidyverse.org/reference/summarise_all.html for more details on summarise_all() and summarise_at().
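
Since dplyr 1.0.0 the scoped verbs such as summarise_all() are superseded by across(), so the same idea can be sketched as follows. The toy data is a cut-down copy of the question's rows (PROCOM, CODASC, P1, P47 only), and na.rm = TRUE is an assumption carried over from the question. Note that summarise() returns one row per group, whereas the mutate() approach keeps every original row.

```r
library(dplyr)

# Toy data: a subset of the columns shown in the question
df <- data.frame(
  PROCOM = 48017,
  CODASC = c(4, 3, 2, 3, 5, 1),
  P1     = c(251, 15, 20, 253, 3, 161),
  P47    = c(25, 1, 7, 21, 0, 19)
)

totals <- df %>%
  group_by(PROCOM, CODASC) %>%
  # everything() skips the grouping columns, so only P1 and P47 are summed
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

totals
```

Here the six input rows collapse to five, one per CODASC value, with the CODASC 3 row holding P1 = 268 and P47 = 22.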

If you want to write a function that you can apply to many datasets, have a look at this lesson on creating functions: https://swcarpentry.github.io/r-novice-inflammation/02-func-R/ and at the apply family of functions, such as lapply().
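
A minimal sketch of that idea: wrap the group-and-sum step in a function and run it over a list of data frames with lapply(). The helper name add_group_sums() and its arguments are hypothetical (not part of dplyr), and the two small data frames are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: adds "<name>_sum" columns holding per-group totals.
# group_col and cols are character vectors of column names.
add_group_sums <- function(data, group_col, cols) {
  data %>%
    group_by(across(all_of(group_col))) %>%
    mutate(across(all_of(cols), ~ sum(.x, na.rm = TRUE), .names = "{.col}_sum")) %>%
    ungroup()
}

# Made-up datasets sharing the same column names
df_a <- data.frame(CODASC = c(1, 1, 2), P1 = c(10, 5, 7), P47 = c(1, NA, 3))
df_b <- data.frame(CODASC = c(3, 3, 3), P1 = c(2, 2, 2),  P47 = c(0, 1, 1))
datasets <- list(a = df_a, b = df_b)

# Extra arguments after the function name are passed on to add_group_sums()
results <- lapply(datasets, add_group_sums,
                  group_col = "CODASC", cols = c("P1", "P47"))

results$a
```

Using a named list keeps the results addressable by name (results$a, results$b), and passing the column names as character vectors means the same helper can be reused when the grouping variable or the summed variables differ between runs.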
