
How do I summarise all columns except one(s) I specify?

I want to sum up all but one numerical column in this dataframe.

Group, Registered, Votes, Beans
A,     111,        12,     100
A,     111,        13,     200
A,     111,        14,     300
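Most of the answers below call this table df. Here is a minimal sketch that recreates it in R. The column types are an assumption (Group as character, the other columns as integers), since the question only shows the values:

```r
# Recreate the example table from the question as a data frame named df
# (assumed types: Group is character, the other columns are integers)
df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L),
  stringsAsFactors = FALSE
)

str(df)
```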

I want to group this by Group, summing up all the columns except Registered.

library(dplyr)

summarise_if(
  .tbl = group_by(
    .data = df,
    Group
  ),
  .predicate = is.numeric,
  .funs = sum
)

The problem is that the result sums ALL the numeric columns, including Registered. How do I sum all but Registered?
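To see the problem concretely, here is a small reproduction. The data frame name df and its column types are assumptions based on the table above. Grouping by Group and applying summarise_if(is.numeric, sum) sums Registered as well, so it comes out as 333 instead of 111:

```r
library(dplyr)

# Example data from the question (column types assumed)
df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L)
)

# Every numeric non-grouping column is summed, Registered included
res <- df %>%
  group_by(Group) %>%
  summarise_if(is.numeric, sum)

res
```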

The output I want would look like this:

Group, Registered, Votes, Beans
A,     111,        39,    600

I would use summarise_at, and build a logical vector that is FALSE for the non-numeric columns and for Registered, and TRUE otherwise, then pass the positions of its TRUE entries via which(), i.e.

df %>% 
  summarise_at(which(sapply(df, is.numeric) & names(df) != 'Registered'), sum)
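A self-contained sketch of the same idea, with the grouping by Group from the question added. It passes the column names picked out by the logical mask instead of their positions; summarise_at() accepts either, and names are arguably easier to read. The data frame is the question's table, reconstructed here with assumed column types:

```r
library(dplyr)

df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L)
)

# TRUE for numeric columns other than Registered, FALSE otherwise
keep <- sapply(df, is.numeric) & names(df) != "Registered"
print(keep)

# Sum only the selected columns within each Group
res <- df %>%
  group_by(Group) %>%
  summarise_at(names(df)[keep], sum)

res
```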

If you just want to summarise all but one column, you could do:

df %>% 
  summarise_at(vars(-Registered), sum)

but in this case you also have to make sure that every remaining column is numeric.
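A quick way to see why the numeric check matters. With the example data (reconstructed as df, column types assumed), vars(-Registered) still selects the character column Group, and sum() fails on it. Excluding Group as well avoids the error:

```r
library(dplyr)

df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L),
  stringsAsFactors = FALSE
)

# vars(-Registered) keeps Group, which is not numeric, so sum() errors
failed <- tryCatch(
  {
    df %>% summarise_at(vars(-Registered), sum)
    FALSE
  },
  error = function(e) TRUE
)

# Dropping the non-numeric column too leaves only Votes and Beans
ok <- df %>% summarise_at(vars(-Group, -Registered), sum)

failed
ok
```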

Notes:

  • Factors are stored internally as integer codes, but is.numeric() returns FALSE for them, so sapply(df, is.numeric) already excludes factor columns. Using sapply(df, function(x) is.numeric(x) & !is.factor(x)) instead gives the same result.

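  • If your data is big, note that is.numeric() only looks at a column's type, not its values, so sapply(df, is.numeric) is already cheap. Switching to sapply(df[1,], is.numeric) is unlikely to be faster, because df[1,] first builds a new one-row data frame.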
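A quick check of how is.numeric() and is.factor() classify a factor column, which bears on the first note. The data frame here is a made-up example:

```r
# A made-up data frame with a factor column next to numeric ones
d <- data.frame(
  f = factor(c("a", "b", "a")),
  n = c(1.5, 2.5, 3.5),
  i = c(1L, 2L, 3L)
)

# Factors are stored as integer codes...
typeof(d$f)  # "integer"

# ...but is.numeric() does not count them as numeric
num_only    <- sapply(d, is.numeric)
num_not_fct <- sapply(d, function(x) is.numeric(x) & !is.factor(x))

num_only
identical(num_only, num_not_fct)
```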

Edit:

Modified versions of the two methods above for dplyr >= 1.0.0, where summarise_at() is superseded by across():

df %>% 
  summarise(across(where(is.numeric) & !Registered, sum))

df %>% 
  summarise(across(-Registered, sum))
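Both across() forms can be combined with the grouping from the question. This sketch assumes dplyr >= 1.0.0 and the example data reconstructed as df. across() never selects grouping columns, so Group is left alone in both versions:

```r
library(dplyr)

df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L)
)

# Numeric columns other than Registered
res_where <- df %>%
  group_by(Group) %>%
  summarise(across(where(is.numeric) & !Registered, sum))

# Every non-grouping column except Registered
res_minus <- df %>%
  group_by(Group) %>%
  summarise(across(-Registered, sum))

res_where
res_minus
```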

We can use summarise_if:

library(dplyr)
df %>% 
   select(-Registered) %>%
   summarise_if(is.numeric, sum)
#  Votes Beans
#1    39   600
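The same pipeline also works with the grouping from the question. select() on a grouped data frame keeps the grouping column, and summarise_if() skips it. This sketch uses the example data reconstructed as df, with assumed column types:

```r
library(dplyr)

df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L)
)

res <- df %>%
  group_by(Group) %>%
  select(-Registered) %>%        # drop the column we do not want summed
  summarise_if(is.numeric, sum)  # the grouping column Group is skipped

res
```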

dt = read.table(text = "
Group Registered Votes Beans
A     111        12     100
A     111        13     200
A     111        14     300
", header = TRUE)

library(dplyr)

# specify grouping variables
v1 = "Group"
v2 = "Registered"

# group_by_() is deprecated; splice the column names in as symbols instead
dt %>%
  group_by(!!!rlang::syms(c(v1, v2))) %>%
  summarise_all(sum) %>%
  ungroup()

# # A tibble: 1 x 4
#     Group Registered Votes Beans
#     <fct>      <int> <int> <int>
#   1 A            111    39   600

Note that I have to assume that within each Group value there is a unique Registered value, so you can group by both variables instead of grouping only by Group and keeping the unique value of Registered.
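The alternative mentioned in the note, grouping only by Group and carrying Registered along, could be sketched like this. It assumes dplyr >= 1.0.0 and, like the note, that Registered is constant within each Group, so its first value stands for the whole group. The example data is reconstructed as df:

```r
library(dplyr)

df <- data.frame(
  Group      = c("A", "A", "A"),
  Registered = c(111L, 111L, 111L),
  Votes      = c(12L, 13L, 14L),
  Beans      = c(100L, 200L, 300L)
)

res <- df %>%
  group_by(Group) %>%
  summarise(
    # sum the numeric columns other than Registered
    across(where(is.numeric) & !Registered, sum),
    # keep one Registered value per group (assumed constant within the group)
    Registered = first(Registered)
  ) %>%
  relocate(Registered, .after = Group)

res
```

This reproduces the layout of the desired output in the question (Group, Registered, Votes, Beans).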

I needed something similar, so, building on @akrun's answer above, I did:

library(dplyr)

df <- as_tibble(df)

df %>%
  select(-Type) %>%
  summarise_all(sum)

Here "Type" stands for the non-numeric column; in the iris data set, for example, it would be the "Species" column (a factor). This gives the sum of all the other columns, which happen to be numeric.
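Applied to the built-in iris data set, where the non-numeric column is the factor Species, the same approach looks like this. The grouped variant, which sums per species, is an extra illustration rather than part of the original answer:

```r
library(dplyr)

# Drop the non-numeric Species column, then sum every remaining column
totals <- iris %>%
  as_tibble() %>%
  select(-Species) %>%
  summarise_all(sum)

# Variant: keep Species as a grouping column; summarise_all() skips it
by_species <- iris %>%
  group_by(Species) %>%
  summarise_all(sum)

totals
by_species
```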
