
R dataframes: how to make a new column that calculates values based on multiple other columns?

Let's say I have a dataframe with one column for colors and one column for shapes. I want to add a third column that holds, for each row, the total number of rows in the dataframe with that row's color/shape combination.

You could group by your columns and then add a column with the group size. This is easy in dplyr:

library(dplyr)
dat <- data.frame(col=c("red", "red", "red", "blue"), shape=c("oval", "oval", "circle", "circle"))
dat %>% group_by(col, shape) %>% mutate(ct=n()) %>% ungroup()
# # A tibble: 4 x 3
#   col   shape     ct
#   <fct> <fct>  <int>
# 1 red   oval       2
# 2 red   oval       2
# 3 red   circle     1
# 4 blue  circle     1
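As a possible shortcut, dplyr also provides add_count(), which wraps the group_by/mutate/ungroup steps above into one call. A minimal sketch, assuming a reasonably recent dplyr; the column name ct is just carried over from the example above:

```r
library(dplyr)

# Same example data as above
dat <- data.frame(col = c("red", "red", "red", "blue"),
                  shape = c("oval", "oval", "circle", "circle"))

# add_count() keeps every row and appends the size of its col/shape group;
# name = "ct" sets the new column's name (the default would be "n")
res <- dat %>% add_count(col, shape, name = "ct")
res
```

The row order of dat is preserved, so ct lines up with the original rows.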

If instead you want to collapse all the duplicate rows into a single row per combination with the corresponding count, then dat %>% count(col, shape), as suggested by @RonakShah in the comments, is the way to go.
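For illustration, here is a small sketch of that collapsed version on the same data. The name = "ct" argument is an optional addition here (not part of the original comment) so the count column matches the earlier example:

```r
library(dplyr)

# Same example data as above
dat <- data.frame(col = c("red", "red", "red", "blue"),
                  shape = c("oval", "oval", "circle", "circle"))

# count() returns one row per col/shape combination that occurs in the data,
# with the number of matching rows in the new column
collapsed <- dat %>% count(col, shape, name = "ct")
collapsed
```

The four input rows reduce to three, because red/oval appears twice.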

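You can use table to count the combinations and as.data.frame to convert the result to a data.frame. Note that this gives one row per combination, including combinations that never occur (Freq 0), rather than one row per original row.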

as.data.frame(table(x))
#  color shape Freq
#1     1     1    1
#2     2     1    0
#3     1     2    1
#4     2     2    2

Data:

(x <- data.frame(color=c(1,1,2,2), shape=c(1,2,2,2)))
#  color shape
#1     1     1
#2     1     2
#3     2     2
#4     2     2
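If the count is needed as a new column on every row, as the question asks, one base R option is ave(). This is a sketch, not part of the original answer, and the column name ct is arbitrary:

```r
# Same data as above
x <- data.frame(color = c(1, 1, 2, 2), shape = c(1, 2, 2, 2))

# ave() applies FUN within each color/shape group and returns a vector
# aligned with the original rows; length() gives the group size
x$ct <- ave(seq_len(nrow(x)), x$color, x$shape, FUN = length)
x
```

Here the two rows with color 2 and shape 2 get ct = 2, and the other rows get ct = 1.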

