
Add certain rows in a column that satisfy condition in R?

I am using R and want to sum certain values within a column, but only over the rows that satisfy a condition. Suppose I have the data frame data below:

 Team MP Win
 ATL  14 .4
 ATL  25 .4
 ATL  14 .4
 BOS  14 .55
 BOS  20 .55
 BOS   9 .55

How do I store the sum of MP for ATL (14 + 25 + 14 = 53) and for BOS (14 + 20 + 9 = 43)?

EDIT: What if I also want to add a new variable equal to Win * MP / sums, where sums is the sum of MP for each team? For the ATL rows I want the values .4*14/53, .4*25/53 and .4*14/53, and for the BOS rows I want .55*14/43, .55*20/43 and .55*9/43.

I think that would produce what you're looking for:

Edit

In light of akrun's excellent answer, here's a more compact solution:

dat$cumsums <- ave(dat$MP, dat$Team, FUN=sum)
dat$newvar <- with(dat, Win * (MP/cumsums))
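If only the per-team totals themselves are needed (the first part of the question), a minimal sketch in base R might look like the following. The name team_totals is just an illustrative choice, and the sample data is copied from the question.

```r
# Sample data copied from the question
dat <- read.csv(text = "Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")

# Split MP by team and sum each piece; the result is a named vector
# with one total per team (team_totals is a hypothetical name)
team_totals <- sapply(split(dat$MP, dat$Team), sum)

team_totals[["ATL"]]  # 14 + 25 + 14 = 53
team_totals[["BOS"]]  # 14 + 20 + 9  = 43
```

Unlike ave, which repeats the total on every row, this keeps a single value per team that can be looked up by name.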

Previous solution

cumsums <- by(data = dat$MP, INDICES = dat$Team, FUN = sum)
cumsums.df <- data.frame(Team = names(cumsums), cumsums = as.numeric(cumsums))
dat <- merge(x=dat, y=cumsums.df, by = "Team")
dat$newvar <- with(dat, Win * (MP/cumsums))

Results

dat
  Team MP  Win cumsums    newvar
1  ATL 14 0.40      53 0.1056604
2  ATL 25 0.40      53 0.1886792
3  ATL 14 0.40      53 0.1056604
4  BOS 14 0.55      43 0.1790698
5  BOS 20 0.55      43 0.2558140
6  BOS  9 0.55      43 0.1151163
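One way to sanity-check these results: within a team, MP/cumsums sums to 1, so newvar should sum to that team's Win value. The sketch below rebuilds the columns with the ave approach and compares the sums; check is an illustrative name, not part of the original answer.

```r
# Sample data copied from the question
dat <- read.csv(text = "Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")

# Per-team total of MP repeated on every row, then the weighted value
dat$cumsums <- ave(dat$MP, dat$Team, FUN = sum)
dat$newvar  <- with(dat, Win * (MP / cumsums))

# Sum newvar within each team (check is a hypothetical name)
check <- sapply(split(dat$newvar, dat$Team), sum)

# all.equal allows for floating-point rounding
all.equal(check[["ATL"]], 0.40)
all.equal(check[["BOS"]], 0.55)
```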

Data

dat <- read.csv(text="Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")

We could do this using base R, dplyr, or data.table.

1. base R

Use within and ave to create the columns

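  # The two assignments must be wrapped in braces to form a single expression.
  # The new columns come back in reverse order of creation, hence the reordering.
  within(dat, {cumsums <- ave(MP, Team, FUN=sum)
               newvar <- Win*(MP/cumsums)})[c(1:3, 5:4)]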
  #  Team MP  Win cumsums    newvar
  #1  ATL 14 0.40      53 0.1056604
  #2  ATL 25 0.40      53 0.1886792
  #3  ATL 14 0.40      53 0.1056604
  #4  BOS 14 0.55      43 0.1790698
  #5  BOS 20 0.55      43 0.2558140
  #6  BOS  9 0.55      43 0.1151163

2. data.table

If we need both variables, 'cumsums' and 'newvar', convert the 'data.frame' to a 'data.table' with setDT(dat), compute the sum of the 'MP' column grouped by 'Team', and use that sum to create the second column.

library(data.table)
setDT(dat)[, c('cumsums', 'newvar') := {
    tmp <- sum(MP)              # total MP for the current team
    list(tmp, Win * MP / tmp)   # tmp is recycled across the team's rows
}, by = Team][]
#    Team MP  Win cumsums    newvar
#1:  ATL 14 0.40      53 0.1056604
#2:  ATL 25 0.40      53 0.1886792
#3:  ATL 14 0.40      53 0.1056604
#4:  BOS 14 0.55      43 0.1790698
#5:  BOS 20 0.55      43 0.2558140
#6:  BOS  9 0.55      43 0.1151163

3. dplyr

After grouping by 'Team', use mutate to create the columns 'cumsums' and 'newvar'

library(dplyr)
 dat %>% 
     group_by(Team) %>% 
     mutate(cumsums= sum(MP), newvar= Win*MP/cumsums)
 #  Team MP  Win cumsums    newvar
 #1  ATL 14 0.40      53 0.1056604
 #2  ATL 25 0.40      53 0.1886792
 #3  ATL 14 0.40      53 0.1056604
 #4  BOS 14 0.55      43 0.1790698
 #5  BOS 20 0.55      43 0.2558140
 #6  BOS  9 0.55      43 0.1151163

data

dat <- structure(list(Team = c("ATL", "ATL", "ATL", "BOS", "BOS", "BOS"
 ), MP = c(14L, 25L, 14L, 14L, 20L, 9L), Win = c(0.4, 0.4, 0.4, 
 0.55, 0.55, 0.55)), .Names = c("Team", "MP", "Win"),
 class = "data.frame", row.names = c(NA, -6L))

aggregate will do exactly what you are looking for

> data <- merge(data, aggregate(MP ~ Team, data = data, sum), by = 'Team', all.x = TRUE)
> names(data) <- c('Team', 'MP', 'Win', 'SumByTeam')
> data$Value <- data$MP / data$SumByTeam * data$Win
> aggregate(Value ~ Team + MP, data = data, mean)
  Team     MP         Value
1  BOS      9     0.1151163
2  ATL     14     0.1056604
3  BOS     14     0.1790698
4  BOS     20     0.2558140
5  ATL     25     0.1886792
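To make the intermediate steps of the aggregate approach easier to follow, here is a hedged sketch that separates the per-team totals from the merge. The names totals and merged are illustrative only, and the sample data is copied from the question.

```r
# Sample data copied from the question
data <- read.csv(text = "Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")

# One row per team holding the sum of MP (totals is a hypothetical name)
totals <- aggregate(MP ~ Team, data = data, FUN = sum)
totals$MP  # 53 for ATL, 43 for BOS

# Both inputs have an MP column, so merge() disambiguates them with
# the suffixes .x and .y; this is why the columns get renamed afterwards
merged <- merge(data, totals, by = "Team", all.x = TRUE)
names(merged)  # "Team" "MP.x" "Win" "MP.y"
```

Note that the final aggregate call in the answer averages the two identical ATL rows with MP = 14 into one, which is why its output has five rows rather than six.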

