I am using R and want to add certain values within a column but only if the rows satisfy a condition. So if I have a data frame data
below:
Team MP Win
ATL 14 .4
ATL 25 .4
ATL 14 .4
BOS 14 .55
BOS 20 .55
BOS 9 .55
How do I store the values of MP
for ATL
(14+25+14 = 53)and BOS
(14+20+9=43)?
EDIT: What if I also want to add a new variable that multiplies Win
by MP
/ sums
(where sums
is the sum of MP
for each team
). So for the ATL
variables, I want the values .4*14/53 and .4*25/53, and for BOS
I want .55*14/43, .55*20/43, .55*9/43
I think that would produce what you're looking for:
Edit
In light of akrun 's excellent answer, here's a more compact solution:
dat$cumsums <- ave(dat$MP, dat$Team, FUN=sum)
dat$newvar <- with(dat, Win * (MP/cumsums))
Previous solution
cumsums <- by(data = dat$MP, INDICES = dat$Team, FUN = sum)
cumsums.df <- data.frame(Team = names(cumsums), cumsums = as.numeric(cumsums))
dat <- merge(x=dat, y=cumsums.df, by = "Team")
dat$newvar <- with(dat, Win * (MP/cumsums))
Results
dat
Team MP Win cumsums newvar
1 ATL 14 0.40 53 0.1056604
2 ATL 25 0.40 53 0.1886792
3 ATL 14 0.40 53 0.1056604
4 BOS 14 0.55 43 0.1790698
5 BOS 20 0.55 43 0.2558140
6 BOS 9 0.55 43 0.1151163
Data
dat <- read.csv(text="Team,MP,Win
ATL,14,.4
ATL,25,.4
ATL,14,.4
BOS,14,.55
BOS,20,.55
BOS,9,.55")
We could do this either using base R
, dplyr
or data.table
.
1. base R
Use within
and ave
to create the columns
within(dat, cumsums <- ave(MP, Team, FUN=sum)
newvar <- Win*(MP/cumsums))[c(1:3, 5:4)]
# Team MP Win cumsums newvar
#1 ATL 14 0.40 53 0.1056604
#2 ATL 25 0.40 53 0.1886792
#3 ATL 14 0.40 53 0.1056604
#4 BOS 14 0.55 43 0.1790698
#5 BOS 20 0.55 43 0.2558140
#6 BOS 9 0.55 43 0.1151163
2. data.table
If we need both the variables 'cumsums', 'newvar', convert the 'data.frame' to 'data.table' ( setDT(dat)
), get the sum
of 'MP' column and use that to create the second column grouped by 'Team'
library(data.table)
setDT(dat)[, c('cumsums', 'newvar') := {tmp=sum(MP)
list(tmp, tmp1 = Win*MP/tmp)}, by = Team][]
# Team MP Win cumsums newvar
#1: ATL 14 0.40 53 0.1056604
#2: ATL 25 0.40 53 0.1886792
#3: ATL 14 0.40 53 0.1056604
#4: BOS 14 0.55 43 0.1790698
#5: BOS 20 0.55 43 0.2558140
#6: BOS 9 0.55 43 0.1151163
3. dplyr
After grouping by 'Team', use mutate
to create the columns 'cumsums' and 'newvar'
library(dplyr)
dat %>%
group_by(Team) %>%
mutate(cumsums= sum(MP), newvar= Win*MP/cumsums)
# Team MP Win cumsums newvar
#1 ATL 14 0.40 53 0.1056604
#2 ATL 25 0.40 53 0.1886792
#3 ATL 14 0.40 53 0.1056604
#4 BOS 14 0.55 43 0.1790698
#5 BOS 20 0.55 43 0.2558140
#6 BOS 9 0.55 43 0.1151163
dat <- structure(list(Team = c("ATL", "ATL", "ATL", "BOS", "BOS", "BOS"
), MP = c(14L, 25L, 14L, 14L, 20L, 9L), Win = c(0.4, 0.4, 0.4,
0.55, 0.55, 0.55)), .Names = c("Team", "MP", "Win"),
class = "data.frame", row.names = c(NA, -6L))
aggregate
will do exactly what you are looking for
> data <- merge(data, aggregate(MP~Team, data = data, sum), by = 'Team', all.x = T)
> names(data) <- c('Team', 'MP', 'Win', 'SumByTeam')
> data$Value <- data$MP /data$SumByTeam * data$Win
> aggregate(Value ~ Team + MP.x, data = data, mean)
Team MP Value
1 BOS 9 0.1151163
2 ATL 14 0.1056604
3 BOS 14 0.1790698
4 BOS 20 0.2558140
5 ATL 25 0.1886792
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.