In R, how to sum certain rows of a data frame with certain logic?

Hi experienced R users,

It's kind of a simple thing. I want to sum x by Group.1 depending on one controllable variable.

For example, when I set number <- 2, I'd like to sum x over the first two rows. If I set it to 3, it should sum x over the first three rows, as identified by Group.1.

Any idea how I might tackle this problem? Should I write a function? Thank y'all in advance.

  Group.1  Group.2      x
1       1     Eggs 230299
2       2     Eggs 263066
3       3     Eggs 266504
4       4     Eggs 177196

If the sums you want are always cumulative, there's a function for that: cumsum. It works like this.

> cumsum(c(1,2,3))
[1] 1 3 6

In this case you might want something like

> mysum <- cumsum(yourdata$x)
> mysum[2] # the sum of the first two rows
> mysum[3] # the sum of the first three rows
> mysum[number] # the sum of the first "number" rows
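
To make this concrete, here is a small self-contained sketch using the numbers from the question. The name yourdata follows the snippet above, and number is the control variable the question describes; this assumes the rows are already in the order you want to accumulate.

```r
# Data copied from the question
yourdata <- data.frame(
  Group.1 = 1:4,
  Group.2 = "Eggs",
  x = c(230299, 263066, 266504, 177196)
)

number <- 2                    # how many leading rows to sum

mysum <- cumsum(yourdata$x)    # running totals of x, one per row
mysum[number]                  # sum of the first `number` rows
```

Changing number only changes which element of mysum you read, so the running totals need to be computed just once.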

Assuming your data is in mydata:

with(mydata, sum(x[Group.1 <= 2]))
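
Since the question asks whether to write a function, the same idea could be wrapped up as below. This is only a sketch: sum_up_to is a hypothetical name, and it assumes the columns are called x and Group.1 as in the question.

```r
# Hypothetical helper: sum x over all rows whose Group.1 is <= number
sum_up_to <- function(data, number) {
  sum(data$x[data$Group.1 <= number])
}

# Data copied from the question
mydata <- data.frame(
  Group.1 = 1:4,
  Group.2 = "Eggs",
  x = c(230299, 263066, 266504, 177196)
)

first_two   <- sum_up_to(mydata, 2)
first_three <- sum_up_to(mydata, 3)
```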

You could use the by function.

For instance, given the following data.frame:

d <- data.frame(Group.1=c(1,1,2,1,3,3,1,3),Group.2=c('Eggs'),x=1:8)

> d
  Group.1 Group.2 x
1       1    Eggs 1
2       1    Eggs 2
3       2    Eggs 3
4       1    Eggs 4
5       3    Eggs 5
6       3    Eggs 6
7       1    Eggs 7
8       3    Eggs 8

You can do this:

num <- 3 # sum only the first 3 rows

# The aggregation function:
# it is called for each group receiving the 
# data.frame subset as input and returns the aggregated row
innerFunc <- function(subDf){
  # we create the aggregated row by taking the first row of the subset
  row <- head(subDf,1)
  # we set the x column in the result row to the sum of the first "num"
  # elements of the subset
  row$x <- sum(head(subDf$x,num))
  return(row)
}
# Here we call the "by" function:
# it returns an object of class "by" that is a list of the resulting
# aggregated rows; we want to convert it to a data.frame, so we call
# rbind repeatedly by using "do.call(rbind, ... )"
d2 <- do.call(rbind,by(data=d,INDICES=d$Group.1,FUN=innerFunc))

> d2
  Group.1 Group.2  x
1       1    Eggs  7
2       2    Eggs  3
3       3    Eggs 19
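
If you only need the sums themselves and not the other columns, a shorter variant might use tapply instead of by. This is an alternative sketch, not the approach shown above; it returns a named vector rather than a data.frame.

```r
# Same example data as above
d <- data.frame(Group.1 = c(1, 1, 2, 1, 3, 3, 1, 3),
                Group.2 = "Eggs",
                x = 1:8)

num <- 3  # sum only the first 3 rows of each group

# For each value of Group.1, take the first `num` values of x and sum them
sums <- tapply(d$x, d$Group.1, function(v) sum(head(v, num)))
```

The names of sums are the Group.1 values, so sums["3"] gives the total for group 3.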

If you want to sum only a subset of your data:

my_data <- data.frame(c("TRUE","FALSE","FALSE","FALSE","TRUE"), c(1,2,3,4,5))
names(my_data)[1] <- "DESCRIPTION" #Change Column Name
names(my_data)[2] <- "NUMBER"      #Change Column Name

sum(subset(my_data, my_data$DESCRIPTION=="TRUE")$NUMBER)

You should get 6.
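
The same total can also be obtained with plain bracket indexing. The sketch below assumes the DESCRIPTION column is stored as logical values rather than the strings "TRUE" and "FALSE", which lets the column be used directly as the subset condition.

```r
# Same data, but with DESCRIPTION as a logical column (an assumption)
my_data <- data.frame(
  DESCRIPTION = c(TRUE, FALSE, FALSE, FALSE, TRUE),
  NUMBER      = c(1, 2, 3, 4, 5)
)

# Keep only the NUMBER values where DESCRIPTION is TRUE, then sum them
total <- sum(my_data$NUMBER[my_data$DESCRIPTION])
```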

Not sure why Eggs are important here ;)

df1 <- data.frame(Gr=seq(4),
                  x=c(230299, 263066, 266504, 177196)
                  )

Now with n = 2, i.e. the first two rows:

n <- 2
sum(df1[, "x"][df1[, "Gr"]<=n]) 

The expression df1[, "Gr"] <= n creates a logical vector, which is then used to subset the elements of df1[, "x"] before summing them.
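
To see the intermediate step, the subsetting can be split in two. The variable names keep and total are only for illustration.

```r
df1 <- data.frame(Gr = seq(4),
                  x = c(230299, 263066, 266504, 177196))
n <- 2

keep <- df1[, "Gr"] <= n         # logical vector, one entry per row
total <- sum(df1[, "x"][keep])   # sum only the x values where keep is TRUE
```

Printing keep shows which rows are selected before they are summed.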

Also, it appears your Group.1 is the same as the row number. If so, this may be simpler:

sum(df1[, "x"][1:n])

Or, to get the running sums for every n at once:

cumsum(df1[, "x"])
