简体   繁体   中英

Rolling sum from a certain position in a data frame in R

Say I have the following data, dat1;

width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B

And additionally have that, dat2;

x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B

Say I want to create a new column on dat1 of rolling sum values from dat2 where;

  1. Our width of this rolling sum is equivalent to the width given in that row

  2. Our starting position is equivalent to the from vector value in that row

  3. We wish to do it for the A or Bth factor depending on which level is in the row

So far I have that we want

rollapply(x = dat2$x, width = dat1$width, FUN = sum, align = "left", data = dat2)

So I need to incorporate in the starting position and the factor level for that starting position.

So in this instance I want to get

width  from  by   RS
2      1     A    9
3      1     A    16
2      2     A    12
3      2     A    15

etc

Any help would be greatly appreciated. Thanks

1) For each row i in dat1 the anonymous function subsets dat2 to the by value in dat1 and from that picks out the indicated rows of x and sums them:

transform(dat1, RS = sapply(1:nrow(dat1), function(i) 
 sum(subset(dat2, dat1$by[i] == by)[seq(from[i], length = width[i]), "x"])))

giving:

  width from by RS
1     2    1  A  9
2     3    1  A 16
3     2    2  A 12
4     3    2  A 15
5     2    1  B  6
6     3    1  B  9
7     2    2  B  7
8     3    2  B 12

2) An alternative would be to calculate the start values and widths for the sequences to sum in dat2 and then apply that:

st <- match(dat1$by, dat2$by) + dat1$from - 1
w <- dat1$width
Sum <- function(st, w) sum(dat2[seq(st, length = w), "x"])
transform(dat1, RS = mapply(Sum, st, w))

giving:

  width from by RS
1     2    1  A  9
2     3    1  A 16
3     2    2  A 12
4     3    2  A 15
5     2    1  B  6
6     3    1  B  9
7     2    2  B  7
8     3    2  B 12

Note

dat1 and dat2 in reproducible form are:

Lines1 <- "
width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B"
dat1 <- read.table(text = Lines1, header = TRUE)

Lines2 <- "
x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B"
dat2 <- read.table(text = Lines2, header = TRUE)

Update

Fixed (1). Added (2).

Another option could be using dplyr and join . The approach would be join two dataframes by "by". Then apply filter to consider only those rows which pos is between from and from+width . Finally take sum of x column.

dat1 %>% inner_join(dat2, by = "by") %>%
  filter(from <= pos & pos < (from + width) ) %>%
  group_by(by, from, width ) %>%
  summarise(RS = sum(x)) %>%
  select(width, from, by, RS)


# A tibble: 8 x 4
# Groups: by, from [4]
# width  from by       RS
# <int> <int> <chr> <int>
# 1     2     1 A         9
# 2     3     1 A        16
# 3     2     2 A        12
# 4     3     2 A        15
# 5     2     1 B         6
# 6     3     1 B         9
# 7     2     2 B         7
# 8     3     2 B        12

data

dat1 <- read.table(text = 
"width  from  by
2      1     A
3      1     A
2      2     A
3      2     A
2      1     B
3      1     B
2      2     B
3      2     B", header = TRUE, stringsAsFactors = FALSE)


dat2 <- read.table(text = 
"x      pos   by
4      1     A
5      2     A
7      3     A
3      4     A
2      1     B
4      2     B
3      3     B
5      4     B", header = TRUE, stringsAsFactors = FALSE)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM