Say I have the following data, dat1;
width from by
2 1 A
3 1 A
2 2 A
3 2 A
2 1 B
3 1 B
2 2 B
3 2 B
And additionally have that, dat2;
x pos by
4 1 A
5 2 A
7 3 A
3 4 A
2 1 B
4 2 B
3 3 B
5 4 B
Say I want to create a new column on dat1 of rolling sum values from dat2 where;
Our width of this rolling sum is equivalent to the width given in that row
Our starting position is equivalent to the from vector value in that row
We wish to do it for the A or Bth factor depending on which level is in the row
So far I have that we want
rollapply(x = dat2$x, width = dat1$width, FUN = sum, align = "left", data = dat2)
So I need to incorporate in the starting position and the factor level for that starting position.
So in this instance I want to get
width from by RS
2 1 A 9
3 1 A 16
2 2 A 12
3 2 A 15
etc
Any help would be greatly appreciated. Thanks
1) For each row i
in dat1
the anonymous function subsets dat2 to the by
value in dat1
and from that picks out the indicated rows of x
and sums them:
transform(dat1, RS = sapply(1:nrow(dat1), function(i)
sum(subset(dat2, dat1$by[i] == by)[seq(from[i], length = width[i]), "x"])))
giving:
width from by RS
1 2 1 A 9
2 3 1 A 16
3 2 2 A 12
4 3 2 A 15
5 2 1 B 6
6 3 1 B 9
7 2 2 B 7
8 3 2 B 12
2) An alternative would be to calculate the start values and widths for the sequences to sum in dat2
and then apply that:
st <- match(dat1$by, dat2$by) + dat1$from - 1
w <- dat1$width
Sum <- function(st, w) sum(dat2[seq(st, length = w), "x"])
transform(dat1, RS = mapply(Sum, st, w))
giving:
width from by RS
1 2 1 A 9
2 3 1 A 16
3 2 2 A 12
4 3 2 A 15
5 2 1 B 6
6 3 1 B 9
7 2 2 B 7
8 3 2 B 12
dat1
and dat2
in reproducible form are:
Lines1 <- "
width from by
2 1 A
3 1 A
2 2 A
3 2 A
2 1 B
3 1 B
2 2 B
3 2 B"
dat1 <- read.table(text = Lines1, header = TRUE)
Lines2 <- "
x pos by
4 1 A
5 2 A
7 3 A
3 4 A
2 1 B
4 2 B
3 3 B
5 4 B"
dat2 <- read.table(text = Lines2, header = TRUE)
Fixed (1). Added (2).
Another option could be using dplyr
and join
. The approach would be join
two dataframes by "by". Then apply filter
to consider only those rows which pos
is between from
and from+width
. Finally take sum of x
column.
dat1 %>% inner_join(dat2, by = "by") %>%
filter(from <= pos & pos < (from + width) ) %>%
group_by(by, from, width ) %>%
summarise(RS = sum(x)) %>%
select(width, from, by, RS)
# A tibble: 8 x 4
# Groups: by, from [4]
# width from by RS
# <int> <int> <chr> <int>
# 1 2 1 A 9
# 2 3 1 A 16
# 3 2 2 A 12
# 4 3 2 A 15
# 5 2 1 B 6
# 6 3 1 B 9
# 7 2 2 B 7
# 8 3 2 B 12
data
dat1 <- read.table(text =
"width from by
2 1 A
3 1 A
2 2 A
3 2 A
2 1 B
3 1 B
2 2 B
3 2 B", header = TRUE, stringsAsFactors = FALSE)
dat2 <- read.table(text =
"x pos by
4 1 A
5 2 A
7 3 A
3 4 A
2 1 B
4 2 B
3 3 B
5 4 B", header = TRUE, stringsAsFactors = FALSE)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.