
How can I subtract rows with matching value in one column in R?

I have a dataframe that looks like this (truncated from real data):

   host month    score        se
1   V43     0 8.000000 0.4472136
2   V43     1 6.000000 0.0000000
3   V43     3 6.000000 0.0000000
4   V51     0 6.000000 0.0000000
5   V51     1 7.333333 0.4216370
6   V51     3 7.333333 0.2108185
7   V51     6 6.000000 0.0000000

I want to subtract each host's month 0 score from the score for every month for that host. Each host's month 0 score needs to be applied separately, so that the result would look like this:

   host month     score         se
1   V43     0  0.000000 0.4472136
2   V43     1 -2.000000 0.0000000
3   V43     3 -2.000000 0.0000000
4   V51     0  0.000000 0.0000000
5   V51     1  1.333333 0.4216370
6   V51     3  1.333333 0.2108185
7   V51     6  0.000000 0.0000000

In other words, I want to have each month show the difference from the starting point rather than absolute value.

Finding the month 0 rows is easy enough but I can't figure out how I can then match each row with the right host and do the subtraction. Is there a way to do this without using a for loop?

Use plyr, and first sort the data frame by host and month.

library(plyr)
ddply(df, .(host), transform, score = score - score[1])
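The one-liner above takes score[1] as the baseline, so it relies on the month 0 row coming first within each host. As a hedged alternative, here is a minimal base R sketch that looks up the month 0 row explicitly with match(), so row order does not matter. It assumes every host has exactly one month 0 row; the data frame is rebuilt from the question's sample, and base_rows is a name introduced only for this illustration.

```r
# Sample data copied from the question
df <- data.frame(
  host  = c("V43", "V43", "V43", "V51", "V51", "V51", "V51"),
  month = c(0, 1, 3, 0, 1, 3, 6),
  score = c(8, 6, 6, 6, 7.333333, 7.333333, 6),
  se    = c(0.4472136, 0, 0, 0, 0.4216370, 0.2108185, 0)
)

# Keep only the baseline rows (assumption: exactly one month-0 row per host)
base_rows <- df[df$month == 0, ]

# match() gives, for every row of df, the position of its host in base_rows,
# so the baseline score is aligned row by row before subtracting
df$score <- df$score - base_rows$score[match(df$host, base_rows$host)]

print(df)
```

If a host had no month 0 row, match() would return NA for its rows and the resulting scores would be NA, which at least makes the gap visible.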

Here is one way to do it. It uses a for loop, but the loop runs once per host rather than once per row of the data frame.

# Example data: one row per host and month
x <- data.frame(host = c(43, 43, 43, 51, 51, 51, 51),
                month = c(0, 1, 2, 0, 2, 4, 5),
                val = c(12, 19, 32, 3, 5, 7, 9))

# One data frame per host
y <- split(x, x$host)

output <- NULL

for (h in y) {
    # Position of this host's month-0 row
    start.i <- which(h$month == 0)
    h$val <- h$val - h$val[start.i]

    output <- rbind(output, h)
}
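The same split, adjust, and recombine steps can also be written without an explicit loop. The sketch below is one possible variant, not the answer's own code: it uses lapply() over the per-host pieces and do.call(rbind, ...) to stack them once at the end, instead of growing output inside the loop. subtract_baseline is a hypothetical helper name, and the sketch again assumes one month 0 row per host.

```r
# Same example data as in the answer above
x <- data.frame(host = c(43, 43, 43, 51, 51, 51, 51),
                month = c(0, 1, 2, 0, 2, 4, 5),
                val = c(12, 19, 32, 3, 5, 7, 9))

# Hypothetical helper: subtract the month-0 value within one host's rows
subtract_baseline <- function(h) {
  h$val <- h$val - h$val[h$month == 0]
  h
}

# Split by host, adjust each piece, then stack the pieces back together
pieces <- lapply(split(x, x$host), subtract_baseline)
output <- do.call(rbind, pieces)

# rbind on a named list builds row names from the list names; reset them
rownames(output) <- NULL

print(output)
```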

