
Alternative for looping in R

df1 <- data.frame(Chr=1, Pos=c(100, 200, 300, 400), stringsAsFactors=FALSE)

df2 <- data.frame(Chr=1, PosStart=c(25, 25, 150, 175, 225, 275, 375), PosEnd=c(150, 75, 275, 300, 400, 500, 750), stringsAsFactors=FALSE)

I am trying to compare the Pos values in df1 to see if they fall between any PosStart and PosEnd of df2. This could be true for more than one row of df2. In the output I am trying to append df1$Pos as a new column, df2$CoPos, each time the condition is true. The output should be something like:

Chr PosStart PosEnd CoPos
1       25    150   100
1      150    275   200
1      175    300   200
1      225    400   300
1      275    500   300
1      375    750   400

I have done something like:

for(i in 1:length(df1$Pos)){

    for(j in 1:length(df2$PosStart){

            df2$CoPos[j]<- df1$Pos[which(df2$PosStart[j] < df1$Pos[i] < df2$PosEnd[j])]
    }

}

Can someone please tell me if there is a way to do this without looping? Also, what am I doing wrong here? After months of grappling, I still don't think I understand the concept of looping.

Thanks a bunch in advance.

You can apply the check to each row of df1:

myfun <- function(x) {
  data.frame(df2[x['Pos'] < df2$PosEnd & x['Pos'] > df2$PosStart,], Pos=x['Pos'])
}

This returns the row or rows of df2 where the condition is met, along with the Pos value.

> apply(df1, 1, myfun)
[[1]]
  Chr PosStart PosEnd Pos
1   1       25    150 100

[[2]]
  Chr PosStart PosEnd Pos
3   1      150    275 200
4   1      175    300 200

[[3]]
  Chr PosStart PosEnd Pos
5   1      225    400 300
6   1      275    500 300

[[4]]
  Chr PosStart PosEnd Pos
6   1      275    500 400
7   1      375    750 400

> 

Then you can use ldply from plyr to combine that list into a single data.frame:

> library(plyr)
> ldply(apply(df1, 1, myfun), as.data.frame)
  Chr PosStart PosEnd Pos
1   1       25    150 100
2   1      150    275 200
3   1      175    300 200
4   1      225    400 300
5   1      275    500 300
6   1      275    500 400
7   1      375    750 400
> 

Edit for comment:

This is a hard thing to do in a for loop. You don't know how many matches you'll have in advance. It could be that every row in df1 matches every row in df2, or that none do, or anything in between. Thus, you don't know how big your output needs to be. That makes this a perfect example of bad for-loop practice in R: if you are growing your output vector rather than assigning into it, "you're gonna have a bad time mm'kay."
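To make the growing-versus-assigning distinction concrete, here is a minimal sketch. The variable names (n, grown, filled) are made up for illustration and are not part of the question's data.

```r
n <- 5

# Growing: every c() call copies the whole vector built so far.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Assigning into: the size is known up front, so fill fixed slots.
filled <- numeric(n)
for (i in seq_len(n)) {
  filled[i] <- i^2
}

identical(grown, filled)  # both loops produce the same values
```

The second form is only possible when the output size is known before the loop starts, which is exactly what is missing in the matching problem above.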

With that said, to make your loop work, you would need to make the CoPos column first.

df2$CoPos <- NA

Then execute something similar to your loop:

for (i in seq_along(df1$Pos)) {
    for (j in seq_along(df2$PosStart)) {
        if (df2$PosStart[j] < df1$Pos[i] && df2$PosEnd[j] > df1$Pos[i]) {
            df2$CoPos[j] <- df1$Pos[i]
        }
    }
}

However, if you find two rows in df1 that fit your constraint, you'll only record the second one you find into the appropriate row in df2.
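A small sketch of that overwriting, using a cut-down two-row df2 and a hypothetical vector pos in which both values fall inside the first interval:

```r
df2 <- data.frame(Chr = 1, PosStart = c(25, 25), PosEnd = c(150, 75))
pos <- c(100, 125)  # hypothetical positions; both lie inside (25, 150)

df2$CoPos <- NA
for (i in seq_along(pos)) {
  for (j in seq_len(nrow(df2))) {
    if (df2$PosStart[j] < pos[i] && pos[i] < df2$PosEnd[j]) {
      df2$CoPos[j] <- pos[i]  # a later match silently replaces an earlier one
    }
  }
}

df2$CoPos  # 125 NA: the match for 100 has been overwritten
```

One column in df2 can hold at most one value per row, so this layout cannot represent several matches for the same interval.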

Instead, you could grow a new data.frame like this:

df3 <- data.frame(Chr=1, Pos=c(100, 125, 200, 300, 400), stringsAsFactors=FALSE)

out <- data.frame()

for (i in seq_along(df3$Pos)) {
    for (j in seq_along(df2$PosStart)) {
        if (df2$PosStart[j] < df3$Pos[i] && df2$PosEnd[j] > df3$Pos[i]) {
            # append the matching df2 row together with the matched position
            out <- rbind(out, cbind(df2[j,], Pos=df3$Pos[i]))
        }
    }
}

But, don't do this... please don't :) While I'm evangelizing, take a look at the R-Inferno for an excellent reference on common pitfalls in R.
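If the aim is to avoid growing a data.frame row by row, one possible sketch is to find every match first and build the result in a single step. This swaps in base R's outer() and which(arr.ind = TRUE) rather than the loop above; the names hit, idx and res are made up for illustration.

```r
df2 <- data.frame(Chr = 1,
                  PosStart = c(25, 25, 150, 175, 225, 275, 375),
                  PosEnd   = c(150, 75, 275, 300, 400, 500, 750))
df3 <- data.frame(Chr = 1, Pos = c(100, 125, 200, 300, 400))

# hit[i, j] is TRUE when df3$Pos[i] lies strictly inside interval j of df2
hit <- outer(df3$Pos, df2$PosStart, ">") & outer(df3$Pos, df2$PosEnd, "<")

# one (row, col) pair per match, ordered by position and then by interval
idx <- which(hit, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# build the whole result once instead of rbind-ing inside a loop
res <- data.frame(df2[idx[, "col"], ], Pos = df3$Pos[idx[, "row"]],
                  row.names = NULL)
res
```

The hit matrix has one cell per pair of position and interval, so this trades memory for speed and may not suit very large inputs.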

While @Justin's answer works in this case, using apply on a data.frame can lead to confusing errors if you don't remember that apply converts your data.frame to a matrix before calling FUN on each row/column.

Here is a more general solution that avoids this potential issue:

compareFun <- function(x) {
  data.frame(df2[x > df2$PosStart & x < df2$PosEnd,], Pos=x)
}
do.call(rbind, lapply(df1$Pos, compareFun))

To elaborate, if df1 and df2 were instead defined with Chr being character, Justin's solution would throw an error that doesn't make it clear what caused the problem:

df1 <- data.frame(Chr="1", Pos=c(100,200,300,400), stringsAsFactors=FALSE)
df2 <- data.frame(Chr="1", PosStart=c(25,25,150,175,225,275,375),
  PosEnd=c(150,75,275,300,400,500,750), stringsAsFactors=FALSE)
apply(df1, 1, myfun)
# Error in data.frame(df2[x["Pos"] < df2$PosEnd & x["Pos"] > df2$PosStart,  : 
#  arguments imply differing number of rows: 0, 1
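A short sketch of what is likely going on behind that message: apply coerces the data.frame with as.matrix, and one character column turns the whole matrix into character, so the comparisons inside myfun become string comparisons. The name m is made up for illustration.

```r
df1 <- data.frame(Chr = "1", Pos = c(100, 200, 300, 400),
                  stringsAsFactors = FALSE)

m <- as.matrix(df1)  # the same coercion apply() performs internally
typeof(m)            # "character": Pos is now "100", "200", ...

# 25 is converted to "25" and the two are compared as strings,
# so "100" > "25" is FALSE because "1" sorts before "2"
m[1, "Pos"] > 25
```

With the comparison failing, no rows of df2 are selected for Pos = 100, and data.frame() is then asked to combine 0 rows with a Pos value of length 1, which matches the "0, 1" in the error.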
