df1 <- data.frame(Chr=1, Pos= c(100,200,300,400),stringsAsFactors=F)
df2 <- data.frame(Chr=1, PosStart= c(25,25,150,175,225,275,375),PosEnd= c(150,75,275,300,400,500,750),stringsAsFactors=F)
I am trying to compare the Pos values in df1 to see if they fall between any PosStart and PosEnd of df2. This could be true for more than one row of df2. In the output I am trying to append df1$Pos as a new column df2$CoPos each time the condition is true. The output should be something like:
Chr PosStart PosEnd CoPos
1 25 150 100
1 150 275 200
1 175 300 200
1 225 400 300
1 275 500 300
1 375 750 400
I have done something like:
for(i in 1:length(df1$Pos)){
for(j in 1:length(df2$PosStart){
df2$CoPos[j]<- df1$Pos[which(df2$PosStart[j] < df1$Pos[i] < df2$PosEnd[j])]
}
}
Can someone please tell me if there is a way to do this without looping? Also, what am I doing wrong here? After months of grappling, I still don't think I understand the concept of looping.
Thanks a bunch in advance.
You can apply the check to each row of df1:
myfun <- function(x) {
data.frame(df2[x['Pos'] < df2$PosEnd & x['Pos'] > df2$PosStart,], Pos=x['Pos'])
}
This returns the row or rows of df2 where the condition is met, along with the Pos value:
> apply(df1, 1, myfun)
[[1]]
Chr PosStart PosEnd Pos
1 1 25 150 100
[[2]]
Chr PosStart PosEnd Pos
3 1 150 275 200
4 1 175 300 200
[[3]]
Chr PosStart PosEnd Pos
5 1 225 400 300
6 1 275 500 300
[[4]]
Chr PosStart PosEnd Pos
6 1 275 500 400
7 1 375 750 400
Then you can use ldply from plyr to combine the list into a single data.frame:
> library(plyr)
> ldply(apply(df1, 1, myfun), as.data.frame)
Chr PosStart PosEnd Pos
1 1 25 150 100
2 1 150 275 200
3 1 175 300 200
4 1 225 400 300
5 1 275 500 300
6 1 275 500 400
7 1 375 750 400
Edit for comment:
This is a hard thing to do in a for loop. You don't know how many matches you'll have in advance: it could be that every row in df1 matches every row in df2, or that none do, or anything in between. Thus, you don't know how big your output needs to be. This is the perfect example of bad for-loop practice in R: if you are growing your output vector rather than assigning into it, "you're gonna have a bad time, mm'kay."
With that said, to make your loop work, you would need to create the CoPos column first:
df2$CoPos <- NA
Then execute something similar to your loop:
for (i in 1:length(df1$Pos)) {
for (j in 1:length(df2$PosStart)) {
if (df2$PosStart[j] < df1$Pos[i] & df2$PosEnd[j] > df1$Pos[i]) {
df2$CoPos[j] <- df1$Pos[i]
}
}
}
However, if two rows in df1 fit the constraint for the same row in df2, you'll only record the second one you find in that row of df2.
Instead, you could grow a new data.frame like this:
df3 <- data.frame(Chr=1, Pos= c(100, 125, 200,300,400),stringsAsFactors=F)
out <- data.frame()
for (i in 1:length(df3$Pos)) {
for (j in 1:length(df2$PosStart)) {
if (df2$PosStart[j] < df3$Pos[i] & df2$PosEnd[j] > df3$Pos[i]) {
out <- rbind(out, cbind(df2[j, ], CoPos = df3$Pos[i]))  # name the new column explicitly
}
}
}
But, don't do this... please don't :) While I'm evangelizing, take a look at the R-Inferno for an excellent reference on common pitfalls in R.
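If a loop still reads more naturally to you, one common middle ground is to collect each batch of matches in a list that is allocated up front and bind everything once at the end. The following is only a sketch using the question's df1 and df2; the names pieces, hit and out are illustrative, and it assumes all rows share the same Chr, as in the example data.

```r
df1 <- data.frame(Chr = 1, Pos = c(100, 200, 300, 400))
df2 <- data.frame(Chr = 1,
                  PosStart = c(25, 25, 150, 175, 225, 275, 375),
                  PosEnd   = c(150, 75, 275, 300, 400, 500, 750))

# One list slot per position in df1, allocated up front.
pieces <- vector("list", nrow(df1))

for (i in seq_len(nrow(df1))) {
  pos <- df1$Pos[i]
  # A vectorised comparison against every interval replaces the inner loop.
  hit <- df2$PosStart < pos & pos < df2$PosEnd
  # When nothing matches, this is simply a zero-row data.frame.
  pieces[[i]] <- cbind(df2[hit, ], CoPos = rep(pos, sum(hit)))
}

# Bind all pieces in a single step instead of growing a data.frame row by row.
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

The list has a known length (one slot per row of df1) even though the number of matches per slot is unknown, which is what makes preallocation possible here.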
While @Justin's answer works in this case, using apply on a data.frame can lead to confusing errors if you don't remember that apply converts your data.frame to a matrix before calling FUN on each row/column.
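The coercion is easy to see directly. This small sketch uses a hypothetical df_chr, shaped like df1 but with Chr stored as character, to show what FUN actually receives:

```r
# Same shape as df1, but Chr is character (an assumption for illustration).
df_chr <- data.frame(Chr = "1", Pos = c(100, 200, 300, 400),
                     stringsAsFactors = FALSE)

# apply() converts the data.frame with as.matrix() first. A matrix holds a
# single type, so the numeric Pos column is converted to character as well.
m <- as.matrix(df_chr)
is.character(m)

# Each row therefore reaches FUN as a character vector.
row_types <- apply(df_chr, 1, function(x) class(x["Pos"]))
row_types

# Comparing a string with a number coerces the number to a string, so the
# comparison is alphabetical rather than numeric.
"100" < 25   # TRUE, because "100" sorts before "25"
```

With an all-numeric data.frame the matrix stays numeric, which is why the problem only appears once a character or factor column is present.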
Here is a more general solution that avoids this potential issue:
compareFun <- function(x) {
data.frame(df2[x > df2$PosStart & x < df2$PosEnd,], Pos=x)
}
do.call(rbind, lapply(df1$Pos, compareFun))
To elaborate, if df1 and df2 were instead defined with Chr being character, Justin's solution would throw an error that doesn't make it clear what caused the problem:
df1 <- data.frame(Chr="1", Pos=c(100,200,300,400), stringsAsFactors=FALSE)
df2 <- data.frame(Chr="1", PosStart=c(25,25,150,175,225,275,375),
PosEnd=c(150,75,275,300,400,500,750), stringsAsFactors=FALSE)
apply(df1, 1, myfun)
# Error in data.frame(df2[x["Pos"] < df2$PosEnd & x["Pos"] > df2$PosStart, :
# arguments imply differing number of rows: 0, 1
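For completeness, the comparison can also be done with no explicit loop or apply call at all. The sketch below uses base R's outer to build a logical matrix of every interval against every position; the names inside, idx and res are illustrative, and it assumes all rows share the same Chr, as in the example data.

```r
df1 <- data.frame(Chr = 1, Pos = c(100, 200, 300, 400))
df2 <- data.frame(Chr = 1,
                  PosStart = c(25, 25, 150, 175, 225, 275, 375),
                  PosEnd   = c(150, 75, 275, 300, 400, 500, 750))

# inside[j, i] is TRUE when df1$Pos[i] lies strictly within interval j of df2.
inside <- outer(df2$PosStart, df1$Pos, "<") & outer(df2$PosEnd, df1$Pos, ">")

# Row and column indices of every TRUE cell, in column-major order,
# i.e. grouped by position in df1 and then by row of df2.
idx <- which(inside, arr.ind = TRUE)

# Pair each matching df2 row with the position that matched it.
res <- cbind(df2[idx[, "row"], ], CoPos = df1$Pos[idx[, "col"]])
rownames(res) <- NULL
res
```

The intermediate matrix has nrow(df2) * nrow(df1) cells, so this is best suited to small or moderate inputs. For large interval data, dedicated overlap tools such as foverlaps in data.table or findOverlaps in GenomicRanges are likely a better fit.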