
For each row in a dataframe, loop through another dataframe

I need to loop through a dataframe and read the values of three columns (two timestamps and one label). Then, for each such row, I need to compare it with every row of a second dataframe to check whether A) the labels match, and B) the timestamp in the second dataframe lies between the two timestamps of the current row. If a row meets both criteria, it should be saved to a dataframe or vector for further processing.

I have tried many versions of the apply family of functions, combined with a for loop for the inner iteration. Below is a heavily simplified version of my problem, where I create two small dataframes and try to set up the required looping. The values should be saved to 'x'. While they are shown when I print(x) inside the loop, 'x' is NULL after the apply call has completed, and it also appears to be reset every time the function is called. Given my requirements, do you have ideas for a different or better approach? I am not required to use apply per se. Thank you very much in advance!

label <- c("p1", "p1", "p2")
value_1 <- c(8,4,2)
value_2 <- c(10,6,9)
df1 <- data.frame(label, value_1, value_2)

label <- c("p1", "p2", "p2")
value_3 <- c(8,8,8)
df2 <- data.frame(label, value_3)

x = NULL

small_function <- function(value_1, value_2, label) {
  for(i in 1:nrow(df2[df2$label == label,])) {
    print(i)
    x <- append(x, i)
    print(x)
  }
}

apply(df1, 1, function(x,y,z) small_function(df1$value_1, df1$value_2, df1$label))
x
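Why 'x' stays NULL: inside small_function, x <- append(x, i) creates a new local variable, so the global 'x' is never modified. In addition, the anonymous function passed to apply() ignores its row argument and hands whole columns of df1 to small_function, and apply() converts the data frame to a character matrix first. Note also that 1:nrow(...) yields c(1, 0) when no rows match; seq_len(nrow(...)) avoids that. A minimal sketch, assuming inclusive bounds and the simplified numeric data above, that returns the matches from each call instead of appending to a global variable (match_one_row is a hypothetical helper name):

```r
# Simplified sample data from the question (stringsAsFactors = FALSE keeps
# the labels as character on R < 4.0 as well)
df1 <- data.frame(label = c("p1", "p1", "p2"),
                  value_1 = c(8, 4, 2),
                  value_2 = c(10, 6, 9),
                  stringsAsFactors = FALSE)
df2 <- data.frame(label = c("p1", "p2", "p2"),
                  value_3 = c(8, 8, 8),
                  stringsAsFactors = FALSE)

# Hypothetical helper: for one row of df1, return the rows of df2 whose label
# matches and whose value_3 lies in [lo, hi] (inclusive bounds assumed)
match_one_row <- function(lab, lo, hi) {
  keep <- df2$label == lab & df2$value_3 >= lo & df2$value_3 <= hi
  n <- sum(keep)
  data.frame(label = rep(lab, n),
             value_1 = rep(lo, n),
             value_2 = rep(hi, n),
             value_3 = df2$value_3[keep],
             stringsAsFactors = FALSE)
}

# Map() walks the three df1 columns in parallel (one call per row) and
# collects every call's return value, so no global variable is needed
matches <- Map(match_one_row, df1$label, df1$value_1, df1$value_2)
x <- do.call(rbind, unname(matches))
x
```

Returning values and combining them once at the end is the usual R pattern; growing a vector from inside a function would require <<-, which is easy to get wrong.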

Update: an example with date-times, which gives me the error "Expecting a single value":

library(data.table)

label <- c("p1", "p1", "p2")
value_1 <- as.POSIXct(c(1482645600, 1482745600, 1482845600), origin="1970-01-01")
value_2 <- as.POSIXct(c(1582645600, 1582745600, 1582845600), origin="1970-01-01")
df1 <- data.frame(label, value_1, value_2)
label <- c("p1", "p2", "p2")
value_3 <- as.POSIXct(c(1582645100, 1582745200, 1582845300), origin="1970-01-01")
df2 <- data.frame(label, value_3)

df_merge <- merge(df1, df2, c("label"), suffixes = c(".df1",".df2"))
setDT(df_merge)
str(df_merge)
a <- df_merge[between(value_3, value_1, value_2), ]
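A possible cause of "Expecting a single value" (an assumption, since the loaded packages are not shown): data.table::between() accepts vector bounds, but older versions of dplyr::between() only accepted single values for its bounds and raised exactly this kind of error. If dplyr is attached after data.table, its between() masks data.table's. A sketch that namespace-qualifies the call, reusing the question's timestamps; to_time is a hypothetical helper and tz = "UTC" is added only to make the example reproducible:

```r
library(data.table)

# Hypothetical helper: epoch seconds -> POSIXct (tz = "UTC" is an assumption)
to_time <- function(secs) as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

df1 <- data.frame(label = c("p1", "p1", "p2"),
                  value_1 = to_time(c(1482645600, 1482745600, 1482845600)),
                  value_2 = to_time(c(1582645600, 1582745600, 1582845600)),
                  stringsAsFactors = FALSE)
df2 <- data.frame(label = c("p1", "p2", "p2"),
                  value_3 = to_time(c(1582645100, 1582745200, 1582845300)),
                  stringsAsFactors = FALSE)

df_merge <- merge(df1, df2, by = "label", suffixes = c(".df1", ".df2"))
setDT(df_merge)

# Calling data.table::between() with its namespace means a between() from
# another attached package cannot mask it; lower and upper are vectors here
a <- df_merge[data.table::between(value_3, value_1, value_2)]
a
```

Plain comparisons (value_3 >= value_1 & value_3 <= value_2) also work on POSIXct columns and sidestep the name clash entirely.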

Is this what you are looking for?

library(data.table)
setDT(df1)
setDT(df2)
setkey(df1, label)
setkey(df2, label)
df3 <- df1[df2]  # join the two data.tables on the key column "label"

df3[between(value_3, value_1, value_2), ]  # keep rows where value_1 <= value_3 <= value_2
#   label value_1 value_2 value_3
#1:    p1       8      10       8
#2:    p2       2       9       8
#3:    p2       2       9       8
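The join above first builds every label-matched pair and only then filters, which can grow large. data.table (1.9.8 and later) also supports non-equi joins, which apply the interval condition during the join itself. A sketch on the same toy data; in a non-equi join the join columns of the result are filled from i, so j picks x.value_3 and the i.-prefixed bounds explicitly:

```r
library(data.table)

df1 <- data.table(label = c("p1", "p1", "p2"),
                  value_1 = c(8, 4, 2),
                  value_2 = c(10, 6, 9))
df2 <- data.table(label = c("p1", "p2", "p2"),
                  value_3 = c(8, 8, 8))

# For every row of df1 (the i table), find rows of df2 with the same label
# and value_1 <= value_3 <= value_2; nomatch = 0L drops df1 rows without a
# match (an inner join)
res <- df2[df1,
           .(label, value_1 = i.value_1, value_2 = i.value_2, value_3 = x.value_3),
           on = .(label, value_3 >= value_1, value_3 <= value_2),
           nomatch = 0L]
res
```

The rows come out in the order of df1, so the unmatched p1 row (4, 6) is simply absent.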

With dates in the data:

# Make sure the date columns are proper Date objects (I simulated some sample data with dates; only the output of the last two steps is shown)
df3$value_1 = as.Date(df3$value_1, format= "%d/%m/%Y")
df3$value_2 = as.Date(df3$value_2, format= "%d/%m/%Y")
df3$value_3 = as.Date(df3$value_3, format= "%d/%m/%Y")
# df3
#   label    value_1    value_2    value_3
#1:    p1 2016-03-10 2016-03-20 2016-03-15
#2:    p1 2016-06-17 2016-06-19 2016-03-15
#3:    p2 2016-09-10 2016-09-20 2016-06-21
#4:    p2 2016-09-10 2016-09-20 2016-09-12

df3[between(value_3, value_1, value_2), ]
#   label    value_1    value_2    value_3
#1:    p1 2016-03-10 2016-03-20 2016-03-15
#2:    p2 2016-09-10 2016-09-20 2016-09-12
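The three as.Date() calls can also be done in one step by reference with := and .SDcols. A sketch that rebuilds df3 from the printed output above, assuming (as the format argument suggests) that the dates started out as "%d/%m/%Y" strings:

```r
library(data.table)

# df3 rebuilt from the printed output; the string format is an assumption
df3 <- data.table(label   = c("p1", "p1", "p2", "p2"),
                  value_1 = c("10/03/2016", "17/06/2016", "10/09/2016", "10/09/2016"),
                  value_2 = c("20/03/2016", "19/06/2016", "20/09/2016", "20/09/2016"),
                  value_3 = c("15/03/2016", "15/03/2016", "21/06/2016", "12/09/2016"))

# Convert all three columns to Date in place
cols <- c("value_1", "value_2", "value_3")
df3[, (cols) := lapply(.SD, as.Date, format = "%d/%m/%Y"), .SDcols = cols]

res <- df3[between(value_3, value_1, value_2)]
res
```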

Here is a very short solution in base R, if this is what you are looking for:

dfr <- merge(df1, df2, by="label", all=FALSE)  # inner join on label
subset(dfr, value_3 > value_1 & value_3 < value_2)  # strict bounds; use >= and <= to include the endpoints
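Strict and inclusive comparisons give different results on the question's sample data, because one p1 row has value_3 equal to value_1. A small check using the simplified numeric data:

```r
df1 <- data.frame(label = c("p1", "p1", "p2"),
                  value_1 = c(8, 4, 2),
                  value_2 = c(10, 6, 9))
df2 <- data.frame(label = c("p1", "p2", "p2"),
                  value_3 = c(8, 8, 8))

dfr <- merge(df1, df2, by = "label")  # inner join on label

# Strict bounds: the p1 row with value_3 == value_1 == 8 is dropped
strict <- subset(dfr, value_3 > value_1 & value_3 < value_2)
# Inclusive bounds: same rows as data.table::between() with incbounds = TRUE (its default)
inclusive <- subset(dfr, value_3 >= value_1 & value_3 <= value_2)

nrow(strict)     # 2
nrow(inclusive)  # 3
```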

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.
