
How to compare every value in a column of one data frame with every value in a column of another data frame in R, when the data frames have different numbers of rows?

Which box fits in which basket, also taking into account the priorities of the boxes and the baskets?

df.boxes has the following columns:
boxID - the name of the box
boxX - the size of the box in the X dimension
boxY - the size of the box in the Y dimension
importance - the priority with which a box should be assigned to a basket. 555 - most important (highest priority), 111 - least important (lowest priority)

df.basket has the following columns:
basketID - the name of the basket
basketX - the size of the basket in the X dimension
basketY - the size of the basket in the Y dimension
priorityOfSelection - the order in which baskets should be filled with a box. 1 - highest priority, 7 - lowest priority

For example, box 1 doesn't fit in the basket with the highest priority (basket 1), so it moves down to the next basket, with priorityOfSelection 2, and its name is stored in a new column "boxes" of df.basket.

My idea is to first order the two data frames by "importance" and "priorityOfSelection", then compare the size of each box with the size of each basket and, if there is a match, assign the name of the box to the corresponding basket. Following this idea I am trying to write a nested for loop, unsuccessfully, as you can see.

Could anyone point out what I am doing wrong and where? Being directed to an alternative approach would also be highly appreciated.

reprex

df.boxes <- structure(list(boxID = c("box 1", "box 2", "box 3", "box 4", "box 5"),
                           boxX = c(600, 450, 400, 350, 200),
                           boxY = c(600, 400, 450, 500, 300),
                           importance = c(555, 444, 333, 222, 111)
                           ), class = "data.frame", row.names = c(NA, -5L))

df.basket <- structure(list(basketID = c("basket 1", "basket 2", "basket 3","basket 4", "basket 5", "basket 6", "basket 8"), 
                            basketX = c(500,650, 500,200, 450, 500,300),
                            basketY = c(450,650, 500,300,450,500, 300),
                            priorityOfSelection = c(1, 2, 3, 4, 5,6,7) 
                            ), class = "data.frame", row.names = c(NA, -7L))

attempt:

for (i in 1:nrow(df.boxes)){
  for(j in 1:nrow(df.basket)){
  df.basket$box[j] <- ifelse((df.boxes$boxX[i] <= df.basket$basketX[j] | df.boxes$boxY[i] <= df.basket$basketX[j]) & (df.boxes$boxX[i] <= df.basket$basketY[j] | df.boxes$boxY[i] <= df.basket$basketY[j]), 
                                df.boxes$boxID[i], "none")
  }
}

desired output: (shown as an image in the original post; the image is not reproduced here)

Thank you very much for your time!

If you set up an extra column in df.boxes to record whether the box has been "used" or not, you can do it this way:

df.basket$box <- "none"
df.boxes$used <- logical(nrow(df.boxes))

# Visit the basket rows in priority order (1 is filled first)
for(i in order(df.basket$priorityOfSelection))
{
  # Unused boxes that fit this basket in both dimensions
  fits <- which(df.boxes$boxX <= df.basket$basketX[i] &
                df.boxes$boxY <= df.basket$basketY[i] &
                !df.boxes$used)

  if(length(fits) > 0)
  {
    # Among the fitting boxes, take the most important one
    best <- fits[which.max(df.boxes$importance[fits])]
    df.basket$box[i] <- df.boxes$boxID[best]
    df.boxes$used[best] <- TRUE
  }
}

df.basket
#>   basketID basketX basketY priorityOfSelection   box
#> 1 basket 1     500     450                   1 box 2
#> 2 basket 2     650     650                   2 box 1
#> 3 basket 3     500     500                   3 box 3
#> 4 basket 4     200     300                   4 box 5
#> 5 basket 5     450     450                   5  none
#> 6 basket 6     500     500                   6 box 4
#> 7 basket 8     300     300                   7  none

Created on 2020-03-09 by the reprex package (v0.3.0)
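
As a sanity check, the assignment printed above can be tested against the two rules from the question: every placed box must fit its basket in both dimensions, and no box may be used twice. This is only a sketch: `check_assignment` is a hypothetical helper name (not from any package), the `assigned` vector is copied by hand from the printed output, and it assumes a box fits only in its given orientation (no rotation).

```r
# Example data from the question
df.boxes <- data.frame(
  boxID = c("box 1", "box 2", "box 3", "box 4", "box 5"),
  boxX = c(600, 450, 400, 350, 200),
  boxY = c(600, 400, 450, 500, 300),
  importance = c(555, 444, 333, 222, 111),
  stringsAsFactors = FALSE
)
df.basket <- data.frame(
  basketID = c("basket 1", "basket 2", "basket 3", "basket 4",
               "basket 5", "basket 6", "basket 8"),
  basketX = c(500, 650, 500, 200, 450, 500, 300),
  basketY = c(450, 650, 500, 300, 450, 500, 300),
  priorityOfSelection = 1:7,
  stringsAsFactors = FALSE
)

# Assignment copied from the printed output, one entry per basket row
assigned <- c("box 2", "box 1", "box 3", "box 5", "none", "box 4", "none")

# Hypothetical helper: TRUE if every placed box exists, fits its basket,
# and appears at most once
check_assignment <- function(boxes, baskets, assigned) {
  placed <- assigned != "none"
  idx <- match(assigned[placed], boxes$boxID)
  fits <- boxes$boxX[idx] <= baskets$basketX[placed] &
          boxes$boxY[idx] <= baskets$basketY[placed]
  all(!is.na(idx)) && all(fits) && anyDuplicated(idx) == 0
}

check_assignment(df.boxes, df.basket, assigned)
```

The check only confirms that the result is valid. It does not verify that priorities were respected, which would need a separate rule.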

Just for fun, a late submission with "pure for looping" and ifs, with no extra columns.
It assumes that what you are basically trying to achieve is pushing the boxes, from top to bottom, into the (priority-ordered) list of baskets:

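df.basket$box <- NA
# Boxes are tried in row order, so both data frames must already be sorted
# (boxes by descending importance, baskets by ascending priorityOfSelection)
for (i in seq_len(nrow(df.boxes))) {
  for (j in seq_len(nrow(df.basket))) {
    # Only consider baskets that are still empty
    if (is.na(df.basket$box[j])) {
      # The box fits if the basket is at least as large in both dimensions
      if (all(c(df.basket$basketX[j], df.basket$basketY[j]) -
              c(df.boxes$boxX[i], df.boxes$boxY[i]) >= 0)) {
        df.basket$box[j] <- df.boxes$boxID[i]
        break  # box placed, move on to the next box
      }
    }
  }
}
df.basket$box[is.na(df.basket$box)] <- "none"
df.basket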

  basketID basketX basketY priorityOfSelection   box
1 basket 1     500     450                   1 box 2
2 basket 2     650     650                   2 box 1
3 basket 3     500     500                   3 box 3
4 basket 4     200     300                   4 box 5
5 basket 5     450     450                   5  none
6 basket 6     500     500                   6 box 4
7 basket 8     300     300                   7  none

For sure not as elegant as the solution by '@Allan Cameron' but also a possible approach, if you wanna grasp the for loop approach you started out with in your own attempt a bit better.
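
The loop in this answer walks both data frames in row order, so it depends on them already being sorted. A minimal sketch that makes the ordering explicit follows. It assumes that a larger importance is placed first, that a smaller priorityOfSelection is filled first, and that boxes are not rotated; `push_boxes` is a hypothetical helper name, not part of any package.

```r
# Example data from the question
df.boxes <- data.frame(
  boxID = c("box 1", "box 2", "box 3", "box 4", "box 5"),
  boxX = c(600, 450, 400, 350, 200),
  boxY = c(600, 400, 450, 500, 300),
  importance = c(555, 444, 333, 222, 111),
  stringsAsFactors = FALSE
)
df.basket <- data.frame(
  basketID = c("basket 1", "basket 2", "basket 3", "basket 4",
               "basket 5", "basket 6", "basket 8"),
  basketX = c(500, 650, 500, 200, 450, 500, 300),
  basketY = c(450, 650, 500, 300, 450, 500, 300),
  priorityOfSelection = 1:7,
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns one box ID (or "none") per basket row
push_boxes <- function(boxes, baskets) {
  box_of <- rep(NA_character_, nrow(baskets))
  box_order <- order(boxes$importance, decreasing = TRUE)
  basket_order <- order(baskets$priorityOfSelection)
  for (i in box_order) {
    for (j in basket_order) {
      # Place the box in the first empty basket that is large enough
      if (is.na(box_of[j]) &&
          boxes$boxX[i] <= baskets$basketX[j] &&
          boxes$boxY[i] <= baskets$basketY[j]) {
        box_of[j] <- boxes$boxID[i]
        break
      }
    }
  }
  box_of[is.na(box_of)] <- "none"
  box_of
}

# Shuffle the rows of both inputs: the result should not depend on row order
shuffled <- df.basket[c(4, 7, 1, 6, 2, 5, 3), ]
shuffled$box <- push_boxes(df.boxes[5:1, ], shuffled)
shuffled[order(shuffled$priorityOfSelection), c("basketID", "box")]
```

Because the order comes from the `importance` and `priorityOfSelection` columns rather than from row positions, the data frames do not need to be sorted before the call.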


 