
Picking a number from a vector and assigning it to a column based on multiple conditions in R

I need to add a "Thickness" column to the Products table based on multiple conditions.

1: The thickness must be one of the following values only:

Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)

2: The thickness must lie between the ThicknessMin and ThicknessMax values that already exist in the table.

The current table looks like this:

Product        ThicknessMin      ThicknessMax    
P0001            0                 8
P0002           31.01              70
P0003           8.01               31
P0004           70.01              999
P0005           8.01               31

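So the idea is to randomly pick a value from the vector as the thickness, but that value must lie between ThicknessMin and ThicknessMax. Any pointers on how to do this would be appreciated. Thanks.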

A vectorized base R solution (df is your data.frame):

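set.seed(1)  # just for reproducibility
# Plate_Thickness must be sorted in increasing order for findInterval().
# a: index of the first plate >= ThicknessMin; b: index of the last plate <= ThicknessMax
a <- findInterval(df$ThicknessMin, Plate_Thickness, left.open = TRUE) + 1
b <- findInterval(df$ThicknessMax, Plate_Thickness)
# Draw one index uniformly from a..b per row (assumes b >= a, i.e. at least one eligible plate)
Plate_Thickness[a + floor(runif(length(a)) * (b - a + 1))]
#[1]   5.8  32.5  27.1 120.4  25.1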
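To make the index arithmetic easier to follow, here is a minimal sketch of how findInterval can turn each row's bounds into a range of positions in the sorted vector. The names first_ok, last_ok, n_ok and candidates are illustrative only, and the bounds are treated as inclusive, which is an assumption about the question's intent.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)  # must be sorted
ThicknessMin <- c(0, 31.01, 8.01, 70.01, 8.01)
ThicknessMax <- c(8, 70, 31, 999, 31)

# left.open = TRUE counts the plates strictly below the lower bound,
# so adding 1 gives the position of the first plate >= ThicknessMin.
first_ok <- findInterval(ThicknessMin, Plate_Thickness, left.open = TRUE) + 1L
# The default counts the plates <= the upper bound,
# which is the position of the last plate <= ThicknessMax.
last_ok <- findInterval(ThicknessMax, Plate_Thickness)

# Number of eligible plates per row; a value <= 0 means no plate fits.
n_ok <- last_ok - first_ok + 1L

# Eligible plates per row, for inspection.
candidates <- Map(function(lo, n) Plate_Thickness[lo - 1L + seq_len(max(n, 0L))],
                  first_ok, n_ok)

first_ok  # 1 4 2 6 2
last_ok   # 1 5 3 7 3
```

Checking n_ok before sampling is a cheap way to spot products for which no plate thickness is available.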

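We can use the rowwise function from the dplyr package to sample from the Plate_Thickness vector. In the call to sample, we draw only from the elements of Plate_Thickness that are between ThicknessMin and ThicknessMax. I put your table in a data.frame called dat.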

library(dplyr)
set.seed(123)
dat %>%
    rowwise() %>%
    mutate(thick_sample = {
        # Eligible plates for this row
        ok <- Plate_Thickness[between(Plate_Thickness, ThicknessMin, ThicknessMax)]
        # Sample a position: sample(ok, 1) would draw from 1:ok when only one value is eligible
        ok[sample.int(length(ok), 1)]
    })

  Product ThicknessMin ThicknessMax thick_sample
   <fctr>        <dbl>        <int>        <dbl>
1   P0001         0.00            8          5.8
2   P0002        31.01           70         55.6
3   P0003         8.01           31         25.1
4   P0004        70.01          999        120.4
5   P0005         8.01           31         27.1

Data (for reproducibility)

dat <- structure(list(Product = structure(1:5, .Label = c("P0001", "P0002", 
"P0003", "P0004", "P0005"), class = "factor"), ThicknessMin = c(0, 
31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L, 70L, 31L, 999L, 
31L)), .Names = c("Product", "ThicknessMin", "ThicknessMax"), class = "data.frame", row.names = c(NA, 
-5L))
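One pitfall is worth spelling out when sampling from a filtered vector: if the first argument of sample() is a single number x >= 1, R samples from 1:x instead of returning x. P0001 has exactly one eligible plate (5.8), so a direct sample(5.8, 1) can return an integer from 1 to 5. A minimal sketch of a guard follows; pick_one is a hypothetical helper name, not part of base R or dplyr.

```r
# Hypothetical helper: draw one element of x by sampling a position, not a value.
pick_one <- function(x) {
  if (length(x) == 0L) return(NA_real_)  # no eligible value for this row
  x[sample.int(length(x), 1L)]
}

Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)

# P0001: bounds 0 to 8 leave a single eligible plate.
eligible <- Plate_Thickness[Plate_Thickness >= 0 & Plate_Thickness <= 8]

pick_one(eligible)    # always 5.8, the only eligible plate
pick_one(numeric(0))  # NA when nothing is eligible
```

Returning NA for an empty candidate set is a design choice made for this sketch; raising an error may be preferable if every product is expected to have a match.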

Your data

Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)

df <- structure(list(Product = c("P0001", "P0002", "P0003", "P0004", 
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L, 
70L, 31L, 999L, 31L), Plate_Thickness = c(5.8, 32.5, 27.1, 120.4, 
25.1)), .Names = c("Product", "ThicknessMin", "ThicknessMax", 
"Plate_Thickness"), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))

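library(dplyr)  # for between()
# For each row, keep the plate thicknesses that lie within [ThicknessMin, ThicknessMax]
acceptable_vals <- lapply(seq_len(nrow(df)), function(x) Plate_Thickness[between(Plate_Thickness, df$ThicknessMin[x], df$ThicknessMax[x])])
set.seed(1)
# Sample a position rather than a value, so a single eligible value is returned as-is
df$Plate_Thickness <- sapply(acceptable_vals, function(x) x[sample.int(length(x), 1)])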

Output

   Product ThicknessMin ThicknessMax Plate_Thickness
1:   P0001         0.00            8             5.8
2:   P0002        31.01           70            32.5
3:   P0003         8.01           31            27.1
4:   P0004        70.01          999           120.4
5:   P0005         8.01           31            25.1
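Whichever approach is used, the result can be checked against the two conditions from the question. The sketch below rebuilds the table shown above as a plain data.frame and asserts that each assigned value is one of the allowed plates and lies within its row's bounds (treated as inclusive, which is an assumption). The names allowed, in_set and in_range are illustrative.

```r
allowed <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)

df <- data.frame(
  Product         = c("P0001", "P0002", "P0003", "P0004", "P0005"),
  ThicknessMin    = c(0, 31.01, 8.01, 70.01, 8.01),
  ThicknessMax    = c(8, 70, 31, 999, 31),
  Plate_Thickness = c(5.8, 32.5, 27.1, 120.4, 25.1)
)

# Condition 1: the value is one of the allowed plate thicknesses.
in_set <- df$Plate_Thickness %in% allowed
# Condition 2: the value lies between the row's ThicknessMin and ThicknessMax.
in_range <- df$Plate_Thickness >= df$ThicknessMin &
            df$Plate_Thickness <= df$ThicknessMax

stopifnot(all(in_set), all(in_range))
```

A check like this would flag an out-of-set value such as 2.0 for P0001.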

#DATA
df = structure(list(Product = c("P0001", "P0002", "P0003", "P0004", 
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L, 
70L, 31L, 999L, 31L)), .Names = c("Product", "ThicknessMin", 
"ThicknessMax"), class = c("data.table", "data.frame"), row.names = c(NA, 
-5L))

Plate_Thickness = c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)

set.seed(1)
apply(X = as.data.frame(df)[c("ThicknessMin", "ThicknessMax")],  #Plain data.frame column subsetting
      MARGIN = 1,        #Run FUN on each row of X
      FUN = function(x) {
          #Retain only eligible values for each row
          ok <- Plate_Thickness[Plate_Thickness >= x[1] & Plate_Thickness <= x[2]]
          #Sample a position, since sample(ok, 1) draws from 1:ok when length(ok) == 1
          ok[sample.int(length(ok), 1)]
      })
#[1]   5.8  32.5  27.1 120.4  25.1
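The call above returns a vector rather than adding a column. As a sketch of one way to store the result in the "Thickness" column the question asks for, the version below uses vapply over row indices, which keeps the result type fixed and avoids the matrix coercion that apply performs on a data frame. Inclusive bounds and the NA fallback for rows with no eligible plate are assumptions, not part of the original answers.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)

df <- data.frame(
  Product      = c("P0001", "P0002", "P0003", "P0004", "P0005"),
  ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01),
  ThicknessMax = c(8, 70, 31, 999, 31)
)

set.seed(1)  # only so repeated runs give the same draws
df$Thickness <- vapply(seq_len(nrow(df)), function(i) {
  # Plates within this row's bounds (inclusive)
  ok <- Plate_Thickness[Plate_Thickness >= df$ThicknessMin[i] &
                        Plate_Thickness <= df$ThicknessMax[i]]
  if (length(ok) == 0L) return(NA_real_)  # assumed fallback: no plate fits
  ok[sample.int(length(ok), 1L)]          # sample a position, not a value
}, numeric(1))

df
```

The exact values drawn for rows with more than one eligible plate depend on the seed and on the R version's sampling algorithm; P0001 always receives 5.8 because it has only one candidate.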

