Assign values to vector in a dataframe based on conditions from other vectors in R
Picking a number from vector and assign to column based on multiple conditions in R
I need to add a "Thickness" column to the "Products" table based on multiple conditions.
1: Thickness must be one of the following values only:
Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
2: Thickness must lie between the ThicknessMin and ThicknessMax values that already exist in the table.
The current table looks like this:
Product ThicknessMin ThicknessMax
P0001 0 8
P0002 31.01 70
P0003 8.01 31
P0004 70.01 999
P0005 8.01 31
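So the idea is to pick a random value from the vector as "Thickness", but the value must fall between "ThicknessMin" and "ThicknessMax". Any pointers on how to go about this would be appreciated. Thanks.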
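As a starting point, it can help to list which plate thicknesses are eligible for each product before any random choice is made. The sketch below is only an illustration: it rebuilds the table from the question as `Products`, and the name `eligible` and the inclusive bounds are assumptions.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)
Products <- data.frame(
  Product      = c("P0001", "P0002", "P0003", "P0004", "P0005"),
  ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01),
  ThicknessMax = c(8, 70, 31, 999, 31)
)

# For each row, keep the plate thicknesses inside [ThicknessMin, ThicknessMax]
eligible <- Map(
  function(lo, hi) Plate_Thickness[Plate_Thickness >= lo & Plate_Thickness <= hi],
  Products$ThicknessMin, Products$ThicknessMax
)
names(eligible) <- Products$Product
eligible
# P0001: 5.8
# P0002: 32.5 55.6
# P0003: 25.1 27.1
# P0004: 98.1 120.4
# P0005: 25.1 27.1
```

Note that P0001 has exactly one eligible value, which matters for how the random draw is written.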
A vectorized base R solution (df is your data.frame):
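set.seed(1) #just for reproducibility
# a: index of the first plate thickness >= ThicknessMin
a <- findInterval(df$ThicknessMin, Plate_Thickness, left.open = TRUE) + 1
# b: index of the last plate thickness <= ThicknessMax
b <- findInterval(df$ThicknessMax, Plate_Thickness)
# pick a uniformly random index in a..b for each row
# (assumes every row has at least one eligible value, i.e. b >= a)
Plate_Thickness[a + floor(runif(length(a)) * (b - a + 1))]
#[1] 5.8 32.5 27.1 120.4 25.1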
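To see what findInterval contributes here, the following sketch prints the indices it returns for the question's bounds. With default arguments, findInterval(x, v) returns how many elements of the sorted vector v are less than or equal to x, so for a lower bound the first eligible plate sits one position further along. The variable names are illustrative only.

```r
Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)
mins <- c(0, 31.01, 8.01, 70.01, 8.01)
maxs <- c(8, 70, 31, 999, 31)

# Number of plate thicknesses <= each bound
lo_idx <- findInterval(mins, Plate_Thickness)  # 0 3 1 5 1
hi_idx <- findInterval(maxs, Plate_Thickness)  # 1 5 3 7 3

# First and last eligible plate per row
# (none of the minimums equals a plate value here, so lo_idx + 1 is the first eligible index)
first_val <- Plate_Thickness[lo_idx + 1]  # 5.8 32.5 25.1 98.1 25.1
last_val  <- Plate_Thickness[hi_idx]      # 5.8 55.6 27.1 120.4 27.1
```

Using lo_idx itself as the starting index would admit a plate below ThicknessMin (for example 27.1 for P0002), which is why the offset matters.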
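We can use the rowwise function from the dplyr package to sample from the Plate_Thickness vector. In the call to sample, we sample only from the elements of Plate_Thickness that are between ThicknessMin and ThicknessMax. I have put your table in a data.frame named dat: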
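library(dplyr)
set.seed(123)
dat %>%
  rowwise() %>%
  mutate(thick_sample = {
    # Eligible plate thicknesses for this row
    eligible <- Plate_Thickness[between(Plate_Thickness, ThicknessMin, ThicknessMax)]
    # Sample by index: sample(5.8, 1) would draw from 1:5 instead of returning 5.8
    eligible[sample.int(length(eligible), 1)]
  })
# Row 1 is always 5.8 (its only eligible value); the other rows depend on the seed and R version
Product ThicknessMin ThicknessMax thick_sample
<fctr> <dbl> <int> <dbl>
1 P0001 0.00 8 5.8
2 P0002 31.01 70 55.6
3 P0003 8.01 31 25.1
4 P0004 70.01 999 120.4
5 P0005 8.01 31 27.1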
dat <- structure(list(Product = structure(1:5, .Label = c("P0001", "P0002",
"P0003", "P0004", "P0005"), class = "factor"), ThicknessMin = c(0,
31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L, 70L, 31L, 999L,
31L)), .Names = c("Product", "ThicknessMin", "ThicknessMax"), class = "data.frame", row.names = c(NA,
-5L))
Plate_Thickness <- c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
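One pitfall when sampling per row: if only one plate thickness is eligible (as for P0001, where the only candidate is 5.8), passing that single number to sample() does not return it. R's documentation describes sample(x) for a single number x >= 1 as sampling from 1:x, with non-integer x truncated. The sketch below shows the effect; `sample_one` is a hypothetical helper, not something from the answers.

```r
set.seed(42)

# A single number is interpreted as "sample from 1:n", so 5.8 itself never comes back
draws <- replicate(20, sample(5.8, 1))
all(draws != 5.8)             # TRUE
all(draws == floor(draws))    # TRUE: every draw is a whole number

# Hypothetical helper: sample by position, which is safe for length-one vectors
sample_one <- function(x) x[sample.int(length(x), 1)]
sample_one(5.8)               # always 5.8
sample_one(c(25.1, 27.1))     # either 25.1 or 27.1
```

Indexing with sample.int(length(x), 1) is the same idea as the `x[sample(1:length(x), 1)]` form used in one of the answers.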
df <- structure(list(Product = c("P0001", "P0002", "P0003", "P0004",
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L,
70L, 31L, 999L, 31L), Plate_Thickness = c(5.8, 32.5, 27.1, 120.4,
25.1)), .Names = c("Product", "ThicknessMin", "ThicknessMax",
"Plate_Thickness"), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
library(dplyr)
acceptable_vals <- lapply(1:nrow(df), function(x) Plate_Thickness[between(Plate_Thickness, df$ThicknessMin[x], df$ThicknessMax[x])])
set.seed(1)
df$Plate_Thickness <- sapply(acceptable_vals, function(x) x[sample(1:length(x), 1)])
Product ThicknessMin ThicknessMax Plate_Thickness
1: P0001 0.00 8 5.8
2: P0002 31.01 70 32.5
3: P0003 8.01 31 27.1
4: P0004 70.01 999 120.4
5: P0005 8.01 31 25.1
#DATA
df = structure(list(Product = c("P0001", "P0002", "P0003", "P0004",
"P0005"), ThicknessMin = c(0, 31.01, 8.01, 70.01, 8.01), ThicknessMax = c(8L,
70L, 31L, 999L, 31L)), .Names = c("Product", "ThicknessMin",
"ThicknessMax"), class = c("data.table", "data.frame"), row.names = c(NA,
-5L))
Plate_Thickness = c(5.8,25.1,27.1,32.5,55.6,98.1,120.4)
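set.seed(1)
apply(X = df[c("ThicknessMin", "ThicknessMax")],
      MARGIN = 1, #Run FUN on each row of X
      FUN = function(x) {
          #Retain only eligible values for each row
          eligible = Plate_Thickness[Plate_Thickness > x[1] & Plate_Thickness < x[2]]
          #Sample 1 value by index: sample(5.8, 1) would draw from 1:5, not return 5.8
          eligible[sample.int(length(eligible), 1)]
      })
#Row 1 is always 5.8; the other values depend on the seed and R version
#[1] 5.8 32.5 27.1 120.4 25.1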
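None of the approaches above says what should happen when a product's range contains no plate thickness at all. A minimal sketch of one way to handle that, assuming inclusive bounds and that NA is an acceptable result for such rows (`pick_thickness` is a hypothetical name, not from the answers):

```r
# Hypothetical helper: one random eligible thickness per row, NA if none qualifies
pick_thickness <- function(lo, hi, candidates) {
  vapply(seq_along(lo), function(i) {
    eligible <- candidates[candidates >= lo[i] & candidates <= hi[i]]
    if (length(eligible) == 0L) return(NA_real_)
    # Sample by index so a single eligible value is returned as-is
    eligible[sample.int(length(eligible), 1)]
  }, numeric(1))
}

Plate_Thickness <- c(5.8, 25.1, 27.1, 32.5, 55.6, 98.1, 120.4)
lo <- c(0, 31.01, 8.01, 70.01, 8.01, 200)  # last row added to show the empty case
hi <- c(8, 70, 31, 999, 31, 300)

set.seed(1)
res <- pick_thickness(lo, hi, Plate_Thickness)

# Every non-missing result respects its row's bounds
ok <- is.na(res) | (res >= lo & res <= hi)
all(ok)
```

The result can then be assigned as a new column, for example `df$Thickness <- pick_thickness(df$ThicknessMin, df$ThicknessMax, Plate_Thickness)`.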