Update column values randomly based on value in other column in R
I want to add a new column, SubCategory, whose values are filled in at random according to the value in the Category column. Here are the details:
Sub_Hair = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye")
Sub_Beauty = c("Face", "Eye", "Lips")
Sub_Nail= c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit")
Sub_Others = c("Electric", "NonElectric")
> product_data_1[1:10, c("Pcode", "Category", "MRP")]
Pcode Category MRP
1 16156L Beauty $8.88
2 16162M Others $21.27
3 16168M Others $2.98
4 16169E Nail $26.64
5 16207A Hair $6.38
6 17012B Beauty $33.03
7 17012C Beauty $20.58
8 17012F Beauty $36.29
9 17091A Nail $20.55
10 17107D Nail $28.20
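I am trying the code below. However, every row in a category ends up with the same single subcategory. For example, all rows with Category "Beauty" get the SubCategory "Eye", rather than values picked at random from "Face", "Eye" and "Lips". Here are the code and the output: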
product_data_1 = within(product_data_1, SubCategory[Category == "Beauty"] <- sample(Sub_Beauty, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Hair"] <- sample(Sub_Hair, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Nail"] <- sample(Sub_Nail, 1))
product_data_1 = within(product_data_1, SubCategory[Category == "Others"] <- sample(Sub_Others, 1))
> product_data_1[1:10, c("Pcode", "Category", "MRP", "SubCategory")]
Pcode Category MRP SubCategory
1 16156L Beauty $8.88 Eye
2 16162M Others $21.27 Electric
3 16168M Others $2.98 Electric
4 16169E Nail $26.64 NailPolish
5 16207A Hair $6.38 Gel
6 17012B Beauty $33.03 Eye
7 17012C Beauty $20.58 Eye
8 17012F Beauty $36.29 Eye
9 17091A Nail $20.55 NailPolish
10 17107D Nail $28.20 NailPolish
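The output above is consistent with how sample(Sub_Beauty, 1) behaves: it returns a single value, and R recycles that one value across every row selected by Category == "Beauty". As a minimal sketch of one way around this (the small data frame df is a made-up stand-in for product_data_1, and only two of the Sub_* vectors are repeated here), one could draw as many values as there are matching rows, with replacement:

```r
# Hypothetical stand-in for product_data_1; only Category matters here
df <- data.frame(Category = c("Beauty", "Others", "Beauty", "Beauty"),
                 stringsAsFactors = FALSE)
Sub_Beauty <- c("Face", "Eye", "Lips")
Sub_Others <- c("Electric", "NonElectric")

# A single draw has length 1, so it gets recycled over all matching rows
length(sample(Sub_Beauty, 1))   # 1

set.seed(1)
df$SubCategory <- NA_character_

# Draw one value per matching row instead of one value for the whole group
is_beauty <- df$Category == "Beauty"
df$SubCategory[is_beauty] <- sample(Sub_Beauty, sum(is_beauty), replace = TRUE)

is_others <- df$Category == "Others"
df$SubCategory[is_others] <- sample(Sub_Others, sum(is_others), replace = TRUE)

df
```

replace = TRUE is needed because a category may have more rows than it has subcategories.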
Here is a base R solution. It uses the split/apply/combine strategy that Hadley Wickham explains in this JSS article.
I will put the Sub_* vectors in a list, Sub_list. Note that split orders its result by Category, so the vectors in Sub_list must be in that same order.
# Order must match the sorted Category groups that split() returns
Sub_list <- list(Sub_Beauty, Sub_Hair, Sub_Nail, Sub_Others)
sp <- split(product_data_1, product_data_1$Category)
set.seed(1234)
sp <- lapply(seq_along(sp), function(i){
  # Draw one subcategory per row of group i, with replacement
  sp[[i]]$SubCategory <- sample(Sub_list[[i]], nrow(sp[[i]]), replace = TRUE)
  sp[[i]]
})
result <- do.call(rbind, sp)
# Restore the original row order
result <- result[order(as.integer(row.names(result))), ]
result
# Pcode Category MRP SubCategory
#1 16156L Beauty $8.88 Eye
#2 16162M Others $21.27 NonElectric
#3 16168M Others $2.98 NonElectric
#4 16169E Nail $26.64 NailPolish
#5 16207A Hair $6.38 Shampoo
#6 17012B Beauty $33.03 Eye
#7 17012C Beauty $20.58 Face
#8 17012F Beauty $36.29 Lips
#9 17091A Nail $20.55 NailPolishRemover
#10 17107D Nail $28.20 ManiPadiKit
Finally, clean up.
rm(Sub_list)
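Since the positional match between Sub_list and the groups returned by split is easy to get wrong, a variant that looks the vectors up by name may be more robust. The following is only a sketch: Sub_named and the small data frame df are illustrative names that do not appear in the answer above, and unsplit() is swapped in for the rbind-and-reorder step.

```r
# Illustrative data; in the question this would be product_data_1
df <- data.frame(Pcode = c("16156L", "16162M", "16169E", "16207A", "17012B"),
                 Category = c("Beauty", "Others", "Nail", "Hair", "Beauty"),
                 stringsAsFactors = FALSE)

# Named list: the order of the elements no longer matters
Sub_named <- list(
  Hair   = c("Shampoo", "Conditioner", "Gel", "HairOil", "Dye"),
  Beauty = c("Face", "Eye", "Lips"),
  Nail   = c("NailPolish", "NailPolishRemover", "NailArtKit", "ManiPadiKit"),
  Others = c("Electric", "NonElectric")
)

set.seed(1234)
sp <- split(df, df$Category)
sp <- lapply(names(sp), function(nm) {
  grp <- sp[[nm]]
  # Look up the pool by group name, then draw one value per row
  grp$SubCategory <- sample(Sub_named[[nm]], nrow(grp), replace = TRUE)
  grp
})

# unsplit() reverses split(), putting the rows back in their original order
result <- unsplit(sp, df$Category)
result
```

Looking pools up by name means a new category only has to be added to the list, without having to keep it in sorted order.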
Data
product_data_1 <- read.table(text = "
Pcode Category MRP
1 16156L Beauty $8.88
2 16162M Others $21.27
3 16168M Others $2.98
4 16169E Nail $26.64
5 16207A Hair $6.38
6 17012B Beauty $33.03
7 17012C Beauty $20.58
8 17012F Beauty $36.29
9 17091A Nail $20.55
10 17107D Nail $28.20
", header = TRUE)
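Put your subcategory values in a named list such as subcat_list <- list(Hair = Sub_Hair, Beauty = Sub_Beauty, Nail = Sub_Nail, Others = Sub_Others). You can then index subcat_list with product_data_1$Category and use sapply to call sample on each element of the resulting list of vectors: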
set.seed(323)
# as.character() makes sure the list is indexed by name even if Category is a factor
product_data_1$SubCategory <- sapply(subcat_list[as.character(product_data_1$Category)], sample, 1)
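One caveat worth checking with this approach: if Category is a factor (the default for read.table before R 4.0.0), indexing a list with it uses the underlying integer codes rather than the labels, so the lookup can silently pick the wrong vector. A small sketch of the difference, using a made-up two-element list called pools:

```r
# Hypothetical pools, deliberately not in alphabetical order
pools <- list(Hair = c("Shampoo", "Gel"), Beauty = c("Face", "Eye"))
cat_fac <- factor(c("Beauty", "Hair"))   # levels sort as Beauty = 1, Hair = 2

# Indexing by a factor uses the integer codes 1, 2, i.e. list positions
names(pools[cat_fac])                 # "Hair" "Beauty"  (the wrong pools)

# Converting to character indexes by name instead
names(pools[as.character(cat_fac)])   # "Beauty" "Hair"  (the intended pools)

set.seed(323)
# vapply() also checks that every draw is a single character value
sub <- vapply(pools[as.character(cat_fac)], sample, character(1), size = 1)
unname(sub)
```

With character columns the original indexing already works by name, so the conversion is only a safeguard.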
You can also try a slightly different approach with dplyr + purrr:
library(tidyverse)
product_data_1 %>%
  # as.character() guards against Category being a factor
  mutate(SubCategory = map_chr(as.character(Category), ~ sample(subcat_list[[.x]], 1)))
Pcode Category MRP SubCategory
1 16156L Beauty $8.88 Eye
2 16162M Others $21.27 Electric
3 16168M Others $2.98 Electric
4 16169E Nail $26.64 NailPolish
5 16207A Hair $6.38 Gel
6 17012B Beauty $33.03 Eye
7 17012C Beauty $20.58 Lips
8 17012F Beauty $36.29 Face
9 17091A Nail $20.55 ManiPadiKit
10 17107D Nail $28.20 NailArtKit