How to recode a continuous variable into ranges
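I need to recode a continuous variable into categories. I normally use the cut() function, but cut() requires me to specify the breaks. I am looking for a way to use a different set of breaks depending on other categorical variables in my data frame.
The variable in my example is Cost, and the breaks are in a second table, "cost.range", where I have a different set of breaks for each Region and each Category.
Example: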
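For reference, this is roughly what the usual cut() call looks like with one fixed set of breaks. It is only a minimal sketch: the vector names cost, breaks and cost_cut are made up for illustration, and the values are borrowed from the example tables.

```r
# One set of breaks applied to every value, regardless of Region or Category.
# `cost`, `breaks` and `cost_cut` are illustrative names, not part of the real data.
cost   <- c(731, 659, 385, 147)
breaks <- c(10, 50, 200, 1000)

# cut() returns a factor with one level per interval between consecutive breaks
cost_cut <- cut(cost, breaks = breaks)
print(cost_cut)
```

The limitation is that the same breaks are used for every row, whereas here they should change with Region and Category.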
Region     Product    Category  Cost
Country A  Product 1  CAT A     731
Country B  Product 1  CAT A     659
Country C  Product 1  CAT A     385
Country D  Product 1  CAT A     763
Country A  Product 2  CAT A     701
Country B  Product 2  CAT A     759
Country C  Product 2  CAT A     580
Country D  Product 2  CAT A     147
Country A  Product 3  CAT B     645
Country B  Product 3  CAT B     657
Country C  Product 3  CAT B     424

Region     Category  Cost.range  Range
Country A  CAT A     10          R1
Country A  CAT A     50          R2
Country A  CAT A     200         R3
Country A  CAT A     1000        R4
Country A  CAT B     20          R1
Country A  CAT B     100         R2
Country A  CAT B     400         R3
Country A  CAT B     1500        R4
Code used to generate the example:
# Table1: one Cost per Region/Product, with the Category of each Product
Region   <- rep(c("Country A", "Country B", "Country C", "Country D"), times = 4)
Product  <- rep(paste("Product", 1:4), each = 4)
Category <- rep(c("CAT A", "CAT B"), each = 8)
Cost     <- c(731, 659, 385, 763, 701, 759, 580, 147, 645, 657, 424, 34, 850, 463, 160, 550)
Table1   <- data.frame(Region, Product, Category, Cost)

# Table2: breaks (Cost.range) and their labels (Range) per Region/Category
Region     <- rep("Country A", 8)
Category   <- rep(c("CAT A", "CAT B"), each = 4)
Cost.range <- c(10, 50, 200, 1000, 20, 100, 400, 1500)
Range      <- c("R1", "R2", "R3", "R4", "R1", "R2", "R3", "R4")
Table2     <- data.frame(Region, Category, Cost.range, Range)
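This is not the most elegant solution (I would be interested to see a better approach), but it should produce the result you are looking for. The select() and distinct() functions from the dplyr package find the possible combinations of Region and Category. Those combinations are then used to subset both tables, and cut() is applied to each subset.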
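library(dplyr)
library(data.table)

dt1 <- data.table(Table1)
dt2 <- data.table(Table2)

# Unique Region/Category pairs that have breaks defined in Table2
t2d <- Table2 %>% select(Region, Category) %>% distinct()

for (i in seq_len(nrow(t2d))) {
  region_i   <- as.character(t2d$Region[i])
  category_i <- as.character(t2d$Category[i])
  # Breaks for this Region/Category pair
  dt2_range_subset <- dt2[Region == region_i & Category == category_i, Cost.range]
  # Cut only the matching rows of dt1. The labels are stored as character
  # because each pair produces its own set of interval levels.
  dt1[Region == region_i & Category == category_i,
      Cost_factor := as.character(cut(Cost, dt2_range_subset))]
}
# Convert to a factor once all subsets are filled; rows without breaks stay NA
dt1[, Cost_factor := factor(Cost_factor)]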
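If the Range labels (R1 to R4) are wanted instead of interval labels, the same per-pair idea can be sketched in base R. This is only a sketch under an assumption the question does not state: each Cost.range value is treated as the upper bound of its Range, with -Inf as the lower bound of the first one. The data frames t1_demo and t2_demo are small illustrative copies of the example tables, and the costs 35 and 250 are invented so that more than one range shows up.

```r
# Illustrative copies of the example tables (35 and 250 are invented values)
t1_demo <- data.frame(
  Region   = c("Country A", "Country A", "Country A", "Country B"),
  Category = c("CAT A", "CAT A", "CAT B", "CAT A"),
  Cost     = c(731, 35, 250, 659),
  stringsAsFactors = FALSE
)
t2_demo <- data.frame(
  Region     = "Country A",
  Category   = rep(c("CAT A", "CAT B"), each = 4),
  Cost.range = c(10, 50, 200, 1000, 20, 100, 400, 1500),
  Range      = rep(c("R1", "R2", "R3", "R4"), times = 2),
  stringsAsFactors = FALSE
)

# One lookup key per Region/Category pair in each table
key1 <- paste(t1_demo$Region, t1_demo$Category, sep = " | ")
key2 <- paste(t2_demo$Region, t2_demo$Category, sep = " | ")

t1_demo$Range <- NA_character_
for (k in unique(key2)) {
  brk  <- t2_demo[key2 == k, ]
  brk  <- brk[order(brk$Cost.range), ]   # cut() needs increasing breaks
  rows <- key1 == k
  # ASSUMPTION: each Cost.range is the upper bound of its Range
  t1_demo$Range[rows] <- as.character(
    cut(t1_demo$Cost[rows], breaks = c(-Inf, brk$Cost.range), labels = brk$Range)
  )
}
print(t1_demo)
```

Rows whose Region/Category pair has no breaks in the second table (Country B here) keep NA, as do costs above the largest break. If the breaks are meant as lower bounds instead, the breaks vector would need to be built differently.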