Dcast multiple observations into one cell in R
I have an R data frame:
Customer  Month  BaseVolume  IncrementalVolume  TradeSpend
10        Jan    11          1                  110
10        Feb    12          2                  120
20        Jan    21          7                  210
20        Feb    22          8                  220
I want to transform it like this:
Customer  Jan                  Feb
10        BaseVolume 11        BaseVolume 12
          IncrementalVolume 1  IncrementalVolume 2
          TradeSpend 110       TradeSpend 120
20        BaseVolume 21        BaseVolume 22
          IncrementalVolume 7  IncrementalVolume 8
          TradeSpend 210       TradeSpend 220
I tried dcast (reshape) but could not get this result. Please help.
You can try the following (in your case, your data is df1, and you need to run setDT(df1) before any of the operations I mention):
library(data.table)
dt1 <- structure(list(Customer = c(10L, 10L, 20L, 20L), Month = c("Jan",
"Feb", "Jan", "Feb"), BaseVolume = c(11L, 12L, 21L, 22L), IncrementalVolume = c(1L,
2L, 7L, 8L), TradeSpend = c(110L, 120L, 210L, 220L)), .Names = c("Customer",
"Month", "BaseVolume", "IncrementalVolume", "TradeSpend"), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
res <- dcast(melt(dt1, id.vars = c("Customer", "Month")), Customer + variable~ Month)
> res
Customer variable Feb Jan
1: 10 BaseVolume 12 11
2: 10 IncrementalVolume 2 1
3: 10 TradeSpend 120 110
4: 20 BaseVolume 22 21
5: 20 IncrementalVolume 8 7
6: 20 TradeSpend 220 210
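For readers starting from a plain data.frame, as the question does, here is a minimal sketch of the setDT() step mentioned above. The name df1 is taken from the answer's parenthetical; res_df1 and long are illustrative names, not part of the original answer.

```r
library(data.table)

# df1 stands in for the asker's plain data.frame (values copied from the question)
df1 <- data.frame(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference (no copy is made)
setDT(df1)

# Wide -> long, then long -> wide with one column per month
long <- melt(df1, id.vars = c("Customer", "Month"))
res_df1 <- dcast(long, Customer + variable ~ Month, value.var = "value")
res_df1
```

Because Month is still a character column here, the month columns come out in alphabetical order (Feb, Jan), matching the output shown above.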
If you want the variable name and the value together in one cell, you can do the following:
update_cols <- which(!names(res) %in% c("Customer", "variable"))
res[, (update_cols):= lapply(.SD, function(x) paste(variable, x)), .SDcols = update_cols][, variable:= NULL]
This gives:
> res
Customer Feb Jan
1: 10 BaseVolume 12 BaseVolume 11
2: 10 IncrementalVolume 2 IncrementalVolume 1
3: 10 TradeSpend 120 TradeSpend 110
4: 20 BaseVolume 22 BaseVolume 21
5: 20 IncrementalVolume 8 IncrementalVolume 7
6: 20 TradeSpend 220 TradeSpend 210
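Although there is already an answer, I think it can be improved in some respects to come closer to the expected output:
- show the months in the order Jan, Feb
- do the pasting before calling dcast()
We will start by reshaping the input data from wide to long format, while making sure that Month will appear in the correct order: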
molten <- melt(dt1, id.vars = c("Customer", "Month"))
# turn Month into factor with levels in the given order
molten[, Month := forcats::fct_inorder(Month)]
Now, before calling dcast(), a new text column is created in long format:
molten[, text := paste(variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                 Jan                 Feb
#1:       10       BaseVolume 11       BaseVolume 12
#2:       10 IncrementalVolume 1 IncrementalVolume 2
#3:       10      TradeSpend 110      TradeSpend 120
#4:       20       BaseVolume 21       BaseVolume 22
#5:       20 IncrementalVolume 7 IncrementalVolume 8
#6:       20      TradeSpend 210      TradeSpend 220
The result is similar to the earlier answer, but the months appear in the expected order.
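If adding forcats as a dependency is not desired, the same ordering can probably be obtained with base R alone. The sketch below assumes that the months already appear in the desired order in the data; if they do not, passing explicit levels (for example the built-in month.abb constant) may be a better fit. The name res_base is illustrative.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))

# Base R stand-in for forcats::fct_inorder():
# factor levels follow the order of first appearance in the data
molten[, Month := factor(Month, levels = unique(Month))]

molten[, text := paste(variable, value)]
res_base <- dcast(molten, Customer + variable ~ Month, value.var = "text")
names(res_base)
```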
Note: unfortunately, collapsing the rows for each Customer does not work either, because line breaks are not respected when printing:
dcast(molten, Customer ~ Month, value.var = "text", paste0, collapse = "\n")
# Customer Jan Feb
#1: 10 BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110 BaseVolume 12\nIncrementalVolume 2\nTradeSpend 120
#2: 20 BaseVolume 21\nIncrementalVolume 7\nTradeSpend 210 BaseVolume 22\nIncrementalVolume 8\nTradeSpend 220
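The line breaks are still stored in the collapsed cells; it is only the table printout that shows them escaped. As a small sketch (not part of the original answer, and with collapsed as an illustrative name), writing a single cell with cat() shows the three lines:

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := factor(Month, levels = unique(Month))]
molten[, text := paste(variable, value)]

# One row per Customer, the three measures joined by a newline
collapsed <- dcast(molten, Customer ~ Month, value.var = "text",
                   fun.aggregate = paste0, collapse = "\n")

# print() shows "\n" escaped; cat() writes it out as a real line break
cell <- collapsed[Customer == 10, Jan]
cat(cell, "\n")
```

This helps when exporting or inspecting single cells, but it does not change how the whole table is printed to the console.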
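The text column can be left-aligned by padding it with whitespace on the right (the minimum length is determined by the character length of the longest string):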
molten[, text := paste(variable, value)]
molten[, text := stringr::str_pad(text, max(nchar(text)), "right")]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                 Jan                 Feb
#1:       10 BaseVolume 11       BaseVolume 12
#2:       10 IncrementalVolume 1 IncrementalVolume 2
#3:       10 TradeSpend 110      TradeSpend 120
#4:       20 BaseVolume 21       BaseVolume 22
#5:       20 IncrementalVolume 7 IncrementalVolume 8
#6:       20 TradeSpend 210      TradeSpend 220
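The right-padding can also be sketched with base R's formatC(), which left-justifies when flag = "-" is given. This is offered as an alternative to stringr::str_pad(), not as what the answer used; res_pad is an illustrative name.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := factor(Month, levels = unique(Month))]
molten[, text := paste(variable, value)]

# Pad every string on the right up to the width of the longest one
molten[, text := formatC(text, width = max(nchar(text)), flag = "-")]

res_pad <- dcast(molten, Customer + variable ~ Month,
                 value.var = "text")[, variable := NULL][]
res_pad
```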
Alternatively, the contents of the text column itself can be aligned:
fmt <- stringr::str_interp("%-${n}s %3i", list(n = molten[, max(nchar(levels(variable)))]))
molten[, text := sprintf(fmt, variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                   Jan                   Feb
#1:       10 BaseVolume         11 BaseVolume         12
#2:       10 IncrementalVolume   1 IncrementalVolume   2
#3:       10 TradeSpend        110 TradeSpend        120
#4:       20 BaseVolume         21 BaseVolume         22
#5:       20 IncrementalVolume   7 IncrementalVolume   8
#6:       20 TradeSpend        210 TradeSpend        220
Here, the format used in sprintf() is also created dynamically by means of string interpolation:
fmt
#[1] "%-17s %3i"
Note that the character length of the longest level of variable is used here, because melt() turns variable into a factor by default.
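The same format string can be built without stringr by letting sprintf() generate it, with %% standing for a literal percent sign. This is a sketch of an alternative to the answer's str_interp() call; the explicit as.character() is a defensive choice of this sketch.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"))

# Width of the longest level of the factor column created by melt()
n <- molten[, max(nchar(levels(variable)))]

# "%%" yields a literal "%", "%d" is replaced by n
fmt <- sprintf("%%-%ds %%3i", n)
fmt

# Left-align the name in n characters, right-align the value in 3
molten[, text := sprintf(fmt, as.character(variable), value)]
```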
The answer could be even simpler, since recent versions of data.table allow reshaping multiple columns simultaneously:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := forcats::fct_inorder(Month)]
dcast(molten, Customer + variable ~ Month, value.var = c("variable", "value"))
# Customer variable variable.1_Jan variable.1_Feb value_Jan value_Feb
#1: 10 BaseVolume BaseVolume BaseVolume 11 12
#2: 10 IncrementalVolume IncrementalVolume IncrementalVolume 1 2
#3: 10 TradeSpend TradeSpend TradeSpend 110 120
#4: 20 BaseVolume BaseVolume BaseVolume 21 22
#5: 20 IncrementalVolume IncrementalVolume IncrementalVolume 7 8
#6: 20 TradeSpend TradeSpend TradeSpend 210 220
Unfortunately, it lacks an option to easily reorder the columns in alternating order, i.e., all columns belonging to Jan, then Feb, etc.
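One possible workaround, sketched here under assumptions rather than taken from the answer, is to build the interleaved column order by hand and apply it with setcolorder(). To keep the generated column names predictable, the melted name column is called measure and a copy named label is cast; both names are hypothetical.

```r
library(data.table)

dt1 <- data.table(
  Customer = c(10L, 10L, 20L, 20L),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend = c(110L, 120L, 210L, 220L)
)

molten <- melt(dt1, id.vars = c("Customer", "Month"), variable.name = "measure")
molten[, Month := factor(Month, levels = unique(Month))]
molten[, label := as.character(measure)]

# Cast two value columns at once: label_Jan, label_Feb, value_Jan, value_Feb
wide <- dcast(molten, Customer + measure ~ Month, value.var = c("label", "value"))

# Interleave per month: label_Jan, value_Jan, label_Feb, value_Feb
months <- levels(molten$Month)
interleaved <- as.vector(rbind(paste0("label_", months), paste0("value_", months)))
setcolorder(wide, c("Customer", "measure", interleaved))
names(wide)
```

rbind() stacks the two name vectors as rows, and as.vector() then reads the matrix column by column, which produces the alternating order.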