Dcast multiple observations into one cell in R
I have an R data frame:
Customer  Month  BaseVolume  IncrementalVolume  TradeSpend
10        Jan    11          1                  110
10        Feb    12          2                  120
20        Jan    21          7                  210
20        Feb    22          8                  220
I want to convert it to this:
Customer  Jan                  Feb
10        BaseVolume 11        BaseVolume 12
          IncrementalVolume 1  IncrementalVolume 2
          TradeSpend 110       TradeSpend 120
20        BaseVolume 21        BaseVolume 22
          IncrementalVolume 7  IncrementalVolume 8
          TradeSpend 210       TradeSpend 220
I tried dcast() (reshape) but couldn't get this result. Please help me out.
What you could try is the following (in your case, if your data is df1, you need to run setDT(df1) before any of the steps I mention):
library(data.table)
dt1 <- structure(list(Customer = c(10L, 10L, 20L, 20L), Month = c("Jan",
"Feb", "Jan", "Feb"), BaseVolume = c(11L, 12L, 21L, 22L), IncrementalVolume = c(1L,
2L, 7L, 8L), TradeSpend = c(110L, 120L, 210L, 220L)), .Names = c("Customer",
"Month", "BaseVolume", "IncrementalVolume", "TradeSpend"), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
res <- dcast(melt(dt1, id.vars = c("Customer", "Month")), Customer + variable ~ Month, value.var = "value")
> res
Customer variable Feb Jan
1: 10 BaseVolume 12 11
2: 10 IncrementalVolume 2 1
3: 10 TradeSpend 120 110
4: 20 BaseVolume 22 21
5: 20 IncrementalVolume 8 7
6: 20 TradeSpend 220 210
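If data.table is not available, the same melt-then-cast round trip can be sketched with base R's stats::reshape(). This is a swapped-in technique, not what this answer uses; the names df, measure, long and wide are made up for illustration, and the final order() call only restores the Customer/variable row order shown above.

```r
# Same data as dt1, but as a plain data.frame (hypothetical name: df)
df <- data.frame(
  Customer          = c(10L, 10L, 20L, 20L),
  Month             = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume        = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend        = c(110L, 120L, 210L, 220L)
)
measure <- c("BaseVolume", "IncrementalVolume", "TradeSpend")

# Wide -> long: one row per Customer/Month/measure (the role of melt())
long <- reshape(df, direction = "long",
                varying = list(measure), v.names = "value",
                timevar = "variable", times = measure,
                idvar = c("Customer", "Month"))

# Long -> wide: one column per Month (the role of dcast(... ~ Month))
wide <- reshape(long, direction = "wide",
                idvar = c("Customer", "variable"),
                timevar = "Month", v.names = "value")

# Sort rows by Customer, then by the original order of the measures
wide <- wide[order(wide$Customer, match(wide$variable, measure)), ]
rownames(wide) <- NULL
wide
```

reshape() names the new columns value.Jan and value.Feb and creates them in order of first appearance of Month, so no extra step is needed to get Jan before Feb here.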
If you want the variable name and its value together in one cell, you can do the following:
update_cols <- which(!names(res) %in% c("Customer", "variable"))
res[, (update_cols):= lapply(.SD, function(x) paste(variable, x)), .SDcols = update_cols][, variable:= NULL]
Which gives:
> res
Customer Feb Jan
1: 10 BaseVolume 12 BaseVolume 11
2: 10 IncrementalVolume 2 IncrementalVolume 1
3: 10 TradeSpend 120 TradeSpend 110
4: 20 BaseVolume 22 BaseVolume 21
5: 20 IncrementalVolume 8 IncrementalVolume 7
6: 20 TradeSpend 220 TradeSpend 210
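For comparison, the same prefixing step can be sketched on a plain data.frame without data.table's := operator. This is a base-R sketch; the small res below is hand-typed to mirror the first three rows of the cast result, not produced by dcast(). Using setdiff() on the column names avoids relying on the numeric positions that which() returns.

```r
# Hand-typed stand-in for the first three rows of the cast result
res <- data.frame(
  Customer = c(10L, 10L, 10L),
  variable = c("BaseVolume", "IncrementalVolume", "TradeSpend"),
  Feb      = c(12L, 2L, 120L),
  Jan      = c(11L, 1L, 110L)
)

# Every column except the two key columns holds one month
update_cols <- setdiff(names(res), c("Customer", "variable"))

# Prefix each cell with the variable name of its row
res[update_cols] <- lapply(res[update_cols], function(x) paste(res$variable, x))

# The variable column is now redundant
res$variable <- NULL
res
```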
Although there is already an answer, I feel it can be improved in some respects to come closer to the expected output:
- the months are shown in the order Jan, Feb
- the pasting is done before dcast()
We'll start by reshaping the input data from wide to long format, but make sure that Month will appear in the correct order:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
# turn Month into factor with levels in the given order
molten[, Month := forcats::fct_inorder(Month)]
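dcast() sorts a character Month column alphabetically, which is presumably why Feb came before Jan in the first answer. fct_inorder() avoids this by making the factor levels follow the order of first appearance. A base-R equivalent, shown here as a sketch with a hand-typed months vector, is factor() with levels = unique(x):

```r
months <- c("Jan", "Feb", "Jan", "Feb")

# factor() sorts the levels alphabetically by default, so Feb comes first
levels(factor(months))
# [1] "Feb" "Jan"

# Levels in order of first appearance, which is what fct_inorder() produces
in_order <- factor(months, levels = unique(months))
levels(in_order)
# [1] "Jan" "Feb"
```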
Now, a new text column is created in long format before the call to dcast():
molten[, text := paste(variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
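The benefit of building text in long format is that the cast only has to move ready-made strings into place, and the month columns follow the factor levels. As a rough base-R analogue (swapped in for illustration, not the answer's code), tapply() with identity does the same job here, because each Customer/variable/Month combination holds exactly one string; the molten data.frame is hand-typed to match the melted data.

```r
molten <- data.frame(
  Customer = rep(c(10L, 10L, 20L, 20L), times = 3),
  Month    = rep(c("Jan", "Feb"), times = 6),
  variable = rep(c("BaseVolume", "IncrementalVolume", "TradeSpend"), each = 4),
  value    = c(11L, 12L, 21L, 22L, 1L, 2L, 7L, 8L, 110L, 120L, 210L, 220L)
)

# Keep months in order of first appearance (Jan before Feb)
molten$Month <- factor(molten$Month, levels = unique(molten$Month))

# Build the display string while the data is still in long format
molten$text <- paste(molten$variable, molten$value)

# Cast: rows are Customer + variable, columns are the Month levels
wide <- tapply(molten$text,
               list(paste(molten$Customer, molten$variable), molten$Month),
               identity)
wide
```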
The result is similar to the other answer but has the months in the expected order.
NB: Unfortunately, the approach of also collapsing the rows per Customer into a single cell doesn't work, as line breaks aren't respected when printed:
dcast(molten, Customer ~ Month, value.var = "text", paste0, collapse = "\n")
# Customer Jan Feb
#1: 10 BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110 BaseVolume 12\nIncrementalVolume 2\nTradeSpend 120
#2: 20 BaseVolume 21\nIncrementalVolume 7\nTradeSpend 210 BaseVolume 22\nIncrementalVolume 8\nTradeSpend 220
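The line breaks are actually stored in the cells; it is only the console printing that shows them as \n escapes. A quick base-R check, with the cells vector hand-typed from the output above:

```r
cells <- c("BaseVolume 11", "IncrementalVolume 1", "TradeSpend 110")

# Collapse the three values into one string separated by newline characters
cell <- paste0(cells, collapse = "\n")

# print() escapes the newlines, just like the data.table output above
print(cell)
# [1] "BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110"

# cat() interprets them and writes three separate lines
cat(cell, "\n", sep = "")
```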
The text column can be left-aligned by padding it with white space on the right (the minimum length is determined by the character length of the longest string):
molten[, text := paste(variable, value)]
molten[, text := stringr::str_pad(text, max(nchar(text)), "right")]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
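If stringr is not installed, base R's format() gives the same right-padding, because it left-justifies character vectors to a common width by default. This is a substitute for str_pad(), not the answer's code, and the text vector is hand-typed:

```r
text <- c("BaseVolume 11", "IncrementalVolume 1", "TradeSpend 110")

# Pad on the right to the width of the longest string (19 characters here)
padded <- format(text, width = max(nchar(text)))
padded
```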
Alternatively, the contents of the text column can be aligned internally (names left-aligned, numbers right-aligned):
fmt <- stringr::str_interp("%-${n}s %3i", list(n = molten[, max(nchar(levels(variable)))]))
molten[, text := sprintf(fmt, variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
# Customer Jan Feb
#1: 10 BaseVolume 11 BaseVolume 12
#2: 10 IncrementalVolume 1 IncrementalVolume 2
#3: 10 TradeSpend 110 TradeSpend 120
#4: 20 BaseVolume 21 BaseVolume 22
#5: 20 IncrementalVolume 7 IncrementalVolume 8
#6: 20 TradeSpend 210 TradeSpend 220
Here, the format to be used in sprintf() is also created dynamically by using string interpolation:
fmt
#[1] "%-17s %3i"
Note that the character length of the longest level of variable is used here, because melt() has turned variable into a factor by default.
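The format string can also be built without stringr by letting sprintf() write it, escaping a literal % as %%. A small sketch with plain character vectors (so nchar() is applied to the values directly rather than to factor levels); the vectors are hand-typed:

```r
variable <- c("BaseVolume", "IncrementalVolume", "TradeSpend")
value    <- c(11L, 1L, 110L)

# Width of the longest name (17, for "IncrementalVolume")
n <- max(nchar(variable))

# "%%" produces a literal percent sign, so this builds "%-17s %3i"
fmt <- sprintf("%%-%ds %%3i", n)
fmt
# [1] "%-17s %3i"

# Names left-aligned in 17 characters, numbers right-aligned in 3
text <- sprintf(fmt, variable, value)
text
```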
The answer could have been much simpler, as recent versions of data.table allow casting multiple value columns simultaneously:
molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := forcats::fct_inorder(Month)]
dcast(molten, Customer + variable ~ Month, value.var = c("variable", "value"))
# Customer variable variable.1_Jan variable.1_Feb value_Jan value_Feb
#1: 10 BaseVolume BaseVolume BaseVolume 11 12
#2: 10 IncrementalVolume IncrementalVolume IncrementalVolume 1 2
#3: 10 TradeSpend TradeSpend TradeSpend 110 120
#4: 20 BaseVolume BaseVolume BaseVolume 21 22
#5: 20 IncrementalVolume IncrementalVolume IncrementalVolume 7 8
#6: 20 TradeSpend TradeSpend TradeSpend 210 220
but unfortunately there is no option to easily reorder the columns in alternating order, i.e., all columns belonging to Jan, then Feb, etc.
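A possible workaround is to build the interleaved column names yourself and reorder by name. This is only a sketch on a hand-typed one-row stand-in for the result above; with a data.table, the same name vector could instead be passed to setcolorder().

```r
months <- c("Jan", "Feb")

# rbind() stacks the two name vectors; as.vector() reads them column by column,
# giving variable.1_Jan, value_Jan, variable.1_Feb, value_Feb
cols <- as.vector(rbind(paste0("variable.1_", months), paste0("value_", months)))

# Hand-typed stand-in for one row of the dcast() result above
res <- data.frame(Customer = 10L, variable = "BaseVolume",
                  variable.1_Jan = "BaseVolume", variable.1_Feb = "BaseVolume",
                  value_Jan = 11L, value_Feb = 12L)

# Reorder the columns so that each month's columns sit together
res <- res[c("Customer", "variable", cols)]
names(res)
```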