
Dcast multiple observations into one cell in R

I have an R data frame:

Customer    Month   BaseVolume  IncrementalVolume   TradeSpend
10          Jan         11            1             110
10          Feb         12            2             120
20          Jan         21            7             210
20          Feb         22            8             220

which I want to convert like this:

Customer    Jan                    Feb  
10          BaseVolume 11         BaseVolume 12
            IncrementalVolume 1   IncrementalVolume 2 
            TradeSpend 110        TradeSpend 120

20         BaseVolume 21          BaseVolume 22
           IncrementalVolume 7    IncrementalVolume 8 
           TradeSpend 210         TradeSpend 220     

I tried dcast (reshape), but I couldn't get this result. Please help me out.

You could try the following. (If your data is a data.frame called df1, run setDT(df1) first to convert it to a data.table, before any of the steps I mention.)

library(data.table)
# sample data from the question, built directly as a data.table
dt1 <- data.table(
  Customer          = c(10L, 10L, 20L, 20L),
  Month             = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume        = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend        = c(110L, 120L, 210L, 220L)
)

# melt to long format, then cast back with one column per Month
res <- dcast(melt(dt1, id.vars = c("Customer", "Month")), Customer + variable ~ Month, value.var = "value")

> res
   Customer          variable Feb Jan
1:       10        BaseVolume  12  11
2:       10 IncrementalVolume   2   1
3:       10        TradeSpend 120 110
4:       20        BaseVolume  22  21
5:       20 IncrementalVolume   8   7
6:       20        TradeSpend 220 210
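For readers without data.table, the same wide-to-long-to-wide round trip can be sketched in base R. This is a swapped-in technique, not the answer's method: stacking the measure columns stands in for melt(), and stats::reshape(direction = "wide") stands in for dcast(). The names df1, measure, long and wide are illustrative only.

```r
# Sample data from the question as a plain data.frame
df1 <- data.frame(
  Customer          = c(10L, 10L, 20L, 20L),
  Month             = c("Jan", "Feb", "Jan", "Feb"),
  BaseVolume        = c(11L, 12L, 21L, 22L),
  IncrementalVolume = c(1L, 2L, 7L, 8L),
  TradeSpend        = c(110L, 120L, 210L, 220L)
)
measure <- c("BaseVolume", "IncrementalVolume", "TradeSpend")

# "melt": stack the measure columns into a single value column (wide -> long)
long <- data.frame(
  Customer = rep(df1$Customer, times = length(measure)),
  Month    = rep(df1$Month, times = length(measure)),
  variable = rep(measure, each = nrow(df1)),
  value    = unlist(df1[measure], use.names = FALSE)
)

# "dcast": one row per Customer/variable, one column per Month (long -> wide)
wide <- reshape(long, idvar = c("Customer", "variable"), timevar = "Month",
                direction = "wide")

# Sort like the dcast() result and reset the row names
wide <- wide[order(wide$Customer, wide$variable), ]
rownames(wide) <- NULL
wide
```

Unlike dcast(), reshape() names the new columns value.Jan and value.Feb and keeps the months in order of first appearance rather than alphabetically.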

If you want each label and its value together in one cell, you can do the following:

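# columns holding the monthly values (everything except the id columns)
update_cols <- setdiff(names(res), c("Customer", "variable"))
# prefix each value with its variable name, then drop the now-redundant variable column
res[, (update_cols) := lapply(.SD, function(x) paste(variable, x)), .SDcols = update_cols][, variable := NULL]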

This gives:

> res
   Customer                 Feb                 Jan
1:       10       BaseVolume 12       BaseVolume 11
2:       10 IncrementalVolume 2 IncrementalVolume 1
3:       10      TradeSpend 120      TradeSpend 110
4:       20       BaseVolume 22       BaseVolume 21
5:       20 IncrementalVolume 8 IncrementalVolume 7
6:       20      TradeSpend 220      TradeSpend 210
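The := / .SD line above can look cryptic. A base-R sketch of the same idea loops over the month columns and prefixes each value with its variable name. The small data frame res below is a hypothetical stand-in for customer 10's rows of the dcast() result, not the data.table itself.

```r
# Hypothetical stand-in for part of the dcast() result (customer 10 only)
res <- data.frame(
  Customer = c(10L, 10L, 10L),
  variable = c("BaseVolume", "IncrementalVolume", "TradeSpend"),
  Feb      = c(12L, 2L, 120L),
  Jan      = c(11L, 1L, 110L)
)

# Every column except the id columns holds monthly values
month_cols <- setdiff(names(res), c("Customer", "variable"))

# Prefix each monthly value with its variable name, as the := line does
res[month_cols] <- lapply(res[month_cols], function(x) paste(res$variable, x))

# The variable column is now redundant
res$variable <- NULL
res
```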

Although there is already an answer, I feel it can be improved in some respects to come closer to the expected output:

  • the OP wants the months to appear in the order Jan, Feb
  • the output is difficult to read
  • the munging of columns should take place before the dcast()

We'll start by reshaping the input data from wide to long format, making sure that Month will appear in the correct order:

molten <- melt(dt1, id.vars = c("Customer", "Month"))
# turn Month into factor with levels in the given order
molten[, Month := forcats::fct_inorder(Month)]
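fct_inorder() sets the factor levels in order of first appearance. If you would rather not depend on forcats, base R can do the same with factor(levels = unique(...)); ordering by the calendar via the built-in month.abb constant is shown as an alternative. The vector month is a hypothetical stand-in for molten$Month.

```r
month <- c("Jan", "Feb", "Jan", "Feb")

# Default factor(): levels are sorted alphabetically, so Feb comes before Jan
levels(factor(month))      # "Feb" "Jan"

# Same idea as forcats::fct_inorder(): levels in order of first appearance
by_appearance <- factor(month, levels = unique(month))
levels(by_appearance)      # "Jan" "Feb"

# Alternative: calendar order, keeping only the months that actually occur
by_calendar <- factor(month, levels = intersect(month.abb, month))
levels(by_calendar)        # "Jan" "Feb"
```

In the data.table pipeline this would read molten[, Month := factor(Month, levels = unique(Month))].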

Now, a new text column is created in long format before the call to dcast():

molten[, text := paste(variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                 Jan                 Feb
#1:       10       BaseVolume 11       BaseVolume 12
#2:       10 IncrementalVolume 1 IncrementalVolume 2
#3:       10      TradeSpend 110      TradeSpend 120
#4:       20       BaseVolume 21       BaseVolume 22
#5:       20 IncrementalVolume 7 IncrementalVolume 8
#6:       20      TradeSpend 210      TradeSpend 220

The result is similar to the first answer but has the months in the expected order.


NB: Unfortunately, also collapsing the rows per Customer into a single cell doesn't work as hoped, because the line breaks are not rendered when the table is printed:

dcast(molten, Customer ~ Month, value.var = "text", paste0, collapse = "\n")
#   Customer                                                Jan                                                Feb
#1:       10 BaseVolume 11\nIncrementalVolume 1\nTradeSpend 110 BaseVolume 12\nIncrementalVolume 2\nTradeSpend 120
#2:       20 BaseVolume 21\nIncrementalVolume 7\nTradeSpend 210 BaseVolume 22\nIncrementalVolume 8\nTradeSpend 220
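The line breaks are in fact stored in the cells; print() simply shows them escaped as \n. A small base-R check on one hypothetical cell value illustrates this: writeLines() (or cat()) renders the breaks, and strsplit() recovers the individual entries.

```r
# One collapsed cell, as produced by paste0(..., collapse = "\n")
cell <- paste0(c("BaseVolume 11", "IncrementalVolume 1", "TradeSpend 110"), collapse = "\n")

print(cell)       # shows the breaks escaped as \n
writeLines(cell)  # renders the three entries on separate lines

# Splitting on the newline gives back the three entries
parts <- strsplit(cell, "\n", fixed = TRUE)[[1]]
parts
```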

The text column can be left-aligned by padding it with white space on the right (the minimum length is determined by the character length of the longest string):

molten[, text := paste(variable, value)]
molten[, text := stringr::str_pad(text, max(nchar(text)), "right")]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                 Jan                 Feb
#1:       10 BaseVolume 11       BaseVolume 12      
#2:       10 IncrementalVolume 1 IncrementalVolume 2
#3:       10 TradeSpend 110      TradeSpend 120     
#4:       20 BaseVolume 21       BaseVolume 22      
#5:       20 IncrementalVolume 7 IncrementalVolume 8
#6:       20 TradeSpend 210      TradeSpend 220     
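stringr::str_pad() is one way to do the padding. Base R's format() pads a character vector to a common width and left-justifies it by default, so the same step can be sketched without stringr. The vector text is a hypothetical stand-in for molten$text.

```r
text <- c("BaseVolume 11", "IncrementalVolume 1", "TradeSpend 110")

# Pad on the right to the width of the longest string (19 characters here)
padded <- format(text, width = max(nchar(text)))

nchar(padded)  # 19 19 19
```

In the data.table pipeline this would read molten[, text := format(text, width = max(nchar(text)))].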

Alternatively, the label and the number inside each text cell can be aligned separately:

fmt <- stringr::str_interp("%-${n}s %3i", list(n = molten[, max(nchar(levels(variable)))]))
molten[, text := sprintf(fmt, variable, value)]
dcast(molten, Customer + variable ~ Month, value.var = "text")[, variable := NULL][]
#   Customer                   Jan                   Feb
#1:       10 BaseVolume         11 BaseVolume         12
#2:       10 IncrementalVolume   1 IncrementalVolume   2
#3:       10 TradeSpend        110 TradeSpend        120
#4:       20 BaseVolume         21 BaseVolume         22
#5:       20 IncrementalVolume   7 IncrementalVolume   8
#6:       20 TradeSpend        210 TradeSpend        220

Here, the format used in sprintf() is itself created dynamically by string interpolation:

fmt
#[1] "%-17s %3i"
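str_interp() builds the format string here, but base R's sprintf() can build it too, because %% produces a literal percent sign. A sketch, with labels and values as hypothetical stand-ins for levels(molten$variable) and molten$value:

```r
labels <- c("BaseVolume", "IncrementalVolume", "TradeSpend")
values <- c(11L, 1L, 110L)

# "%%" yields a literal "%", so this builds "%-17s %3i" from the longest label
n   <- max(nchar(labels))
fmt <- sprintf("%%-%ds %%3i", n)
fmt  # "%-17s %3i"

# Left-align the labels to 17 characters and right-align the values to 3
sprintf(fmt, labels, values)
```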

Note that the character length of the longest level of variable is used here, because melt() turns variable into a factor by default.


The answer could have been much simpler, as recent versions of data.table can reshape multiple value columns simultaneously:

molten <- melt(dt1, id.vars = c("Customer", "Month"))
molten[, Month := forcats::fct_inorder(Month)]
dcast(molten, Customer + variable ~ Month, value.var = c("variable", "value"))
#   Customer          variable    variable.1_Jan    variable.1_Feb value_Jan value_Feb
#1:       10        BaseVolume        BaseVolume        BaseVolume        11        12
#2:       10 IncrementalVolume IncrementalVolume IncrementalVolume         1         2
#3:       10        TradeSpend        TradeSpend        TradeSpend       110       120
#4:       20        BaseVolume        BaseVolume        BaseVolume        21        22
#5:       20 IncrementalVolume IncrementalVolume IncrementalVolume         7         8
#6:       20        TradeSpend        TradeSpend        TradeSpend       210       220

Unfortunately, though, dcast() lacks an option to easily reorder the resulting columns by month, i.e., all columns belonging to Jan first, then all belonging to Feb, etc.
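One workaround is to compute the desired column order yourself. The helper order_by_month() below is hypothetical (it is not part of data.table): it collects every column whose name ends in _<month>, grouped month by month, and keeps the remaining id columns first.

```r
# Hypothetical helper: id columns first, then value columns grouped by month
order_by_month <- function(nms, months, sep = "_") {
  month_cols <- unlist(lapply(months, function(m) nms[endsWith(nms, paste0(sep, m))]))
  c(setdiff(nms, month_cols), month_cols)
}

# Column names as produced by the multi-value.var dcast() call above
nms <- c("Customer", "variable", "variable.1_Jan", "variable.1_Feb", "value_Jan", "value_Feb")
order_by_month(nms, c("Jan", "Feb"))
# "Customer" "variable" "variable.1_Jan" "value_Jan" "variable.1_Feb" "value_Feb"
```

For a data.table res, setcolorder(res, order_by_month(names(res), levels(molten$Month))) would then reorder the columns in place; for a plain data frame, res[order_by_month(names(res), c("Jan", "Feb"))] does the same.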

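Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For questions, contact: yoyou2525@163.com.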
