
Using the 'gather' function with dplyr in R

I have a data frame that looks like the picture shown below under 'Input'.

I am trying to get one date per row (see the picture below under 'Desired output'). In other words, I am trying to do a kind of 'transpose' for each row.

Let's stipulate that the combination of 'LC' and 'Prod' is a unique key.

Input:

[image: input table]

Desired output:

[image: desired output table]

Info:

In my real dataset, there are some missing values in the quantity field (the colored region). The solution should therefore still work when values are missing.

My failed attempt

I have tried the following, but it fails:

library("dplyr")
outputTest <- tbl_df(inputTest) %>%
  gather(date, value, c(inputTest$LC, inputTest$Prod))

outputTest

Source:

inputTest <- structure(list(
  LC = structure(c(1L, 3L, 1L, 2L),
                 .Label = c("berlin", "munchen", "stutgart"),
                 class = "factor"),
  Prod = structure(c(1L, 2L, 2L, 1L),
                   .Label = c("(STORE1)400096", "STORE2_00154"),
                   class = "factor"),
  PROD_TYPE = structure(c(1L, 2L, 2L, 1L),
                        .Label = c("STORE1", "STORE2"),
                        class = "factor"),
  X2015.6.29 = c(20.08, 8.91, 11.38, 15.42),
  X2015.7.6  = c(20.66, 8.49, 10.91, 15.57),
  X2015.7.13 = c(19.02, 8.55, 10.89, 14.6),
  X2015.7.20 = c(18.6, 7.95, 10.58, 14.31)),
  .Names = c("LC", "Prod", "PROD_TYPE",
             "2015.6.29", "2015.7.6", "2015.7.13", "2015.7.20"),
  class = "data.frame", row.names = c(NA, -4L))

With gather (from the tidyr package), you can exclude the columns you do not want to gather by prefixing them with the negation operator '-' (minus sign). In your case the key is the date, the value is the quantity, and LC, Prod, and PROD_TYPE serve as identifiers.

library(tidyr)  # provides gather() and the %>% pipe
output <- as.data.frame(inputTest) %>%
        tidyr::gather(key = Date, value = Value, -LC, -Prod, -PROD_TYPE)

This yields:

         LC           Prod PROD_TYPE      Date Value
1    berlin (STORE1)400096    STORE1 2015.6.29 20.08
2  stutgart   STORE2_00154    STORE2 2015.6.29  8.91
3    berlin   STORE2_00154    STORE2 2015.6.29 11.38
4   munchen (STORE1)400096    STORE1 2015.6.29 15.42
5    berlin (STORE1)400096    STORE1  2015.7.6 20.66
6  stutgart   STORE2_00154    STORE2  2015.7.6  8.49
7    berlin   STORE2_00154    STORE2  2015.7.6 10.91
8   munchen (STORE1)400096    STORE1  2015.7.6 15.57
9    berlin (STORE1)400096    STORE1 2015.7.13 19.02
10 stutgart   STORE2_00154    STORE2 2015.7.13  8.55
11   berlin   STORE2_00154    STORE2 2015.7.13 10.89
12  munchen (STORE1)400096    STORE1 2015.7.13 14.60
13   berlin (STORE1)400096    STORE1 2015.7.20 18.60
14 stutgart   STORE2_00154    STORE2 2015.7.20  7.95
15   berlin   STORE2_00154    STORE2 2015.7.20 10.58
16  munchen (STORE1)400096    STORE1 2015.7.20 14.31
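The question mentions missing quantities in the real data. As a minimal sketch of how gather treats them, the example below uses a small made-up data frame (`wide` and its values are hypothetical, not the asker's data). By default the missing value is kept as an NA row; the `na.rm = TRUE` argument of gather drops such rows instead.

```r
library(tidyr)

# Hypothetical wide data frame with one missing quantity (illustration only).
wide <- data.frame(
  LC   = c("berlin", "munchen"),
  Prod = c("A", "B"),
  `2015.6.29` = c(20.08, NA),
  `2015.7.6`  = c(20.66, 15.57),
  check.names = FALSE,       # keep the date-like column names unchanged
  stringsAsFactors = FALSE
)

# Default behaviour: the missing quantity becomes a row with Value = NA.
long_all <- gather(wide, key = Date, value = Value, -LC, -Prod)

# na.rm = TRUE: rows whose Value is NA are dropped from the result.
long_obs <- gather(wide, key = Date, value = Value, -LC, -Prod, na.rm = TRUE)

nrow(long_all)  # 4
nrow(long_obs)  # 3
```

Which variant fits depends on whether later computations should see the gaps explicitly or only the observed values.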

It is better to have column names that start with a non-numeric character. According to ?gather, the ... argument selects the columns to gather by name. Here we are interested in the columns that start with a number, i.e. the date columns, so we can use matches() with a regular expression to select them:

library(dplyr)
library(tidyr)
inputTest %>%
       tbl_df %>% 
       gather(date, value, matches("^\\d+") )
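In tidyr 1.0.0 and later, pivot_longer() is the successor to gather(). This is a swapped-in alternative, not what the answer above uses. The sketch below applies the same matches() selection to a small made-up data frame (`wide` is hypothetical) and then converts the date strings to the Date class, assuming they follow a year.month.day pattern as in the question. Note that pivot_longer() orders the result by original row rather than by column, so the row order differs from gather().

```r
library(tidyr)

# Hypothetical two-row excerpt shaped like the question's data.
wide <- data.frame(
  LC   = c("berlin", "stutgart"),
  Prod = c("(STORE1)400096", "STORE2_00154"),
  `2015.6.29` = c(20.08, 8.91),
  `2015.7.6`  = c(20.66, 8.49),
  check.names = FALSE,       # keep the date-like column names unchanged
  stringsAsFactors = FALSE
)

# Select the columns whose names start with a digit and stack them.
long <- pivot_longer(
  wide,
  cols      = matches("^\\d+"),
  names_to  = "date",
  values_to = "value"
)

# Assumption: the column names are dates written as year.month.day.
long$date <- as.Date(long$date, format = "%Y.%m.%d")

long
```

Having a real Date column makes sorting and date arithmetic possible, which the character keys produced by the reshaping step do not support directly.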
