R: using the "gather" function with dplyr

Question

I have a data frame that looks like the picture shown below under 'Input'.

I am trying to get one date per row (see the picture below under 'Desired output'). In other words, I am trying to do a kind of 'transpose' for each row.

Let's stipulate that the combination of 'LC' and 'Prod' is a unique key.

Input:

Desired output:

Info:

In my real dataset, there are some missing values in the quantity field (the colored region). The solution should therefore still work when values are missing.

My failed attempt

I have tried the following, but it fails:

library("dplyr")
outputTest <- tbl_df(inputTest) %>%
  gather(date, value, c(inputTest$LC, inputTest$Prod))

outputTest

Source:

inputTest <- structure(
  list(
    LC = structure(c(1L, 3L, 1L, 2L),
                   .Label = c("berlin", "munchen", "stutgart"), class = "factor"),
    Prod = structure(c(1L, 2L, 2L, 1L),
                     .Label = c("(STORE1)400096", "STORE2_00154"), class = "factor"),
    PROD_TYPE = structure(c(1L, 2L, 2L, 1L),
                          .Label = c("STORE1", "STORE2"), class = "factor"),
    X2015.6.29 = c(20.08, 8.91, 11.38, 15.42),
    X2015.7.6 = c(20.66, 8.49, 10.91, 15.57),
    X2015.7.13 = c(19.02, 8.55, 10.89, 14.6),
    X2015.7.20 = c(18.6, 7.95, 10.58, 14.31)
  ),
  .Names = c("LC", "Prod", "PROD_TYPE",
             "2015.6.29", "2015.7.6", "2015.7.13", "2015.7.20"),
  class = "data.frame",
  row.names = c(NA, -4L)
)

Answer 1

With gather, you can specify the columns you do not want to gather by prefixing them with the negation operator '-' (minus sign). In your case the key is the date, the value is the quantity, and LC, Prod, and PROD_TYPE serve as identifiers.

library(dplyr)  # supplies the %>% pipe
# Gather every column except the three identifier columns
output <- as.data.frame(inputTest) %>%
        tidyr::gather(key = Date, value = Value, -LC, -Prod, -PROD_TYPE)

This yields:

         LC           Prod PROD_TYPE      Date Value
1    berlin (STORE1)400096    STORE1 2015.6.29 20.08
2  stutgart   STORE2_00154    STORE2 2015.6.29  8.91
3    berlin   STORE2_00154    STORE2 2015.6.29 11.38
4   munchen (STORE1)400096    STORE1 2015.6.29 15.42
5    berlin (STORE1)400096    STORE1  2015.7.6 20.66
6  stutgart   STORE2_00154    STORE2  2015.7.6  8.49
7    berlin   STORE2_00154    STORE2  2015.7.6 10.91
8   munchen (STORE1)400096    STORE1  2015.7.6 15.57
9    berlin (STORE1)400096    STORE1 2015.7.13 19.02
10 stutgart   STORE2_00154    STORE2 2015.7.13  8.55
11   berlin   STORE2_00154    STORE2 2015.7.13 10.89
12  munchen (STORE1)400096    STORE1 2015.7.13 14.60
13   berlin (STORE1)400096    STORE1 2015.7.20 18.60
14 stutgart   STORE2_00154    STORE2 2015.7.20  7.95
15   berlin   STORE2_00154    STORE2 2015.7.20 10.58
16  munchen (STORE1)400096    STORE1 2015.7.20 14.31
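
The question mentions missing quantities. As a minimal sketch of how gather treats them, the example below uses a small made-up data frame, df, which is an assumption for illustration and not the asker's data. By default gather keeps rows whose value is NA; its na.rm argument drops them.

```r
library(tidyr)

# Hypothetical toy data (not the asker's data): one quantity is missing
df <- data.frame(
  LC = c("berlin", "munchen"),
  Prod = c("A", "B"),
  `2015.6.29` = c(20.08, NA),
  `2015.7.6` = c(20.66, 15.57),
  check.names = FALSE,        # keep the date-like column names unchanged
  stringsAsFactors = FALSE
)

# Default behaviour: the row with the missing value is kept as NA
long_all <- gather(df, key = Date, value = Value, -LC, -Prod)

# na.rm = TRUE drops rows whose Value is NA
long_complete <- gather(df, key = Date, value = Value, -LC, -Prod, na.rm = TRUE)

# Summaries on the full result can skip the NA explicitly
mean_value <- mean(long_all$Value, na.rm = TRUE)
```

Keeping the NA rows preserves the full LC/Prod/Date grid, which may matter if later steps expect one row per key and date; dropping them is simpler when only observed quantities are of interest.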

Answer 2

It is better to have column names that do not start with a digit. According to ?gather, the ... argument selects columns by name. Here we are interested in the columns that start with a digit, i.e. the date columns, so we can use matches with a regular expression to select those columns:

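library(dplyr)
library(tidyr)
inputTest %>%
  tbl_df %>%  # convert to a tibble (tbl_df() is deprecated in recent dplyr; as_tibble() replaces it)
  gather(date, value, matches("^\\d+"))  # gather the columns whose names start with a digit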
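
After gathering, the date column holds character strings such as "2015.6.29". If real dates are needed, one possible follow-up step is to parse them with as.Date. This is only a sketch: it assumes the column names always follow the year.month.day pattern seen in the question, and the toy data frame df is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with date-like column names, as in the question
df <- data.frame(
  LC = c("berlin", "munchen"),
  `2015.6.29` = c(20.08, 15.42),
  `2015.7.6` = c(20.66, 15.57),
  check.names = FALSE,        # keep the date-like column names unchanged
  stringsAsFactors = FALSE
)

long <- df %>%
  # gather the columns whose names start with a digit
  gather(date, value, matches("^\\d+")) %>%
  # parse "2015.6.29"-style strings into Date objects
  mutate(date = as.Date(date, format = "%Y.%m.%d"))

str(long)
```

With a proper Date column, the result can be sorted or filtered chronologically, which does not work reliably on the raw strings (for example, "2015.7.13" sorts before "2015.7.6" as text).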
