
Grouping and splitting a data frame in R

The following promotion sales table lists each product, the group (cgrp) in which the promotion was run, and the period over which it ran.

   Product.code  cgrp promo.from   promo.to
1    1100001369    12 2014-01-01 2014-03-01
2    1100001369 16 37 2014-01-01 2014-03-01
3    1100001448    12 2014-03-01 2014-03-01
4    1100001446    12 2014-03-01 2014-03-01
5    1100001629 11 30 2014-03-01 2014-03-01
6    1100001369 16 37 2014-03-01 2014-06-01
7    1100001368    12 2014-06-01 2014-07-01
8    1100001369    12 2014-06-01 2014-07-01
9    1100001368 11 30 2014-06-01 2014-07-01
10   1100001738 11 30 2014-06-01 2014-07-01
11   1100001629 11 30 2014-06-01 2014-06-01
12   1100001738 11 30 2014-07-01 2014-07-01
13   1100001619 11 30 2014-08-01 2014-08-01
14   1100001619 11 30 2014-08-01 2014-08-01
15   1100001629 11 30 2014-08-01 2014-08-01
16   1100001738    12 2014-09-01 2014-09-01
17   1100001738 16 37 2014-08-01 2014-08-01
18   1100001448    12 2014-09-01 2014-09-01
19   1100001446    12 2014-10-01 2014-10-01
20   1100001369    12 2014-11-01 2014-11-01
21   1100001547 16 37 2014-11-01 2014-11-01
22   1100001368 11 30 2014-11-01 2014-11-01

I am trying to group by Product.code and cgrp so that I can see all promotions for a product within a particular group and do further analysis.

I tried looping through the whole data.frame, but that was inefficient and buggy.

What is an efficient way to do this?

[edit] The goal is to get multiple data.frames like the following:

x =

   Product.code  cgrp promo.from   promo.to
3    1100001448    12 2014-03-01 2014-03-01
18   1100001448    12 2014-09-01 2014-09-01

y =

   Product.code  cgrp promo.from   promo.to
1    1100001369    12 2014-01-01 2014-03-01
8    1100001369    12 2014-06-01 2014-07-01
20   1100001369    12 2014-11-01 2014-11-01

You could split the 'cgrp' column on spaces and reshape the dataset to 'long' format with cSplit, so that a row such as "16 37" becomes one row per group. Then split the resulting dataset ('df1') by 'Product.code' and 'cgrp' to create a list ('lst') of data frames.

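 library(splitstackshape)
 # 'df' is the promotion table from the question; split 'cgrp' on spaces into long format
 df1 <- as.data.frame(cSplit(df, 'cgrp', ' ', 'long'))
 # one data frame per Product.code/cgrp combination; drop=TRUE omits empty combinations
 lst <- split(df1, list(df1$Product.code, df1$cgrp), drop=TRUE)
 # rename the list elements dfN1, dfN2, ...
 names(lst) <- paste0('dfN', seq_along(lst))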
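
If you would rather not depend on splitstackshape, the same reshaping can be sketched in base R. This is a minimal sketch and not the method used in the answer: the sample 'df' below is a hypothetical subset of the question's table, and it assumes that 'cgrp' is stored as a space-separated character string and 'Product.code' as character.

```r
# Hypothetical sample mirroring a few rows of the question's table.
# Assumption: cgrp is a space-separated string, Product.code is character.
df <- data.frame(
  Product.code = c("1100001369", "1100001369", "1100001448",
                   "1100001448", "1100001369"),
  cgrp         = c("12", "16 37", "12", "12", "12"),
  promo.from   = as.Date(c("2014-01-01", "2014-01-01", "2014-03-01",
                           "2014-09-01", "2014-06-01")),
  promo.to     = as.Date(c("2014-03-01", "2014-03-01", "2014-03-01",
                           "2014-09-01", "2014-07-01")),
  stringsAsFactors = FALSE
)

# Split each 'cgrp' string on spaces: one character vector per row.
grp <- strsplit(df$cgrp, " ", fixed = TRUE)

# Repeat each row once per group it belongs to, then overwrite 'cgrp'
# with the individual group values (the "long" format).
df1 <- df[rep(seq_len(nrow(df)), lengths(grp)), ]
df1$cgrp <- unlist(grp)

# One data frame per Product.code/cgrp combination that actually occurs.
lst <- split(df1, list(df1$Product.code, df1$cgrp), drop = TRUE)
```

Here split() names the elements as "Product.code.cgrp", so a single group can be picked out by name, for example lst[["1100001448.12"]].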

It may be better to keep the datasets in a list. If you do want them as separate objects in the global environment, one option is list2env (not recommended).

 list2env(lst, envir=.GlobalEnv)
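
One reason to keep the pieces in a list is that the "further analysis" from the question can then be applied to every group in one call. The following is a minimal sketch under stated assumptions: the 'lst' built here is a hypothetical stand-in containing only the two groups shown in the question's edit, and the summary columns (n, first.from, last.to) are illustrative names, not part of the original answer.

```r
# Hypothetical stand-in for 'lst': the two example groups from the question.
lst <- list(
  dfN1 = data.frame(
    Product.code = "1100001369", cgrp = "12",
    promo.from = as.Date(c("2014-01-01", "2014-06-01", "2014-11-01")),
    promo.to   = as.Date(c("2014-03-01", "2014-07-01", "2014-11-01"))
  ),
  dfN2 = data.frame(
    Product.code = "1100001448", cgrp = "12",
    promo.from = as.Date(c("2014-03-01", "2014-09-01")),
    promo.to   = as.Date(c("2014-03-01", "2014-09-01"))
  )
)

# Number of promotions in each group (named integer vector).
n_promos <- sapply(lst, nrow)

# One summary row per group: promotion count, earliest start, latest end.
summ <- do.call(rbind, lapply(lst, function(d) {
  data.frame(
    Product.code = d$Product.code[1],
    cgrp         = d$cgrp[1],
    n            = nrow(d),
    first.from   = min(d$promo.from),
    last.to      = max(d$promo.to)
  )
}))
```

The same lapply/sapply pattern works for any per-group computation, which is harder to do once the data frames are scattered across the global environment.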
