Need to format R data

This is a follow-up to my only other question, but hopefully more direct. I need data that looks like this:

     custID   custChannel            custDate
1     151        Direct 2015-10-10 00:15:32
2     151    GooglePaid 2015-10-10 00:16:45
3     151     Converted 2015-10-10 00:17:01
4    5655      BingPaid 2015-10-11 00:20:12
5    7855 GoogleOrganic 2015-10-12 00:05:32
6    7862  YahooOrganic 2015-10-13 00:18:20
7    9655    GooglePaid 2015-10-13 00:08:35
8    9655    GooglePaid 2015-10-13 00:11:11
9    9655     Converted 2015-10-13 00:11:35
10   9888    GooglePaid 2015-10-14 00:08:35
11   9888    GooglePaid 2015-10-14 00:11:11
12   9888     Converted 2015-10-14 00:11:35

I need it summarized so that the output looks like this:

Path                                 Path Count
BingPaid                                      1
Direct>GooglePaid>Converted                   1
GoogleOrganic                                 1
GooglePaid>GooglePaid>Converted               2
YahooOrganic                                  1

The idea is to capture each customer's path (as identified by custID) and count, across the entire data set, how many customers took that exact path (Path Count). I need to perform this over a data set of 5 million rows.

Using data.table you can do this as follows:

library(data.table)
# collapse each customer's channels into one path, then count customers per path
setDT(dat)[, paste(custChannel, collapse = ">"), by = custID][, .("path length" = .N), by = .(path = V1)]

Result:

                              path path length
1:     Direct>GooglePaid>Converted           1
2:                        BingPaid           1
3:                   GoogleOrganic           1
4:                    YahooOrganic           1
5: GooglePaid>GooglePaid>Converted           2
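
Note that the one-liner pastes the channels in whatever order the rows already have within each custID. The following is a minimal sketch, assuming custDate defines the order of a path and that the question's sample is rebuilt by hand as `dat`; the column name `path_count` and the `tz = "UTC"` setting are illustrative choices, not part of the original answer. It sorts the rows first and then applies the same collapse-and-count idea:

```r
library(data.table)

# Sample data rebuilt by hand from the table in the question
dat <- data.table(
  custID = c(151, 151, 151, 5655, 7855, 7862, 9655, 9655, 9655, 9888, 9888, 9888),
  custChannel = c("Direct", "GooglePaid", "Converted", "BingPaid",
                  "GoogleOrganic", "YahooOrganic",
                  "GooglePaid", "GooglePaid", "Converted",
                  "GooglePaid", "GooglePaid", "Converted"),
  custDate = as.POSIXct(c(
    "2015-10-10 00:15:32", "2015-10-10 00:16:45", "2015-10-10 00:17:01",
    "2015-10-11 00:20:12", "2015-10-12 00:05:32", "2015-10-13 00:18:20",
    "2015-10-13 00:08:35", "2015-10-13 00:11:11", "2015-10-13 00:11:35",
    "2015-10-14 00:08:35", "2015-10-14 00:11:11", "2015-10-14 00:11:35"
  ), tz = "UTC")  # time zone is an assumption; the question does not state one
)

# Put each customer's rows in time order (assumption: custDate defines the path order)
setorder(dat, custID, custDate)

# One path string per customer, then the number of customers per path
res <- dat[, .(path = paste(custChannel, collapse = ">")), by = custID][
  , .(path_count = .N), by = path]
print(res)
```

If the real data is already sorted by customer and time, the `setorder()` call changes nothing and can be dropped.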

Step by step:

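setDT(dat) # convert dat to a data.table by reference
# build one path string per custID (the unnamed result column defaults to V1)
dat_path <- dat[, paste(custChannel, collapse = ">"), by = custID]
# count how many customers share each path (.N is the number of rows per group)
res <- dat_path[, .("path length" = .N), by = .(path = V1)]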

Have a look at dat_path and res to understand what happened.
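
The result comes back in order of first appearance and with the column called "path length", whereas the question asks for paths listed alphabetically under the headers Path and Path Count. A small sketch of that last step, assuming the sample channels from the question (dates omitted because the rows are already in time order) and using data.table's `setnames()` and `setorder()`:

```r
library(data.table)

# Channels from the question's sample, already in time order within each custID
dat <- data.table(
  custID = c(151, 151, 151, 5655, 7855, 7862, 9655, 9655, 9655, 9888, 9888, 9888),
  custChannel = c("Direct", "GooglePaid", "Converted", "BingPaid",
                  "GoogleOrganic", "YahooOrganic",
                  "GooglePaid", "GooglePaid", "Converted",
                  "GooglePaid", "GooglePaid", "Converted")
)

# Same two steps as in the answer
dat_path <- dat[, paste(custChannel, collapse = ">"), by = custID]
res <- dat_path[, .("path length" = .N), by = .(path = V1)]

# Rename to the headers used in the question, then sort by path
setnames(res, c("path", "path length"), c("Path", "Path Count"))
setorder(res, Path)
print(res)
```

Both `setnames()` and `setorder()` modify `res` in place, so no copy of the table is made, which may matter at 5 million rows.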
