String grouping (aggregation) with data.table (R 3.1.1)

Input: I have this data:

library(data.table)
ids <- c(10, 10, 10, 11, 12, 12)
items <- c('soup', 'rice', 'lemon', 'chicken', 'lamb', 'noodles')
orders <- as.data.table(list(id=ids, item=items))

> orders
   id    item
1: 10    soup
2: 10    rice
3: 10   lemon
4: 11 chicken
5: 12    lamb
6: 12 noodles

Goal: I need to arrive at this (all items grouped by their id):

   id        items
1: 10    soup,rice,lemon
2: 11    chicken
3: 12    lamb,noodles

What I did: I am using data.table on R 3.1.1 (the latest release) and tried the method below, which should work:

orders[,list(items=list(item)), by=id]

But I am getting the following (incorrect) output:

   id       items
1: 10 lamb,noodles,lemon
2: 11 lamb,noodles,lemon
3: 12 lamb,noodles,lemon    

What am I doing wrong, and what is the right way to group strings with data.table?

orders[, paste(item, collapse = ","), by = id]

##    id              V1
## 1: 10 soup,rice,lemon
## 2: 11         chicken
## 3: 12    lamb,noodles
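The result column above is auto-named V1. A minimal sketch of the same call with the column named items to match the goal, assuming the sample data from the question (the variable name result is only for illustration):

```r
library(data.table)

# Same sample data as in the question
orders <- data.table(
  id   = c(10, 10, 10, 11, 12, 12),
  item = c("soup", "rice", "lemon", "chicken", "lamb", "noodles")
)

# One comma-separated string per id; `items` is a plain character column
result <- orders[, list(items = paste(item, collapse = ",")), by = id]
print(result)
```

Because paste() builds a new string for each group, items here is an ordinary character column rather than a list column.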

The syntax for what it sounds like you're looking for is a little awkward, but it makes sense when you think about how you would normally use list.

Try the following:

orders[, list(item = list(item)), by = "id"]
#    id            item
# 1: 10 soup,rice,lemon
# 2: 11         chicken
# 3: 12    lamb,noodles
str(.Last.value)
# Classes ‘data.table’ and 'data.frame':  3 obs. of  2 variables:
#  $ id  : num  10 11 12
#  $ item:List of 3
#   ..$ : chr  "soup" "rice" "lemon"
#   ..$ : chr "chicken"
#   ..$ : chr  "lamb" "noodles"
#  - attr(*, ".internal.selfref")=<externalptr> 
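A list column like this keeps each group's items as a separate character vector, so it can be processed per row afterwards. A rough sketch of that, assuming a data.table version in which the call returns each group's own values as in the str() output above (the names grouped, n_items and items are hypothetical):

```r
library(data.table)

# Same sample data as in the question
orders <- data.table(
  id   = c(10, 10, 10, 11, 12, 12),
  item = c("soup", "rice", "lemon", "chicken", "lamb", "noodles")
)

# One row per id; `item` is a list column holding a character vector per row
grouped <- orders[, list(item = list(item)), by = "id"]

# Count the items in each row's vector
grouped[, n_items := vapply(item, length, integer(1))]

# Collapse each row's vector into a single comma-separated string
grouped[, items := vapply(item, paste, character(1), collapse = ",")]
print(grouped)
```

If only the comma-separated string is needed, the paste() approach is simpler; the list column is mainly useful when the individual items have to stay accessible.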
