How to group vectors into a list of vectors?

Question

I have some data which looks like this (fake data for example's sake):

dressId        color 
6              yellow 
9              red
10             green 
10             purple 
10             yellow 
12             purple 
12             red

where color is a factor vector. It is not guaranteed that all possible levels of the factor actually appear in the data (e.g. the color "blue" could also be one of the levels).

I need a list of vectors which groups the available colors of each dress:

[[1]]
yellow  

[[2]] 
red    

[[3]] 
green purple yellow 

[[4]] 
purple red

Preserving the IDs of the dresses would be nice (e.g. a dataframe where this list is the second column and the IDs are the first), but not necessary.

I wrote a loop which goes through the dataframe row by row, and while the next ID is the same, it adds the color to a vector. (I am sure that the data is sorted by ID.) When the ID in the first column changes, it adds the vector to a list:

result <- NULL 
while(blah blah) 
{
    some code which creates the vector called "colors" 
    result[[dressCounter]] <- colors 
    dressCounter <- dressCounter + 1
}

After wrestling with getting all the necessary counting variables correct, I found out to my dismay that it doesn't work. The first time, colors is

[1] yellow
Levels: green yellow purple red blue

and it gets coerced into an integer, so result becomes 2.

In the second loop repetition, colors only contains red, and result becomes a simple integer vector, [1] 2 4.

In the third repetition, colors is now a vector,

[1] green  purple yellow
Levels: green yellow purple red blue

and I get

result[[3]] <- colors

Error in result[[3]] <- colors :
  more elements supplied than there are to replace

What am I doing wrong? Is there a way to initialize result so it doesn't get converted into a numeric vector, but becomes a list of vectors?

Also, is there another way to do the whole thing than "roll my own"?

Answer 1

split.data.frame is a good way to organize this; then extract the color component.

d <- data.frame(dressId=c(6,9,10,10,10,12,12),
               color=factor(c("yellow","red","green",
                              "purple","yellow",
                              "purple","red"),
                 levels=c("red","orange","yellow",
                          "green","blue","purple")))

I think the version you want is actually this:

ss <- split.data.frame(d,d$dressId)

You can get something more like the list you requested by extracting the color component:

lapply(ss,"[[","color")
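As for the loop in the question, here is a minimal sketch, assuming the trouble comes from starting with result <- NULL: the first assignment of a one-element factor then produces an atomic vector rather than a list, so later multi-element assignments fail. Starting from an empty list avoids that. The loop below is a simplified stand-in for the original counting logic (which was not shown), and ids is a name introduced only for this illustration.

```r
d <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color = factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                 levels = c("red", "orange", "yellow", "green", "blue", "purple"))
)

result <- list()                 # an empty list instead of NULL
ids <- unique(d$dressId)         # hypothetical helper: one entry per dress

for (i in seq_along(ids)) {
  # each element keeps the full factor (values and levels) for one dress
  result[[i]] <- d$color[d$dressId == ids[i]]
}
names(result) <- ids             # keep the dress IDs as element names
```

With a list as the container, each element can hold a factor of any length, so the third repetition no longer raises an error.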

Answer 2

In addition to split, you should consider aggregate. Use c or I as the aggregation function to get your list column:

out <- aggregate(color ~ dressId, mydf, c)
out
#   dressId                 color
# 1       6                yellow
# 2       9                   red
# 3      10 green, purple, yellow
# 4      12           purple, red
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: int  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: chr "yellow"
#   ..$ 1: chr "red"
#   ..$ 2: chr  "green" "purple" "yellow"
#   ..$ 3: chr  "purple" "red"
out$color
# $`0`
# [1] "yellow"
# 
# $`1`
# [1] "red"
# 
# $`2`
# [1] "green"  "purple" "yellow"
# 
# $`3`
# [1] "purple" "red"

Note: This works even if the "color" variable is a factor, as in Ben's sample data (I missed that point when I posted the answer above), but you need to use I as the aggregation function instead of c:

out <- aggregate(color ~ dressId, d, I)
str(out)
# 'data.frame': 4 obs. of  2 variables:
#  $ dressId: num  6 9 10 12
#  $ color  :List of 4
#   ..$ 0: Factor w/ 6 levels "red","orange",..: 3
#   ..$ 1: Factor w/ 6 levels "red","orange",..: 1
#   ..$ 2: Factor w/ 6 levels "red","orange",..: 4 6 3
#   ..$ 3: Factor w/ 6 levels "red","orange",..: 6 1
out$color
# $`0`
# [1] yellow
# Levels: red orange yellow green blue purple
# 
# $`1`
# [1] red
# Levels: red orange yellow green blue purple
# 
# $`2`
# [1] green  purple yellow
# Levels: red orange yellow green blue purple
# 
# $`3`
# [1] purple red   
# Levels: red orange yellow green blue purple

Strangely, however, the default display shows the integer values:

out
#   dressId   color
# 1       6       3
# 2       9       1
# 3      10 4, 6, 3
# 4      12    6, 1
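If the integer codes in the printed data frame are a nuisance, one possible workaround (a sketch, not part of the original answer) is to convert each group to character inside the aggregation function. This assumes the factor levels are not needed afterwards; d is the sample data frame from Answer 1.

```r
d <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color = factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                 levels = c("red", "orange", "yellow", "green", "blue", "purple"))
)

# Convert each group's factor values to character before they are stored,
# so the list column holds labels rather than integer codes
out <- aggregate(color ~ dressId, d, function(x) as.character(x))

out$color[[3]]   # the colors recorded for dressId 10
```

The trade-off is that unused levels such as "blue" are no longer carried along with each element.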

Answer 3

Assuming your data frame is saved in a variable called df, you can simply use group_by and summarize from the dplyr package together with list, like this:

library('dplyr')

df %>%
  group_by(dressId) %>%
  summarize(colors = list(color))

Applied to your example:

df <- tribble(
  ~dressId, ~color,
         6, 'yellow',
         9, 'red',
        10, 'green',
        10, 'purple',
        10, 'yellow',
        12, 'purple',
        12, 'red'
)

df %>%
  group_by(dressId) %>%
  summarize(colors = list(color))

# dressId                colors
#       6                yellow
#       9                   red
#      10 green, purple, yellow
#      12           purple, red

Answer 4

I am afraid the answer should be a little different; you should use the following code to accomplish the requested behaviour:

df %>%
  group_by(dressId) %>%
  summarize(colors = toString(unique(color)))
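Note that toString collapses each group into a single comma-separated string rather than a vector, so which form is "the requested behaviour" depends on what is done with the result next. A small base R sketch of the difference, using a made-up group (colors_12 is hypothetical, with "red" deliberately duplicated):

```r
# Hypothetical group: dress 12 with a duplicated "red"
colors_12 <- c("purple", "red", "red")

as_string <- toString(unique(colors_12))  # one string: "purple, red"
as_vector <- unique(colors_12)            # character vector with two elements

length(as_string)  # 1
length(as_vector)  # 2
```

The string form is convenient for display; the vector form is what a list column of vectors would hold.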

Answer 5

All the other answers do the job and I'm slightly late to the party, but some have used dplyr, and I always try to stay away from the tidyverse if possible. For this problem, base R is enough, without the tidyverse bloat. Some others have solved this by making a data frame, and that is not what the title says :)

Let's create the vectors, as OP didn't provide the code (note that OP wants vectors and not a data frame, although you can do this with a data frame with a minor modification):

dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")

Now let's get down to business and calculate what OP asked for:

I need a list of vectors which groups the available colors of each dress:

result <- split(x = color, f = dressId)

result

which will output:

$`6`
[1] "yellow"

$`9`
[1] "red"

$`10`
[1] "green"  "purple" "yellow"

$`12`
[1] "purple" "red"

This is very simple and straightforward. Now, if some pairs are duplicated, for instance if there is another "red" for the dressId of 12, you can apply unique() to each element of the split() result:

result <- lapply(result, unique)

If you have color as a factor, technically it should also work, but it will make every item of result a factor. To mitigate that, simply use unfactor() from the varhandle package to convert your factor to a non-factor vector.
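For readers who would rather not add a package for this step, a base R sketch of the same idea follows. It assumes color is a factor with the levels shown in the question; as_chr and as_fct are names introduced here for illustration only.

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                levels = c("green", "yellow", "purple", "red", "blue"))

# Option 1: convert to character first, so each element is a plain character vector
as_chr <- split(as.character(color), dressId)

# Option 2: keep factors, but drop the levels a given dress does not use
as_fct <- lapply(split(color, dressId), droplevels)

as_chr[["10"]]
levels(as_fct[["12"]])
```

Option 1 discards the level information entirely, while option 2 keeps each element a factor restricted to the colors that actually occur for that dress.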

5 answers

Answer 1
9 Accepted 2014-02-01 15:13:04

Answer 2
6 2014-02-01 16:13:48

Answer 3
4 2018-08-09 20:51:54

Answer 4
0 2020-06-09 13:04:18

Answer 5
0 2022-01-10 20:14:30
