How to group a vector into a list of vectors?
I have some data which looks like this (fake data for the sake of example):
dressId color
6 yellow
9 red
10 green
10 purple
10 yellow
12 purple
12 red
where color is a factor vector. It is not guaranteed that all possible levels of the factor actually appear in the data (e.g. the color "blue" could also be one of the levels).
I need a list of vectors which groups the available colors of each dress:
[[1]]
yellow
[[2]]
red
[[3]]
green purple yellow
[[4]]
purple red
Preserving the IDs of the dresses would be nice (e.g. a dataframe where this list is the second column and the IDs are the first), but not necessary.
I wrote a loop which goes through the dataframe row by row, and while the next ID is the same, it adds the color to a vector. (I am sure that the data is sorted by ID.) When the ID in the first column changes, it adds the vector to a list:
result <- NULL
while(blah blah)
{
some code which creates the vector called "colors"
result[[dressCounter]] <- colors
dressCounter <- dressCounter + 1
}
After wrestling with getting all the necessary counting variables correct, I found out to my dismay that it doesn't work. The first time, colors is
[1] yellow
Levels: green yellow purple red blue
and it gets coerced into an integer, so result becomes 2.
In the second loop repetition, colors only contains red, and result becomes a simple integer vector, [1] 2 4.
In the third repetition, colors is a vector now,
[1] green purple yellow
Levels: green yellow purple red blue
and I get
result[[3]] <- colors
Error in result[[3]] <- colors :
more elements supplied than there are to replace
What am I doing wrong? Is there a way to initialize result so it doesn't get converted into a numeric vector, but becomes a list of vectors?
Also, is there another way to do the whole thing than "roll my own"?
split.data.frame is a good way to organize this; then extract the color component.
d <- data.frame(dressId=c(6,9,10,10,10,12,12),
color=factor(c("yellow","red","green",
"purple","yellow",
"purple","red"),
levels=c("red","orange","yellow",
"green","blue","purple")))
I think the version you want is actually this:
ss <- split.data.frame(d,d$dressId)
You can get something more like the list you requested by extracting the color component:
lapply(ss,"[[","color")
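As for why the loop in the question fails: starting from result <- NULL, the first length-one assignment turns result into an atomic vector instead of a list, so a later multi-element assignment has nowhere to go. A minimal sketch of a loop that keeps result a list, assuming the sample data frame d from above (the names ids and i are only illustrative):

```r
# Sample data as in the answer above
d <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color = factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                 levels = c("red", "orange", "yellow", "green", "blue", "purple"))
)

ids <- unique(d$dressId)

# Pre-allocate a list (not NULL), so [[<- stores whole vectors as elements
result <- vector("list", length(ids))
names(result) <- ids

for (i in seq_along(ids)) {
  # All colors belonging to the i-th dress, still a factor
  result[[i]] <- d$color[d$dressId == ids[i]]
}

result
```

Starting from result <- list() and growing it works as well; pre-allocating just avoids repeated copying when there are many groups.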
In addition to split, you should consider aggregate. Use c or I as the aggregation function to get your list column (here mydf is a data frame holding the question's data, with color stored as character):
out <- aggregate(color ~ dressId, mydf, c)
out
# dressId color
# 1 6 yellow
# 2 9 red
# 3 10 green, purple, yellow
# 4 12 purple, red
str(out)
# 'data.frame': 4 obs. of 2 variables:
# $ dressId: int 6 9 10 12
# $ color :List of 4
# ..$ 0: chr "yellow"
# ..$ 1: chr "red"
# ..$ 2: chr "green" "purple" "yellow"
# ..$ 3: chr "purple" "red"
out$color
# $`0`
# [1] "yellow"
#
# $`1`
# [1] "red"
#
# $`2`
# [1] "green" "purple" "yellow"
#
# $`3`
# [1] "purple" "red"
Note: This works even if the "color" variable is a factor, as in Ben's sample data (I missed that point when I posted the answer above), but you need to use I as the aggregation function instead of c:
out <- aggregate(color ~ dressId, d, I)
str(out)
# 'data.frame': 4 obs. of 2 variables:
# $ dressId: num 6 9 10 12
# $ color :List of 4
# ..$ 0: Factor w/ 6 levels "red","orange",..: 3
# ..$ 1: Factor w/ 6 levels "red","orange",..: 1
# ..$ 2: Factor w/ 6 levels "red","orange",..: 4 6 3
# ..$ 3: Factor w/ 6 levels "red","orange",..: 6 1
out$color
# $`0`
# [1] yellow
# Levels: red orange yellow green blue purple
#
# $`1`
# [1] red
# Levels: red orange yellow green blue purple
#
# $`2`
# [1] green purple yellow
# Levels: red orange yellow green blue purple
#
# $`3`
# [1] purple red
# Levels: red orange yellow green blue purple
Strangely, however, the default display shows the integer values:
out
# dressId color
# 1 6 3
# 2 9 1
# 3 10 4, 6, 3
# 4 12 6, 1
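If the printed labels matter more than keeping the factor, one possible workaround is to convert the factor to character before aggregating. This is only a sketch under that assumption; the column name color_chr is made up for the example:

```r
d <- data.frame(
  dressId = c(6, 9, 10, 10, 10, 12, 12),
  color = factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                 levels = c("red", "orange", "yellow", "green", "blue", "purple"))
)

# Work on a character copy so that labels, not integer codes, are collected
d$color_chr <- as.character(d$color)

# Groups have different sizes, so aggregate() keeps the result as a list column
out <- aggregate(color_chr ~ dressId, data = d, FUN = c)
out
```

The price is that the information about unused levels (such as "blue") is lost in the character copy.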
Assuming your data frame is saved in a variable called df, you can simply use group_by and summarize from the dplyr package, with list as the summary function, like this:
library('dplyr')
df %>%
group_by(dressId) %>%
summarize(colors = list(color))
Applied to your example:
df <- tribble(
~dressId, ~color,
6, 'yellow',
9, 'red',
10, 'green',
10, 'purple',
10, 'yellow',
12, 'purple',
12, 'red'
)
df %>%
group_by(dressId) %>%
summarize(colors = list(color))
# dressId colors
# 6 yellow
# 9 red
# 10 green, purple, yellow
# 12 purple, red
I'm afraid the answer should be a little different. The following code collapses the unique colors of each dress into one comma-separated string, which is how I read the requested behaviour:
df %>%
group_by(dressId) %>%
summarize(colors = toString(unique(color)))
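The difference between collecting a list and collapsing with toString is easy to miss, so here is a base R sketch of both on the question's data (no dplyr needed; the names as_list and as_string are illustrative only):

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")

# One character vector per dress: each element keeps its colors separate
as_list <- split(color, dressId)

# One string per dress: the colors are pasted together with ", "
as_string <- vapply(as_list, function(x) toString(unique(x)), character(1))

length(as_list[["10"]])   # three separate elements
as_string[["10"]]         # a single string
```

Which form is preferable depends on what happens next: the list is easier to compute on, the string is easier to print or export.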
All the other answers do the job and I'm slightly late to the party, but some have used dplyr, and I always try to stay away from tidyverse if possible; for this problem one can use base R without the tidyverse bloat. Some others have solved this by making a dataframe, and that is not what the title says :)
Let's create the vectors, as OP didn't provide us the code (note that OP wants a vector and not a dataframe, although you can do this with a dataframe with a minor modification):
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red")
Now let's get down to business and compute what OP asked for:
"I need a list of vectors which groups the available colors of each dress:"
result <- split(x = color, f = dressId)
result
which will output:
$`6`
[1] "yellow"

$`9`
[1] "red"

$`10`
[1] "green" "purple" "yellow"

$`12`
[1] "purple" "red"
This is simple and straightforward. Now, if a color appears more than once for a dress, for instance if there is another "red" for the dressId of 12, you can apply unique() to each element of the split() result:
result <- lapply(result, unique)
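To see the effect, here is a small sketch with one extra "red" appended for dress 12 (the extra row is invented for the illustration and is not part of the question's data):

```r
# Question's data plus a duplicate "red" for dressId 12
dressId <- c(6, 9, 10, 10, 10, 12, 12, 12)
color <- c("yellow", "red", "green", "purple", "yellow", "purple", "red", "red")

grouped <- split(x = color, f = dressId)
grouped[["12"]]   # still contains "red" twice

# unique() is applied to each group separately, not to the list as a whole
result <- lapply(grouped, unique)
result[["12"]]
```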
If color is a factor, this technically also works, but every item of result will then be a factor. To avoid that, simply use unfactor() from the varhandle package to convert your factor to a non-factor vector.
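If installing a package is not an option, base R can handle the factor case too. The following sketch shows two possibilities, assuming color is a factor with unused levels as in the question (the names by_dress_chr and by_dress_fac are illustrative):

```r
dressId <- c(6, 9, 10, 10, 10, 12, 12)
color <- factor(c("yellow", "red", "green", "purple", "yellow", "purple", "red"),
                levels = c("red", "orange", "yellow", "green", "blue", "purple"))

# Option 1: convert to character first, so each group is a plain character vector
by_dress_chr <- split(as.character(color), dressId)

# Option 2: keep factors, but drop the levels that do not occur in each group
by_dress_fac <- lapply(split(color, dressId), droplevels)

by_dress_chr[["10"]]
levels(by_dress_fac[["6"]])
```

Option 1 is usually enough when only the labels are needed; option 2 keeps the factor class while removing levels such as "blue" and "orange" from groups where they never occur.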