How to sort a vector according to a given sequence in R
I'm trying to reorder a sequence of data according to a given sequence. For example, the given sequence I have is
set.seed(1)
given_seq <- sample(rep(1:3,2))
The data and its associated label sequence are
dat_seq <- rep(1:3,2)
dat_value <- rnorm(6)
I want to reorder the data according to the given sequence. That is, the values 1, 2, 3 serve as labels for the data. For example,
dat_value
[1] 1.5952808 0.3295078 -0.8204684 0.4874291 0.7383247 0.5757814
dat_seq
[1] 1 2 3 1 2 3
given_seq
[1] 2 3 3 1 1 2
Then I expect the second and fifth data values (which have label 2) to end up in the first and sixth positions, in either order.
I can see that the reordered sequence is not unique, but I'm not sure how to do this.
Here's another option:
dat_value[match(rank(given_seq, ties.method = "random"), rank(dat_seq, ties.method = "random"))]
# [1] 0.7383247 0.5757814 -0.8204684 1.5952808 0.4874291 0.3295078
First we convert the two sequences into ones that have no repeated elements; e.g.,
rank(given_seq, ties.method = "random")
# [1] 3 5 6 1 2 4
That is, if two entries of given_seq are, say, (1,1), then they will randomly be converted into (1,2) or (2,1). The same is done with dat_seq and, consequently, we can match them and reorder dat_value accordingly. Thus, this method introduces some randomization, which may or may not be desirable in your application.
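If the randomization is not wanted, one possible variant (a sketch, not part of the original answer) is to break ties by order of appearance with ties.method = "first", which makes the result reproducible. The data below are hard-coded from the printed output in the question; the names r_given, r_dat, idx and result are only illustrative.

```r
# Data copied from the output shown in the question
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# ties.method = "first" numbers tied entries in order of appearance,
# so both sequences become permutations of 1:6 deterministically
r_given <- rank(given_seq, ties.method = "first")
r_dat   <- rank(dat_seq,   ties.method = "first")

# For each target position, find the data position with the same rank
idx    <- match(r_given, r_dat)
result <- dat_value[idx]

# Sanity check: the labels of the reordered data reproduce given_seq
stopifnot(all(dat_seq[idx] == given_seq))
```

With this tie-breaking rule, the first occurrence of a label in given_seq receives the first data value carrying that label, the second occurrence receives the second, and so on.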
I would just make the labels unique and use the names attribute normally.
names(dat_value) = make.unique(as.character(dat_seq))
dat_value[make.unique(as.character(given_seq))]
# 2 3 3.1 1 1.1 2.1
# 0.3295078 -0.8204684 0.5757814 1.5952808 0.4874291 0.7383247
You can always strip the names off later if the non-uniqueness doesn't work for your use case.
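As a small sketch of that last step, unname() from base R drops the names attribute and leaves a plain numeric vector. The data are hard-coded from the question's printed output, and the variable name result is only illustrative.

```r
# Data copied from the output shown in the question
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Label each value uniquely: "1", "2", "3", "1.1", "2.1", "3.1"
names(dat_value) <- make.unique(as.character(dat_seq))

# Index by the uniquified target labels, then drop the names
result <- unname(dat_value[make.unique(as.character(given_seq))])
```

Note that this lookup assumes every label occurs equally often in dat_seq and given_seq; otherwise some of the generated names would have no match and the result would contain NA.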
This also works, and is probably even faster, although it may be harder to understand:
dat_value[order(dat_seq)][order(order(given_seq))]
First, we reorder dat_value so that it corresponds to the sorted label sequence c(1,1,2,2,3,3).
Then we apply the desired order, which would be given_seq itself if it were a sequence without repeats (a permutation of the positions). Fortunately, calling order twice turns it into one: order(order(given_seq)) gives the rank of each element, with ties broken by position.
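To make the two steps easier to follow, here is the same computation broken into intermediate values, using the data hard-coded from the question's printed output. The names sorted_value, pos and result are only illustrative.

```r
# Data copied from the output shown in the question
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Step 1: sort the data by label, so the labels read 1 1 2 2 3 3
sorted_value <- dat_value[order(dat_seq)]

# Step 2: order(order(x)) is the rank of each element of x,
# with ties broken by position; here it is 3 5 6 1 2 4
pos <- order(order(given_seq))

# Step 3: pick from the sorted data at those ranks
result <- sorted_value[pos]

# Sanity check: the labels, reordered the same way, reproduce given_seq
stopifnot(all(dat_seq[order(dat_seq)][pos] == given_seq))
```

Because ties are broken by position rather than at random, this gives the same values as the make.unique approach for this data.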