
How to sort a vector according to a given sequence in R

I'm trying to organize a sequence of data according to a given sequence. For example, the given sequence I have is

set.seed(1)
given_seq <- sample(rep(1:3,2))

The data and its associated sequence are:

dat_seq <- rep(1:3,2)
dat_value <- rnorm(6)

I want to organize the data according to the given sequence. That is, 1, 2, 3 serve as labels for the data. For example,

dat_value
[1]  1.5952808  0.3295078 -0.8204684  0.4874291  0.7383247  0.5757814

dat_seq
[1] 1 2 3 1 2 3

given_seq
[1] 2 3 3 1 1 2

Then I expect the second and fifth data values (with label 2) to be placed in the first or sixth positions.

I can see that the organized sequence is not unique, but I'm not sure how to do this.

Here's another option:

dat_value[match(rank(given_seq, ties.method = "random"), rank(dat_seq, ties.method = "random"))]
# One possible result (ties are broken at random):
# [1]  0.7383247  0.5757814 -0.8204684  1.5952808  0.4874291  0.3295078

First we convert the two sequences into ones that have no repeated elements; e.g.,

rank(given_seq, ties.method = "random")
# One possible result:
# [1] 3 5 6 1 2 4

That is, if two entries of given_seq are, say, (1,1), then they will randomly be converted into (1,2) or (2,1). The same is done with dat_seq and, consequently, we can match them and reorder dat_value accordingly. This method therefore introduces some randomization among values that share a label, which may or may not be desirable in your application.
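If the randomization is not wanted, the same idea can be made reproducible by breaking ties by position instead. The following is only a sketch of that variant, not part of the answer above; it swaps in ties.method = "first" and hard-codes the values printed in the question so it does not depend on the random number generator. The names r_given, r_dat, idx and reordered are introduced here for illustration.

```r
# Values copied from the question's printed output.
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# ties.method = "first" breaks ties by position, so both calls return
# permutations of 1:6 and the result is the same on every run.
r_given <- rank(given_seq, ties.method = "first")
r_dat   <- rank(dat_seq,   ties.method = "first")

# For each output slot, find the position in dat_value holding the same rank.
idx <- match(r_given, r_dat)
reordered <- dat_value[idx]

# Sanity check: the labels of the reordered data follow given_seq.
stopifnot(all(dat_seq[idx] == given_seq))
reordered
```

With this tie rule, the first value carrying a given label in dat_seq goes to the first slot with that label in given_seq, the second to the second, and so on.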

I would just make the labels unique and use the names attribute normally.

names(dat_value) = make.unique(as.character(dat_seq))
dat_value[make.unique(as.character(given_seq))]
 #         2          3        3.1          1        1.1        2.1 
 # 0.3295078 -0.8204684  0.5757814  1.5952808  0.4874291  0.7383247 

You can always strip the names off later if the non-uniqueness doesn't work for your use case.
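As a small sketch of that last step, unname() from base R drops the names once the reordering is done. The data values are hard-coded from the question's output, and the variable names result and plain are chosen here only for illustration. Note the assumption that every label occurs equally often in dat_seq and given_seq; a name missing from dat_value would be returned as NA.

```r
# Values copied from the question's printed output.
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Label each value uniquely: "1", "2", "3", "1.1", "2.1", "3.1".
names(dat_value) <- make.unique(as.character(dat_seq))

# Index by the uniquified target labels: "2", "3", "3.1", "1", "1.1", "2.1".
result <- dat_value[make.unique(as.character(given_seq))]

# Drop the names to get a plain numeric vector.
plain <- unname(result)
plain
```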

This also works, and is probably even faster, although it may be harder to understand:

dat_value[order(dat_seq)][order(order(given_seq))]

First, we reorder dat_value so that it corresponds to the sorted label sequence c(1,1,2,2,3,3).
Then we index it with the desired order, which would simply be given_seq if given_seq were a permutation of 1:6. Fortunately, calling order twice produces exactly such a permutation: order(order(x)) gives the rank of each element of x, with ties broken by position.
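To make the two steps easier to follow, here is a sketch that splits the one-liner into named pieces and checks each of them. The intermediate names sorted_value, pos and result are introduced here for illustration, and the data values are hard-coded from the question's output.

```r
# Values copied from the question's printed output.
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Step 1: group the values by label, i.e. in the order 1, 1, 2, 2, 3, 3.
sorted_value <- dat_value[order(dat_seq)]

# Step 2: turn given_seq into a permutation of 1:6.
# order(order(x)) is the rank of each element, ties broken by position.
pos <- order(order(given_seq))
stopifnot(all(pos == rank(given_seq, ties.method = "first")))

# Step 3: pick the grouped values in that permuted order.
result <- sorted_value[pos]

# Sanity check: the labels of the result follow given_seq.
stopifnot(all(sort(dat_seq)[pos] == given_seq))
result
```

Because ties are broken by position in both steps, this variant is deterministic, unlike the rank-based answer with random ties.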
