
How does sort() work in data.frame within R

How does sort work here? That is, what method is used to sort the data.frame by the columns barley$site, barley$year and barley$variety, as in the following:

library(lattice)
barley <- barley[order(barley$site, barley$year, barley$variety), ] 

You probably want:

barley[order(as.character(barley$site), as.numeric(as.character(barley$year)), as.character(barley$variety)), ]

As you have it, you are ordering by the underlying levels of the factors in the data.frame (their internal integer codes), which leads to really odd results. Look at the structure of the data frame:

 'data.frame':  120 obs. of  4 variables:
 $ yield  : num  27 48.9 27.4 39.9 33 ...
 $ variety: Factor w/ 10 levels "Svansota","No. 462",..: 3 3 3 3 3 3 7 7 7 7 ...
 $ year   : Factor w/ 2 levels "1932","1931": 2 2 2 2 2 2 2 2 2 2 ...
 $ site   : Factor w/ 6 levels "Grand Rapids",..: 3 6 4 5 1 2 3 6 4 5 ...

Notice how the levels for year are in the opposite order from what you would expect. The documentation for order discusses this very briefly:

For factors, this sorts on the internal codes, which is particularly appropriate for ordered factors.
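To see what sorting on the internal codes means in practice, here is a minimal sketch on a toy factor. The vector `year` below is made up for illustration and only mimics the level order of barley$year; it is not the barley data itself.

```r
# Toy factor whose levels are stored as "1932", "1931",
# mimicking the level order shown for barley$year.
year <- factor(c("1931", "1932", "1931", "1932"),
               levels = c("1932", "1931"))

as.integer(year)                 # internal codes: 2 1 2 1

by_codes  <- order(year)                # sorts on the codes:  2 4 1 3
by_labels <- order(as.character(year))  # sorts on the labels: 1 3 2 4

year[by_codes]                   # 1932 1932 1931 1931
year[by_labels]                  # 1931 1931 1932 1932
```

Ordering the factor directly puts "1932" first because it is level 1, while converting to character first orders by the labels themselves.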

I personally find this terribly confusing, but it is what it is. Factors are very useful in most contexts, but incredibly dangerous in others if you're not careful. Having numbers represented as factors (as year was here) is particularly bad.
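A small sketch of why numbers stored as factors are risky: calling as.numeric() directly on a factor returns the internal codes, not the numbers in the labels. The toy vector `year_f` is an assumption made up for this example.

```r
# Toy factor holding years, with levels stored as "1932", "1931".
year_f <- factor(c("1931", "1932", "1931"), levels = c("1932", "1931"))

codes  <- as.numeric(year_f)                # internal codes: 2 1 2
values <- as.numeric(as.character(year_f))  # actual years: 1931 1932 1931
```

Going through as.character() first is one common way to recover the real values before sorting or doing arithmetic on them.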

See ?factor for more details.

By default, sort doesn't know how to do anything with a data frame. You can sort the individual columns within a data frame with something like df$x <- sort(df$x), but you almost certainly don't want to do that; it will just mess up your data.
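As an illustration of how sorting a single column messes up the data, here is a minimal sketch with a made-up data frame (the names `df`, `broken` and `label` are hypothetical):

```r
# Each x value starts out paired with its own label: 3-c, 1-a, 2-b.
df <- data.frame(x = c(3, 1, 2),
                 label = c("c", "a", "b"),
                 stringsAsFactors = FALSE)

broken <- df
broken$x <- sort(broken$x)   # x is now 1 2 3, but label is still c a b

broken$x       # 1 2 3
broken$label   # "c" "a" "b"  (rows no longer match the original pairs)
```

Only the x column moved, so the value 1 now sits next to "c" instead of "a". The rows have been silently scrambled.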

You order the rows in the data frame by using order, as in the example code you have there. This orders the rows by the values in the column site, breaking ties with year, and then with variety.
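The tie-breaking can be sketched on a small made-up data frame. The columns reuse the names site, year and variety, but the values are invented, and plain character and numeric columns are used so that factor levels do not come into play.

```r
df <- data.frame(
  site    = c("B", "A", "B", "A", "A"),
  year    = c(1932, 1932, 1931, 1931, 1931),
  variety = c("x", "y", "x", "z", "y"),
  stringsAsFactors = FALSE
)

# order() returns row indices: sorted by site first, ties broken by year,
# remaining ties broken by variety.
idx <- order(df$site, df$year, df$variety)
idx              # 5 4 2 3 1

sorted <- df[idx, ]   # whole rows move together, so the data stays intact
```

Rows 4 and 5 tie on both site and year, so variety decides that row 5 ("y") comes before row 4 ("z").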
