sort() produces different results in Ubuntu and Windows

Question

I have a vector that is sorted differently when I run the same code on my Windows machine vs. my remote Ubuntu server.

Windows:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539"         "-1036241023" "6135"        "-44987577"
> uid <- sort(u)
> head(uid)
[1] "-1000019199" "-1000022360" "-1000039153" "-1000044219" "-1000069199" "-1000099640"

Ubuntu:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539"         "-1036241023" "6135"
[6] "-44987577"
> uid <- sort(u)
> head(uid)
[1] "10"          "100"         "1000"        "10000"       "-1000019199"
[6] "-1000022360"

Both R installations have the same packages loaded and are the same version (3.3.1). Ubuntu is 13.10 and Windows is Windows 7.

Answer 1

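String sorting (which is what you are doing) in R depends on the locale, which differs between Windows and Linux systems. Be careful, though: no locale will sort these strings in correct numerical order. In the Ubuntu output above, for example, "-1000019199" lands right after "10000", which suggests the collation there gives the leading minus sign little or no weight. If you want numerical order, you have to sort a vector of numbers.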
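To illustrate the point about numerical order, here is a minimal sketch using the first few values from the question. The variable names u_num and u_by_value are just for illustration, and it assumes every element of u parses as a number.

```r
# Sample values copied from head(u) in the question.
u <- c("-1336623650", "-1749477680", "539", "-1036241023", "6135", "-44987577")

# Convert to numbers first, then sort: the result does not depend on the locale.
u_num <- sort(as.numeric(u))

# If the original strings are needed, order them by their numeric value instead.
u_by_value <- u[order(as.numeric(u))]
print(u_by_value)
```

Either form should give the same ordering on Windows and Ubuntu, because numeric comparison does not consult LC_COLLATE.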

Grab the value of Sys.getlocale("LC_COLLATE") from each system and compare them. In my package, I do the following at the entry point and report the value in packageStartupMessage.

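# Remember the current collation, restore it when the calling function exits,
# then switch to the "C" collation for the duration of the call.
collateOrigValue <- Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", collateOrigValue), add = TRUE)
Sys.setlocale("LC_COLLATE", "C")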
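As a rough sketch of how the same idea could be scoped to a single sort, the helper below (sort_c_locale is a hypothetical name, not part of any package) switches the collation to "C" only while it runs. It also shows sort(x, method = "radix"), which as far as I know sorts character vectors in C-locale order in R 3.3.0 and later, so it may be a simpler option.

```r
# Hypothetical helper: sort a character vector using the "C" collation,
# restoring the caller's LC_COLLATE setting afterwards.
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

# A few values taken from the sorted outputs in the question.
u <- c("10", "-1000019199", "100", "-1000022360", "1000")

print(sort_c_locale(u))
# Radix sort ignores the session locale for character vectors.
print(sort(u, method = "radix"))
```

In the "C" collation strings are compared byte by byte, so "-" (which precedes the digits in ASCII) puts the negative-looking strings first. That is still not numerical order.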

See also https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html

Answer 2

Use stringi::stri_sort or stringr::str_sort to sort strings consistently across operating systems.
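A minimal sketch of that suggestion, assuming the stringi package is installed (the requireNamespace guard simply skips the call if it is not). The locale "en_US" is an arbitrary choice for illustration; the point is that the collation rules come from ICU rather than from the operating system.

```r
# A few values taken from the question.
u <- c("-1336623650", "539", "10", "-44987577", "6135")

if (requireNamespace("stringi", quietly = TRUE)) {
  # Extra arguments such as locale are passed on to the ICU collator options.
  sorted_icu <- stringi::stri_sort(u, locale = "en_US")
  print(sorted_icu)
}
```

This is still a string sort, so it gives a consistent order across platforms rather than a numerical one.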

2 answers

solution1
8 ACCEPTED 2016-08-26 18:32:56

solution2
5 2016-12-23 01:35:47
