sort() produces different results in Ubuntu and Windows

I have a vector that is sorted differently when I run the same code on my Windows machine and on my remote Ubuntu server.

Windows:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539"         "-1036241023" "6135"        "-44987577"
> uid <- sort(u)
> head(uid)
[1] "-1000019199" "-1000022360" "-1000039153" "-1000044219" "-1000069199" "-1000099640"

Ubuntu:

> u <- getNodes(network)
> head(u)
[1] "-1336623650" "-1749477680" "539"         "-1036241023" "6135"
[6] "-44987577"
> uid <- sort(u)
> head(uid)
[1] "10"          "100"         "1000"        "10000"       "-1000019199"
[6] "-1000022360"

Both implementations of R have the same packages loaded and are the same R version (3.3.1). Ubuntu is 13.10 and Windows is Windows 7.

Your vector is a character vector, so sort() is sorting strings. String sorting in R is based on the "locale", which differs between Windows and Linux systems. Be careful, though: no locale will sort these strings in correct numerical order. If you want numerical order, you have to sort a vector of numbers.
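As a minimal sketch of the numeric approach, assuming every element of the vector can be parsed as a number, the values shown by head(u) in the question can be converted with as.numeric() before sorting. The names u_num, sorted_num and sorted_chr are only illustrative.

```r
# Six values copied from head(u) in the question
u <- c("-1336623650", "-1749477680", "539", "-1036241023", "6135", "-44987577")

# Convert to numbers; sorting numbers does not depend on the locale
u_num <- as.numeric(u)
sorted_num <- sort(u_num)

# Keep the original strings, but arrange them in numerical order
sorted_chr <- u[order(u_num)]
print(sorted_chr)
```

If the IDs have to stay as strings, the order() form keeps them as character values while still arranging them numerically.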

Get the value of Sys.getlocale("LC_COLLATE") on each system and compare the two. In my package, I do the following at the entry point and report it via packageStartupMessage(). It switches collation to "C" and restores the original setting on exit.

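# Remember the current collation locale and restore it when the enclosing function exits
collateOrigValue <- Sys.getlocale("LC_COLLATE")
on.exit(Sys.setlocale("LC_COLLATE", collateOrigValue), add = TRUE)
# Switch to "C" collation so that sort() no longer depends on the system locale
Sys.setlocale("LC_COLLATE", "C")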

See also https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html
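Because on.exit() only takes effect inside a function, one way to use the snippet above is to wrap it in a small helper. This is a sketch, not the answerer's package code; sort_c_locale is a hypothetical name, and the sample values are taken from the outputs in the question.

```r
# Hypothetical helper: sort a character vector under "C" collation,
# then restore whatever collation locale was active before the call
sort_c_locale <- function(x) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  sort(x)
}

x <- c("10", "-1000019199", "100", "-1000022360")
print(sort_c_locale(x))
```

Under "C" collation the "-" character sorts before the digits, so the negative IDs come first, as in the Windows output. According to the sort() help page, sort(x, method = "radix") also orders character vectors as in the C locale, which may be a simpler option if changing the locale is not desirable.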

Use stringi::stri_sort or stringr::str_sort for consistent string sorting across operating systems.
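A short sketch of that suggestion, assuming the stringi package is installed (the code is skipped otherwise). The locale "en_US" is an arbitrary choice for illustration. The point is that the locale is passed explicitly to the function instead of being taken from the operating system.

```r
x <- c("10", "-1000019199", "100", "-1000022360")

# Only run the example when stringi is available
if (requireNamespace("stringi", quietly = TRUE)) {
  # Collation is handled by ICU with an explicitly chosen locale
  sorted <- stringi::stri_sort(x, locale = "en_US")
  print(sorted)
}
```

stringr::str_sort(x, locale = "en") works along the same lines, since stringr builds on stringi.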
