简体   繁体   中英

Natural sorting with R differs on deployment (maybe OS/Locale issue)

I am using the package "naturalsort" found here: https://github.com/kos59125/naturalsort Natural sorting is not something that is implemented elsewhere in a good manner in R as far as I know, so I was happy to find this package.

I use the function naturalsort to sort file names just like windows explorer, which works great locally.

But when I use it in my production environment deployed with Docker on Google Cloud Run, the sorting changes. I don't know if this is due to changes in locale(I am fra Denmark) or it is due to OS differences between my windows PC and the Docker/Google Cloud Run deployment.

I have created a example ready to be run in R:

######## Code start ###########
require(plumber)
require(naturalsort) #for name sorting

#* Retrieve sorted string list
#* @get /sortstrings
#* @param nothing
function(nothing) {
  
  print(nothing)
  
  test <- c("0.jpg", "file (4_5_1).jpeg", "1 tall thin image.jpeg",
            "8.jpeg", "8.jpg", "file (2.1.2).jpeg", "file (0).jpeg", "3.jpeg",
            "file (1).jpeg", "file (2.1.1).jpeg", "file (0) (3).jpeg", "file (2).jpeg",
            "file (2.1).jpeg", "file (4_5).jpeg", "file (4).jpeg", "file (39).jpeg")
  
  print("Direct sort")
  print(naturalsort(text = test))
  
  sorted_strings <- naturalsort(text = test)
  
  return(sorted_strings) 
}
######## Code end ###########

I would expect it to sort the file names like below, which it does locally both when run directly in the script and also when doing it through plumber RUN API:

    c("0.jpg", 
  "1 tall thin image.jpeg", 
  "3.jpeg", 
  "8.jpeg", 
  "8.jpg", 
  "file (0) (3).jpeg", 
  "file (0).jpeg", 
  "file (1).jpeg", 
  "file (2).jpeg", 
  "file (2.1).jpeg", 
  "file (2.1.1).jpeg", 
  "file (2.1.2).jpeg", 
  "file (4).jpeg", 
  "file (4_5).jpeg", 
  "file (4_5_1).jpeg", 
  "file (39).jpeg"
  )

But instead it sorts it like this:

c("0.jpg",
"1 tall thin image.jpeg",
"3.jpeg",
"8.jpeg",
"8.jpg",
"file (0) (3).jpeg",
"file (0).jpeg",
"file (1).jpeg",
"file (2.1.1).jpeg",
"file (2.1.2).jpeg",
"file (2.1).jpeg",
"file (2).jpeg",
"file (4_5_1).jpeg",
"file (4_5).jpeg",
"file (4).jpeg",
"file (39).jpeg")

Which is not like windows explorer.

Try fixing the collating sequence prior to the naturalsort call. It varies by locale and can affect how strings are compared (and therefore sorted).

## Get initial value
lcc <- Sys.getlocale("LC_COLLATE")

## Use fixed value
Sys.setlocale("LC_COLLATE", "C")

sorted_strings <- naturalsort(text = test)

## Restore initial value
Sys.setlocale("LC_COLLATE", lcc)

You can find some details in ?sort , ?Comparison , and ?locales and more here .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM