UTF-8 encoding on Windows and Mac messes up characters

I want to use my Windows (7, 64-bit) machine to obtain data from the iTunes API and process this data on my Mac (64-bit El Capitan). I am using the RJSONIO package to extract the names of the applications, which come from different countries and are in different languages. I have attached a minimal example with only a few applications. My preferred encoding is UTF-8.

library(RJSONIO)

# Fetch each lookup URL and return a list of (track name, artist name, seller name).
getall <- function(ID) {
  lapply(X = ID, function(u) {
    dat <- fromJSON(u, encoding = "UTF-8")
    Name <- try(dat$results[[1]]$trackName)
    Artistname <- try(dat$results[[1]]$artistName)
    Seller <- try(dat$results[[1]]$sellerName)
    list(Name, Artistname, Seller)
  })
}

apps1 <- c("https://itunes.apple.com/lookup?id=335549244",
           "https://itunes.apple.com/lookup?id=362032276",
           "https://itunes.apple.com/lookup?id=353410020",
           "https://itunes.apple.com/lookup?id=350146139",
           "https://itunes.apple.com/lookup?id=358942449",
           "https://itunes.apple.com/lookup?id=359871187")
# One row per app, three columns; byrow belongs to matrix(), not data.frame().
system.time(itunesNew <- data.frame(matrix(unlist(getall(ID = apps1), use.names = FALSE), nrow = length(apps1), ncol = 3, byrow = TRUE), stringsAsFactors = FALSE))
colnames(itunesNew) <- c("Name", "Artistname", "Seller")
itunesnew2 <- cbind(apps1, itunesNew)

I am using R with RStudio (both the most recent versions) and have set the default encoding to UTF-8 in the global options. I was not able to set my locale to UTF-8 using

Sys.setlocale("LC_MESSAGES", 'en_GB.UTF-8')

or other variants of this call in R. I also tried downloading the data as "latin1": it then looks all right on the PC, but it is garbled on the Mac (with the encoding set to latin1 in RStudio).
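On the locale attempt above: R strings carry their own encoding mark, so it is often possible to handle UTF-8 data without switching the whole session to a UTF-8 locale. Below is a minimal sketch of inspecting and fixing that mark with base R only. The byte values are made-up sample data (the word "café"), not output from the iTunes API, and whether this helps on a given Windows setup is an assumption to check, not a guarantee.

```r
# Report the current locale and whether R considers it a UTF-8 locale.
Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]

# Sample data: "cafe" with an accented e, built from raw UTF-8 bytes (C3 A9).
# rawToChar() does not declare an encoding, so the mark is "unknown".
raw_utf8 <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))
Encoding(raw_utf8)

# Declare the encoding: the bytes stay the same, only the mark changes.
declared <- raw_utf8
Encoding(declared) <- "UTF-8"

# The same word as latin1 bytes (E9), marked as latin1.
latin1_str <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(latin1_str) <- "latin1"

# Convert to UTF-8: here the bytes themselves are rewritten.
converted <- enc2utf8(latin1_str)

# Both routes end in the same UTF-8 byte sequence.
identical(charToRaw(converted), charToRaw(declared))
```

The distinction that matters is between declaring an encoding (`Encoding<-`) when the bytes are already right, and converting (`enc2utf8` or `iconv`) when they are not.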

Questions:

  1. Is there a way to work with the data on both machines using UTF-8?
  2. Are there other options for working on both machines?
  3. More generally: is UTF-8 the encoding one should prefer for data like this?

I don't have my Windows VM handy, but try this on both of your systems to see if it helps. It uses jsonlite and dplyr (I ran it on OS X):

library(jsonlite)
library(dplyr)

"%||%" <- function(a, b) { if (!is.null(a)) a else b }

apps <- c("https://itunes.apple.com/lookup?id=335549244", 
          "https://itunes.apple.com/lookup?id=362032276", 
          "https://itunes.apple.com/lookup?id=353410020", 
          "https://itunes.apple.com/lookup?id=350146139",
          "https://itunes.apple.com/lookup?id=358942449", 
          "https://itunes.apple.com/lookup?id=359871187")

bind_rows(lapply(apps, function(x) {
  res <- jsonlite::fromJSON(x, flatten=TRUE)$results
  data_frame(name=res$trackName %||% NA,
             artist_name=res$artistName %||% NA,
             seller=res$sellerName %||% NA)
})) -> dat

glimpse(dat)

## Observations: 6
## Variables: 3
## $ name        (chr) "A+ the Waverley Novels Collection (15Books)", "A+ 中國養生寶典[卷一]", "...
## $ artist_name (chr) "rice mi", "CHEUNG PUI MAN", "CHEUNG PUI MAN", "CHEUNG PUI MAN", ...
## $ seller      (chr) "rice mi", "CHEUNG PUI MAN", "CHEUNG PUI MAN", "CHEUNG PUI MAN", ...
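
For moving the result from one machine to the other, one option is to avoid locale-dependent text conversion altogether. The sketch below is an assumption on my part, not something tested on the Windows 7 setup in the question. It uses a small hypothetical data frame (`apps_df`) in place of the downloaded data and temporary files in place of a shared folder. It shows two base R routes: an RDS file, which stores the strings together with their encoding marks, and a plain text file written as raw UTF-8 bytes.

```r
# Hypothetical stand-in for the downloaded data; \u escapes keep this
# script itself ASCII-only, so the source file encoding cannot interfere.
apps_df <- data.frame(
  name   = c("A+ \u4e2d\u570b", "caf\u00e9"),
  seller = c("CHEUNG PUI MAN", "rice mi"),
  stringsAsFactors = FALSE
)

# Route A: RDS. Write on one machine, read on the other.
rds_path <- tempfile(fileext = ".rds")
saveRDS(apps_df, rds_path)
from_rds <- readRDS(rds_path)

# Route B: plain text in UTF-8.
# enc2utf8() makes sure the bytes are UTF-8, and useBytes = TRUE tells
# writeLines() to write them as-is instead of re-encoding to the locale.
txt_path <- tempfile(fileext = ".txt")
con <- file(txt_path, open = "wb")
writeLines(enc2utf8(apps_df$name), con, useBytes = TRUE)
close(con)

# On the reading side, declare that the file contents are UTF-8.
from_txt <- readLines(txt_path, encoding = "UTF-8")

identical(from_rds, apps_df)
identical(enc2utf8(from_txt), enc2utf8(apps_df$name))
```

Route A is the simpler choice when both ends are R; Route B is closer to what other tools expect if the file also has to be opened outside R.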
