I want to use data from the data set here. It is a Spanish-language data set, from Peru I think. It can be downloaded in several formats, but they all seem to have the same problem. Here's an example of the problem: maÌ_z. This should be maíz. My first thought was that there was a character encoding problem. But I have tried several encodings that are sometimes used for Spanish-language documents (e.g., UTF-8, WINDOWS-1252, ISO-8859-1) using the RStudio Reopen with Encoding option. The character representation changes for some of them, but never to the appropriate í. Some other examples: Cimarr?_n, c??scara, m??shka. I could do a search and replace, but I would prefer to find an encoding fix.
Have you tried using the encoding argument directly in the read.csv() function? Here is an example:
dt <- read.csv("dt", header = TRUE, sep = ",", dec = ".",
               comment.char = "", strip.white = TRUE,
               stringsAsFactors = TRUE, encoding = "UTF-8")
When I use French data I have to do it this way.
It is possible the original file was not encoded in UTF-8, so you may have to convert it to UTF-8 before reading it.
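As a rough illustration of that conversion step, here is a minimal sketch. It assumes the file is encoded as ISO-8859-1 (latin1), which is only a guess, since the actual encoding of the data set in the question is not known. The temporary file and its contents are made up for the example.

```r
# Build a tiny sample file whose bytes are ISO-8859-1 (latin1), not UTF-8.
# The path and contents are hypothetical, for illustration only.
path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("crop\nma"), as.raw(0xED), charToRaw("z\n")), path)

# Read the lines without any conversion, then re-encode them to UTF-8.
# "latin1" is an assumption; replace it with the file's real encoding.
raw_lines <- readLines(path)
utf8_lines <- iconv(raw_lines, from = "latin1", to = "UTF-8")

# Parse the converted text as CSV.
dt <- read.csv(text = utf8_lines, header = TRUE, stringsAsFactors = FALSE)
```

read.csv() also accepts a fileEncoding argument, which declares the encoding of the file on disk so that it is converted while reading. The encoding argument, by contrast, only marks the strings that were read as being in a given encoding and does not convert them. If neither approach gives the right characters, the file may have been damaged by an earlier conversion, and no choice of encoding will repair it.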