Handling special characters, e.g. accents, in R
I am scraping names from the web into a data frame.
For a name such as "Tomáš Rosický", I get the result "TomÃ¡Å¡ RosickÃ½".
I tried
Encoding("TomÃ¡Å¡ RosickÃ½") # returns "latin1"
but was not sure where to go from there to get the original name with accents back. I played around with iconv without success.
I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".
You've read in a page encoded in UTF-8. If x is your column of names, use Encoding(x) <- "UTF-8".
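As a rough illustration of why re-marking the encoding is enough here, the sketch below builds the UTF-8 bytes of the name by hand, marks them as latin1 to imitate the situation in the question, and then corrects the mark. The variable names are made up for the example, and the raw bytes are used only so the result does not depend on how the script file itself is saved.

```r
# UTF-8 bytes for "Tomáš Rosický", written out as raw values
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(bytes)

# Imitate the question: the bytes are UTF-8 but the string is marked latin1
Encoding(x) <- "latin1"
n_bytes <- nchar(x, type = "bytes")   # 16 bytes, whatever the mark says

# Correct the mark; the bytes are untouched, only their interpretation changes
Encoding(x) <- "UTF-8"
n_chars <- nchar(x, type = "chars")   # 13 characters once read as UTF-8

print(x)
```

Note that Encoding<- never converts anything; it only tells R how to interpret bytes that are already there. That is why it helps when the data really is UTF-8 and has merely been mislabelled.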
One way to export the accents correctly:
enc2utf8(as(dataframe$columnname, "character"))
To read the file correctly, use the scan function:
namb <- scan(file='g:/testcodering.txt', fileEncoding='UTF-8',
what=character(), sep='\n', allowEscapes=TRUE)
cat(namb)
This also works:
namc <- readLines(con <- file('g:/testcodering.txt', "r",
encoding='UTF-8')); close(con)
cat(namc)
Both approaches read the file with the correct accents.
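For anyone who wants to try this without a file on g:, here is a small self-contained variant. It is only a sketch: it writes a temporary UTF-8 file (the path and variable names are invented for the example) and reads it back with the encoding argument of readLines, which marks the strings as UTF-8 rather than re-encoding them the way file(..., encoding='UTF-8') does.

```r
# UTF-8 bytes for "Tomáš Rosický" followed by a newline
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd, 0x0a))

# Write the bytes to a temporary file so the example needs no external data
tmp <- tempfile(fileext = ".txt")
writeBin(bytes, tmp)

# Read the lines and declare them to be UTF-8
namd <- readLines(tmp, encoding = "UTF-8")
unlink(tmp)

print(namd)
```

The difference matters mainly outside UTF-8 locales: marking keeps the bytes as they are, while a connection opened with encoding='UTF-8' converts the text to the session's native encoding as it is read.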
You should use this:
df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
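The question also says that plain "Tomas Rosicky" would be acceptable. One possible way to get there, once the string is correctly marked as UTF-8, is to map each accented letter to its unaccented counterpart with chartr. This is a minimal sketch that only covers the three letters in this name; the lookup strings are assumptions for the example and would need extending for real data.

```r
# The name as a UTF-8 string, written with \u escapes to stay encoding-safe
x <- "Tom\u00e1\u0161 Rosick\u00fd"

# Accented letters and their replacements, position by position
accented <- "\u00e1\u0161\u00fd"   # á, š, ý
plain    <- "asy"

# chartr swaps each character in `accented` for the matching one in `plain`
ascii_name <- chartr(accented, plain, x)

print(ascii_name)
```

iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT") is a shorter alternative that needs no lookup table, but its output depends on the platform's iconv implementation, so check the result on your own system.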