Handling special characters, such as accents, in R

Question

I am doing some web scraping of names into a data frame.

For a name such as "Tomáš Rosický", I get the result "TomÃ¡Å¡ RosickÃ½".

I tried:

Encoding("TomÃ¡Å¡ RosickÃ½")  # returns "latin1"

but I was not sure where to go from there to get the original name with its accents back. I also played around with iconv, without success.
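The garbling pattern points at the cause. A minimal sketch, assuming the page was UTF-8 but its bytes ended up labelled as Latin-1: re-labelling the correctly encoded name as "latin1" (without changing any bytes) reproduces exactly the string from the question, because each two-byte UTF-8 sequence for á, š and ý is then shown as two Latin-1 characters.

```r
# The correct name, written with \u escapes so this file's own encoding does not matter
name <- "Tom\u00e1\u0161 Rosick\u00fd"

# Keep the same bytes but declare them as Latin-1, as a mis-declared page would
garbled <- name
Encoding(garbled) <- "latin1"

Encoding(garbled)  # "latin1", matching what the question reports
nchar(name)        # 13 characters
nchar(garbled)     # 16 characters: each accented letter became two Latin-1 characters
garbled == "Tom\u00c3\u00a1\u00c5\u00a1 Rosick\u00c3\u00bd"  # TRUE, i.e. "TomÃ¡Å¡ RosickÃ½"
```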

I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".
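For the ASCII-only output, one base-R option is chartr() with an explicit mapping table. This is a minimal sketch, not a general transliterator: strip_accents is a hypothetical helper that only covers the letters listed in it (Czech letters plus the common acute vowels), and any other character passes through unchanged. It assumes the accents have already been repaired, so it should run after the encoding fix rather than on the garbled text.

```r
# Hypothetical helper: map a fixed set of accented Latin letters to plain ASCII.
# `from` and `to` must contain the same number of characters, position by position.
strip_accents <- function(x) {
  from <- paste0(
    "\u00e1\u00e9\u00ed\u00f3\u00fa\u00fd\u010d\u010f\u011b\u0148\u0159\u0161\u0165\u017e\u016f",  # lowercase
    "\u00c1\u00c9\u00cd\u00d3\u00da\u00dd\u010c\u010e\u011a\u0147\u0158\u0160\u0164\u017d\u016e"   # uppercase
  )
  to <- paste0("aeiouycdenrstzu", "AEIOUYCDENRSTZU")
  chartr(from, to, x)
}

strip_accents("Tom\u00e1\u0161 Rosick\u00fd")  # "Tomas Rosicky"
```

iconv(x, "UTF-8", "ASCII//TRANSLIT") is a shorter alternative, but transliteration is platform-dependent in R, so check its output on your own system before relying on it.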

Answer 1

You've read in a page encoded in UTF-8. If x is your column of names, use Encoding(x) <- "UTF-8".
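A self-contained sketch of this fix, assuming (as in the question) that the column still holds the page's original UTF-8 bytes and they are merely labelled "latin1". The data frame df and its name column are hypothetical stand-ins for the scraped data. Encoding<- only changes the label, not the bytes, which is why it works here; it cannot repair text whose bytes were already converted.

```r
# Simulate the scraped column: UTF-8 bytes for "Tomáš Rosický", mislabelled as Latin-1
x <- "Tom\xc3\xa1\xc5\xa1 Rosick\xc3\xbd"
Encoding(x) <- "latin1"
df <- data.frame(name = x, stringsAsFactors = FALSE)

# The fix: declare the existing bytes as UTF-8 (no bytes are changed)
Encoding(df$name) <- "UTF-8"

df$name == "Tom\u00e1\u0161 Rosick\u00fd"  # TRUE
```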

Answer 2

One way to export the accents correctly:

enc2utf8(as(dataframe$columnname, "character"))
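A sketch of how this call might be used when exporting, assuming the column already displays correctly inside R. enc2utf8() converts every element to UTF-8 according to its declared encoding, so writing the result with useBytes = TRUE puts UTF-8 bytes in the file regardless of the session locale. It does not repair mojibake: a garbled "Ã¡" would be exported faithfully as "Ã¡". The data frame and the temporary file are hypothetical.

```r
library(methods)  # provides as(); attached by default in most R sessions

df <- data.frame(name = "Tom\u00e1\u0161 Rosick\u00fd", stringsAsFactors = FALSE)

# Answer 2's call: make sure every element is UTF-8 encoded
utf8_names <- enc2utf8(as(df$name, "character"))

# Write the raw UTF-8 bytes, bypassing any re-encoding to the native locale
out <- tempfile(fileext = ".txt")
writeLines(utf8_names, out, useBytes = TRUE)

# Read the file back, declaring its contents as UTF-8
back <- readLines(out, encoding = "UTF-8")
back == utf8_names  # TRUE
```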

Answer 3

To read the file correctly, use the scan function:

# Re-encode the file's contents from UTF-8 while reading, one line per element
namb <- scan(file = 'g:/testcodering.txt', fileEncoding = 'UTF-8',
             what = character(), sep = '\n', allowEscapes = TRUE)
cat(namb)

This also works:

# Open a connection that decodes the file as UTF-8, read all lines, then close it
con <- file('g:/testcodering.txt', "r", encoding = 'UTF-8')
namc <- readLines(con)
close(con)
cat(namc)

This reads the file with the correct accents.
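Both snippets read from a file on the author's G: drive. A self-contained variant, assuming a UTF-8 locale (the usual setting on Linux and macOS, and the default on Windows since R 4.2): it writes a temporary file containing the UTF-8 bytes of the name, then reads it back both ways. The temporary file stands in for 'g:/testcodering.txt', and quiet = TRUE only suppresses scan()'s "Read 1 item" message.

```r
# Create a test file holding the UTF-8 bytes of the name
path <- tempfile(fileext = ".txt")
writeLines("Tom\u00e1\u0161 Rosick\u00fd", path, useBytes = TRUE)

# Option 1: scan() with fileEncoding, as in the answer
namb <- scan(file = path, fileEncoding = "UTF-8", what = character(),
             sep = "\n", allowEscapes = TRUE, quiet = TRUE)

# Option 2: a connection opened with encoding = "UTF-8"
con <- file(path, "r", encoding = "UTF-8")
namc <- readLines(con)
close(con)

cat(namb, namc, sep = "\n")  # both lines print with the accents intact
```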

Answer 4

You should use this:

df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
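The iconv() route fits a different situation: text that was already decoded wrongly and is stored as genuine "Ã", "¡", "Å" characters (double-encoded), so re-labelling alone no longer helps. A sketch of that idea with one swapped-in change from the answer: converting to "latin1" turns each garbled character back into a single byte, and an explicit Encoding(y) <- "UTF-8" then declares those bytes as the UTF-8 they originally were. fix_double_encoded is a hypothetical helper name. If the page was mis-decoded as Windows-1252 rather than Latin-1, some characters cannot be mapped back and the element becomes NA.

```r
# Hypothetical helper for double-encoded text (mojibake stored as real UTF-8 characters)
fix_double_encoded <- function(x) {
  # Each mojibake character is one Latin-1 byte, so this recovers the original bytes
  y <- iconv(x, from = "UTF-8", to = "latin1")
  # Those bytes are really UTF-8: relabel them without changing them
  Encoding(y) <- "UTF-8"
  y
}

garbled <- "Tom\u00c3\u00a1\u00c5\u00a1 Rosick\u00c3\u00bd"  # "TomÃ¡Å¡ RosickÃ½" as real characters
fix_double_encoded(garbled)  # "Tomáš Rosický"
```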

4 answers

Answer 1: score 10, accepted, 2012-03-01 06:28:58
Answer 2: score 3, 2013-11-20 15:40:04
Answer 3: score 3, 2012-03-01 11:08:55
Answer 4: score 2, 2015-07-13 17:25:12