Handling special characters, e.g. accents, in R

I am scraping names from the web into a data frame.

For a name such as "Tomáš Rosický", I get a garbled result: "TomÃ¡Å¡ RosickÃ½"

I tried

Encoding("Tomáš Rosický")  # returns "latin1"

but was not sure where to go from there to get the original name with accents back. I played around with iconv without success.

I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".

You've read in a page encoded in UTF-8, but R has the strings marked as latin1. If x is your column of names, use Encoding(x) <- "UTF-8".
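A minimal sketch of what that relabeling does, assuming the scraped bytes really are UTF-8. The variable names and the raw byte values are my own illustration, not taken from the question. The last line tries the unaccented form the asker mentions. `//TRANSLIT` is handled by the platform's iconv, so its result varies between systems and I am not showing an expected output for it.

```r
# UTF-8 bytes of "Tomáš Rosický", built from raw values so the example does
# not depend on the encoding of this script (hypothetical stand-in for x).
bytes <- as.raw(c(0x54, 0x6f, 0x6d, 0xc3, 0xa1, 0xc5, 0xa1, 0x20,
                  0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xc3, 0xbd))
x <- rawToChar(bytes)
Encoding(x) <- "latin1"      # simulate the wrong declaration from the question

fixed <- x
Encoding(fixed) <- "UTF-8"   # relabel only; the underlying bytes are unchanged
Encoding(fixed)
nchar(fixed)                 # counts characters, not bytes

# Optional: attempt an ASCII transliteration (platform-dependent result).
ascii <- iconv(fixed, from = "UTF-8", to = "ASCII//TRANSLIT")
```

Note that `Encoding<-` never converts anything; it only tells R how to interpret bytes it already has.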

One way to export the accents correctly:

enc2utf8(as(dataframe$columnname, "character"))
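For comparison, here is a small sketch of enc2utf8() on a made-up data frame. The data frame contents are an assumption for illustration, and I use as.character() in place of as(..., "character"), which should be equivalent for a character column. Unlike Encoding<-, enc2utf8() actually converts the bytes, so it helps when the strings are correctly marked as latin1 (or native) to begin with.

```r
# Hypothetical column: "Rosický" stored as latin1 bytes and declared as latin1.
latin1_name <- rawToChar(as.raw(c(0x52, 0x6f, 0x73, 0x69, 0x63, 0x6b, 0xfd)))
Encoding(latin1_name) <- "latin1"
dataframe <- data.frame(columnname = latin1_name, stringsAsFactors = FALSE)

utf8_col <- enc2utf8(as.character(dataframe$columnname))
Encoding(utf8_col)    # the mark is now UTF-8
charToRaw(utf8_col)   # the single byte fd has become the pair c3 bd
```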

To read the file correctly, use the scan function:

namb <- scan(file = 'g:/testcodering.txt', fileEncoding = 'UTF-8',
             what = character(), sep = '\n', allowEscapes = TRUE)
cat(namb)

This also works:

con <- file('g:/testcodering.txt', "r", encoding = 'UTF-8')
namc <- readLines(con)
close(con)
cat(namc)

This will read the file with the correct accents.
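To try both reads without the original file, here is a self-contained sketch. The temporary file is a hypothetical stand-in for 'g:/testcodering.txt', and its contents are my own assumption. Both fileEncoding= in scan() and encoding= in file() re-encode the text into the session's native encoding, so they work best in a session whose locale can represent the characters. The last variant is an alternative I am adding: readLines(encoding = "UTF-8") leaves the bytes alone and only marks them.

```r
# Hypothetical test file containing the UTF-8 bytes of one name.
path <- tempfile(fileext = ".txt")
writeLines("Tom\u00e1\u0161 Rosick\u00fd", path, useBytes = TRUE)

# scan() re-encodes from UTF-8 to the native encoding while reading.
namb <- scan(file = path, fileEncoding = "UTF-8",
             what = character(), sep = "\n", quiet = TRUE)

# The same re-encoding, done through an explicit connection.
con <- file(path, "r", encoding = "UTF-8")
namc <- readLines(con)
close(con)

# Alternative: keep the bytes as they are and only mark them as UTF-8.
namd <- readLines(path, encoding = "UTF-8")

unlink(path)
```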

You should use this:

df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
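A hedged sketch of what this conversion does to the bytes, using made-up values rather than the asker's data. One caveat: š is not part of ISO-8859-1 (Latin-1), so for a name like "Tomáš" iconv() may return NA unless a sub= replacement is supplied. The exact behaviour depends on the platform's iconv, so I am not showing expected output for those two calls.

```r
# "Rosický" contains only characters that exist in Latin-1.
utf8_name <- "Rosick\u00fd"
latin1_name <- iconv(utf8_name, from = "UTF-8", to = "LATIN1")
charToRaw(utf8_name)     # ends in the two bytes c3 bd
charToRaw(latin1_name)   # ends in the single byte fd

# "Tomáš" contains š, which Latin-1 cannot represent.
iconv("Tom\u00e1\u0161", from = "UTF-8", to = "LATIN1")
iconv("Tom\u00e1\u0161", from = "UTF-8", to = "LATIN1", sub = "?")
```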
