Importing a .csv file with "special" characters

Question

I'm trying to read a .csv file into R. The .csv file was created in Excel, and it contains "long" dashes (en dashes, –), which are the result of Excel auto-correcting the sequence space-dash-space. Sample entries that contain these "long" dashes:

US – California – LA
US – Washington – Seattle

I've experimented with different encodings, including the following three options:

x <- read.csv(filename, encoding="windows-1252") # Motivated by http://www.perlmonks.org/?node_id=551123
x <- read.csv(filename, encoding="latin1")
x <- read.csv(filename, encoding="UTF-8")

But the long dashes show up either as (first and second options) or as <U+0096> (third option).

I realize that I can store the file in a different format or use different software (Excel to CSV with UTF8 encoding), but that's not the point.

Has anyone figured out which encoding option in R works in such cases?
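A minimal sketch of one likely fix, under stated assumptions: in read.csv(), the `encoding` argument only marks strings as Latin-1 or UTF-8 and does not re-encode anything (and "windows-1252" is not one of its accepted values), whereas `fileEncoding` re-encodes the file while it is read. The byte 0x96 is the en dash in Windows-1252 but the C1 control character U+0096 in Latin-1, which would explain a <U+0096> turning up. The sketch writes a hypothetical one-row file (the temp file and the `place` column name are made up) and assumes a UTF-8 R session whose iconv knows the name "CP1252".

```r
# Build a tiny CSV the way a Windows Excel "CSV" save might:
# byte 0x96 is the en dash in Windows-1252 (CP1252).
tmp <- tempfile(fileext = ".csv")
dash <- as.raw(0x96)
writeBin(c(charToRaw("place\nUS "), dash, charToRaw(" California "), dash,
           charToRaw(" LA\n")), tmp)

# `encoding` would only mark the strings; `fileEncoding` re-encodes the
# file from CP1252 to the session's encoding while it is read.
x <- read.csv(tmp, fileEncoding = "CP1252", stringsAsFactors = FALSE)
print(x$place)
```

In the question's terms this would be read.csv(filename, fileEncoding = "windows-1252"). "CP1252" and "windows-1252" are two common names for the same code page; which names are accepted depends on the platform's iconv, and iconvlist() lists them.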

Answer 1

If you are using RStudio, use Import Dataset with these settings:

Heading: No
Separator: Whitespace
Decimal: Period
Quote: Double quote
Strings as factors: unchecked
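Roughly the same settings expressed as a read.table() call, as a sketch: the mapping of the dialog options to header = FALSE, sep = "" (any whitespace), dec = ".", quote = "\"" and stringsAsFactors = FALSE is an assumption, and the two sample rows are supplied inline and already decoded, so the dashes appear as real en dashes rather than '?'. It assumes a UTF-8 R session and shows why the dashes end up in columns of their own: the whitespace separator splits each entry at every space.

```r
# The question's two sample entries, already decoded ("\u2013" is the en dash).
lines <- c("US \u2013 California \u2013 LA",
           "US \u2013 Washington \u2013 Seattle")

# Dialog settings as read.table() arguments: no header, any whitespace as
# separator, "." as decimal mark, double quotes, strings kept as character.
mydf <- read.table(text = lines, header = FALSE, sep = "", dec = ".",
                   quote = "\"", stringsAsFactors = FALSE)
str(mydf)  # five columns; V2 and V4 hold the dashes
```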

When your document is loaded, you can simply remove the columns that now show as '?'. Because the separator is whitespace, each dash ends up in a column of its own; here those are column 2 and column 4. If you have a data frame, mydf, you would delete the second column like this:

mydf_new <- mydf[-2]

You could do the same for the other column, which is now column 3 of mydf_new.
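Since negative indices refer to the original column positions, both columns can also be dropped in a single step, which avoids tracking the shift from 4 to 3. A minimal sketch on a hypothetical mydf whose V2 and V4 hold the '?' placeholders:

```r
# Hypothetical data frame shaped like the whitespace-separated import:
# V2 and V4 hold the unreadable dash tokens.
mydf <- data.frame(
  V1 = c("US", "US"),
  V2 = c("?", "?"),
  V3 = c("California", "Washington"),
  V4 = c("?", "?"),
  V5 = c("LA", "Seattle"),
  stringsAsFactors = FALSE
)

# Drop columns 2 and 4 together; the indices are taken against the
# original positions, so no renumbering is needed.
mydf_new <- mydf[-c(2, 4)]
print(names(mydf_new))  # "V1" "V3" "V5"
```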

Answer 1 (score 0, 2015-10-21 17:15:01)