
Importing .csv files with “special” characters

I'm trying to read a .csv file into R. The .csv file was created in Excel, and it contains "long" dashes (en dashes), which are the result of Excel "auto-correcting" the sequence space-dash-space. Sample entries that contain these "long" dashes:

US – California – LA
US – Washington – Seattle

I've experimented with different encodings, including the following three options:

x <- read.csv(filename, encoding="windows-1252") # Motivated by http://www.perlmonks.org/?node_id=551123
x <- read.csv(filename, encoding="latin1")
x <- read.csv(filename, encoding="UFT-8")

But the long dashes either come through garbled (first and second options) or show up as <U+0096> (third option).

I realize that I could save the file in a different format or use different software (see Excel to CSV with UTF8 encoding), but that's not the point.

Has anyone figured out what encoding option in R works in such cases?
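For reference, read.csv distinguishes encoding=, which only marks input strings as being in latin1 or UTF-8, from fileEncoding=, which re-encodes the file as it is read. A minimal sketch, assuming the file really is windows-1252 (where the en dash is byte 0x96):

x <- read.csv(filename, fileEncoding="windows-1252") # re-encode from windows-1252 while reading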

If you are using RStudio, use Import Dataset with the following settings:

  • Heading: No
  • Separator: Whitespace
  • Decimal: Period
  • Quote: Double quote
  • Strings as factors: unchecked
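
Those settings correspond roughly to the following base-R call (a sketch; the call RStudio actually generates may differ):

x <- read.table(filename,
                header=FALSE,            # Heading: No
                sep="",                  # Separator: Whitespace (sep="" means any whitespace)
                dec=".",                 # Decimal: Period
                quote="\"",              # Quote: Double quote
                stringsAsFactors=FALSE)  # strings as factors unchecked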

When your document is loaded, you can simply remove the columns that now show as '?'; in this case, those are columns 2 and 4. If you have a data frame, mydf, then you would delete the second column like this:

mydf_new <- mydf[-2] # drop the second column

You could do the same thing for the other unwanted column, which after the first deletion is column 3.
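Alternatively, you can drop both unwanted columns in one step using their original positions, which avoids the renumbering:

mydf_new <- mydf[, -c(2, 4)] # drop original columns 2 and 4 at once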
