I'm trying to read a .csv file into R. The .csv file was created in Excel, and it contains "long" dashes, which are the result of Excel "auto-correcting" the sequence space-dash-space. Sample entries that contain these "long" dashes:
US – California – LA
US – Washington – Seattle
I've experimented with different encodings, including the following three options:
x <- read.csv(filename, encoding="windows-1252") # Motivated by http://www.perlmonks.org/?node_id=551123
x <- read.csv(filename, encoding="latin1")
x <- read.csv(filename, encoding="UTF-8")
But the long dashes either show up garbled (first and second options) or as <U+0096> (third option).
I realize that I could store the file in a different format or use different software (Excel to CSV with UTF-8 encoding), but that's not the point.
Has anyone figured out what encoding option in R works in such cases?
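One thing worth knowing: base R's `read.csv()` has a `fileEncoding` argument that is distinct from `encoding`. `fileEncoding` re-encodes the connection as the file is read, whereas `encoding` only marks the strings after they have been read. A minimal, self-contained sketch (the file name `sample.csv` is a throwaway placeholder; the 0x96 byte is the Windows-1252 en dash Excel inserts):

```r
# Write a small Windows-1252 sample file containing an en dash (byte 0x96),
# then read it back with fileEncoding so the byte is converted properly.
writeBin(c(charToRaw("location\nUS "), as.raw(0x96), charToRaw(" California\n")),
         "sample.csv")

# fileEncoding converts the input while reading; encoding= would only tag it.
x <- read.csv("sample.csv", fileEncoding = "windows-1252")
print(x$location)  # the dash now appears as a proper en dash
```

Whether this resolves the original question depends on the file actually being Windows-1252, but the `fileEncoding`/`encoding` distinction is documented behavior of `read.csv()`.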
If you are using RStudio, use Import Dataset.
When your document is loaded, you can simply remove the columns that now show as '?'. You can see these are columns 2 and 4. If you have a data frame, mydf, then you would delete the second column like this:
mydf_new <- mydf[-2]
You could do the same thing for the other column, which is now column 3.
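The steps above can be sketched with a toy data frame (the names `mydf`, `junk1`, `junk2`, and the column contents are made up for illustration; note that after dropping column 2, the remaining unwanted column shifts to position 3):

```r
# Toy data frame standing in for the imported file; columns 2 and 4 play
# the role of the columns that show up as '?'.
mydf <- data.frame(country = c("US", "US"),
                   junk1   = "?",
                   state   = c("California", "Washington"),
                   junk2   = "?")

mydf_new <- mydf[-2]       # drop the 2nd column
mydf_new <- mydf_new[-3]   # the other junk column is now the 3rd
print(names(mydf_new))     # "country" "state"
```

Negative indexing with `[-n]` keeps every column except the n-th, which is why the second deletion uses index 3 rather than 4.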