
Importing .csv files with “special” characters

I'm trying to read a .csv file into R. The .csv file was created in Excel, and it contains "long" dashes (en dashes), which are the result of Excel "auto-correcting" the sequence space-dash-space. Sample entries that contain these "long" dashes:

US – California – LA
US – Washington – Seattle

I've experimented with different encodings, including the following three options:

x <- read.csv(filename, encoding="windows-1252") # Motivated by http://www.perlmonks.org/?node_id=551123
x <- read.csv(filename, encoding="latin1")
x <- read.csv(filename, encoding="UFT-8")

But the long dashes either come through garbled (first and second options) or show up as <U+0096> (third option).

I realize that I could save the file in a different format or use different software (see Excel to CSV with UTF8 encoding), but that's not the point.

Has anyone figured out what encoding option in R works in such cases?
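For reference, read.csv distinguishes encoding=, which only marks input strings as being in latin1 or UTF-8, from fileEncoding=, which re-encodes the file as it is read. A minimal sketch, assuming the file really is windows-1252 (where the en dash is byte 0x96):

x <- read.csv(filename, fileEncoding="windows-1252") # re-encode from windows-1252 while reading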

If you are using RStudio, use Import Dataset with the following settings:

  • Heading: No
  • Separator: Whitespace
  • Decimal: Period
  • Quote: Double quote
  • Strings as factors: unchecked
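
Those settings correspond roughly to the following base-R call (a sketch; the call RStudio actually generates may differ):

x <- read.table(filename,
                header=FALSE,            # Heading: No
                sep="",                  # Separator: Whitespace (sep="" means any whitespace)
                dec=".",                 # Decimal: Period
                quote="\"",              # Quote: Double quote
                stringsAsFactors=FALSE)  # strings as factors unchecked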

When your document is loaded, you can simply remove the columns that now show as '?'; in this case, those are columns 2 and 4. If you have a data frame, mydf, then you would delete the second column like this:

mydf_new <- mydf[-2] # drop the second column

You could do the same thing for the other unwanted column, which after the first deletion is column 3.
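Alternatively, you can drop both unwanted columns in one step using their original positions, which avoids the renumbering:

mydf_new <- mydf[, -c(2, 4)] # drop original columns 2 and 4 at once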
