
Error while reading csv file in R

I am having problems reading a CSV file with R.

 x=read.csv("LorenzoFerrone.csv",header=T)

Error in make.names(col.names, unique = TRUE) : 
      invalid multibyte string at '<ff><fe>N'

I can open the file in LibreOffice with no problems.

I cannot upload the file because it is full of sensitive information.

What can I do?


Setting the encoding seems to be the solution to the problem.

> x=read.csv("LorenzoFerrone.csv",fileEncoding = "UCS-2LE")
> x[2,1]
[1] Adriano Caruso
100 Levels:  Ada Adriano Caruso adriano diaz Adriano Diaz alberto ferrone Alexey ... Zia Tina
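The bytes <ff><fe> in the error message are the UTF-16 little-endian byte-order mark, which is why a little-endian 16-bit encoding works here. As a rough sketch, the first bytes of a file can be inspected to guess a value for fileEncoding. The helper name guess_bom_encoding and the temporary demo file are made up for illustration, and a file without a BOM cannot be identified this way. A UTF-32LE file also starts with FF FE, so treat the result as a hint only.

```r
# Hypothetical helper (not part of base R): guess a fileEncoding value
# from the byte-order mark at the start of the file.
guess_bom_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    sig <- as.raw(sig)
    length(bom) >= length(sig) && all(bom[seq_along(sig)] == sig)
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) return("UTF-8-BOM")
  if (starts_with(c(0xFF, 0xFE))) return("UTF-16LE")
  if (starts_with(c(0xFE, 0xFF))) return("UTF-16BE")
  NA_character_  # no BOM found; the encoding must be determined another way
}

# Demo on a throwaway file that starts with FF FE followed by "N" in UTF-16LE.
tmp <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xFF, 0xFE, 0x4E, 0x00)), tmp)
enc <- guess_bom_encoding(tmp)
print(enc)
```

If the guess is not NA, it can be passed on as read.csv(path, fileEncoding = enc).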

The cause is an invalid encoding. I solved it by replacing every "è" with "e".

This will read the column names as-is and won't return any errors:

x = read.csv("LorenzoFerrone.csv", check.names = FALSE)

To remove/replace troublesome characters in column names, use this:

names(x) = iconv(names(x), to = "ASCII", sub = "")
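To see what that iconv call does, here is a small sketch on a made-up header (the column names are invented, not taken from the asker's file). With sub = "", every byte that cannot be represented in ASCII is simply dropped, so accented letters disappear instead of being turned into their unaccented form.

```r
# Made-up column names for illustration; "\u00e0" is the letter a with a grave accent.
x <- data.frame(a = 1, b = 2)
names(x) <- c("Citt\u00e0", "Nome e cognome")

# Drop every byte that has no ASCII equivalent.
names(x) <- iconv(names(x), from = "UTF-8", to = "ASCII", sub = "")
print(names(x))
```

Because characters are removed rather than transliterated, two different headers can end up identical, so it is worth checking anyDuplicated(names(x)) afterwards.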

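I found that this problem was caused by the file encoding. I solved it by opening the file in Windows Notepad and saving it as UTF-8, then reopening it in Excel (it was garbled at first) and saving it as UTF-8 again, after which it worked!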

You need to specify the correct delimiter in the sep argument.
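A minimal sketch of the effect of sep, using invented semicolon-delimited data passed through the text argument instead of a file:

```r
# Invented sample data that uses ";" as the delimiter.
csv_text <- "name;age\nAda;36\nZia Tina;52"

wrong <- read.csv(text = csv_text)             # default sep = "," gives one merged column
right <- read.csv(text = csv_text, sep = ";")  # the delimiter the data actually uses

print(ncol(wrong))
print(ncol(right))
```

With the default separator everything lands in a single column, which is usually the first sign that sep is wrong.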

You can always use the "Latin1" encoding while reading the csv:

 x = read.csv("LorenzoFerrone.csv", fileEncoding = "Latin1", check.names = F)

I am adding check.names = F to avoid replacing spaces by dots within your header.
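For illustration, this is what check.names changes, shown on an invented header that contains a space:

```r
# Invented sample data whose first column name contains a space.
csv_text <- "First Name,Age\nAda,36"

checked   <- names(read.csv(text = csv_text))                       # names go through make.names()
unchecked <- names(read.csv(text = csv_text, check.names = FALSE))  # names kept as written

print(checked)
print(unchecked)
```

Columns whose names are kept as written have to be accessed with backticks or quotes, for example x[["First Name"]].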

This is typically an encoding issue. You can try changing the encoding, or else delete the offending character (open the file in your favorite editor and replace all instances). In some cases R will report the character's location, for example:

invalid multibyte string 1847

This should make your life easier. Note that you may have to repeat the process several times (deleting all offending characters or trying several encodings).
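When R does not report a location, one way to find the offending lines is to read the file as raw text and test each line with validUTF8(). The sketch below builds a throwaway file with one Latin-1 byte in it. The file contents and the assumption that the stray bytes are Latin-1 are made up for the demo, so substitute whatever encoding your file really uses.

```r
# Build a small demo file: line 2 ends with byte E0, which is the Latin-1
# encoding of a-grave and is not valid UTF-8 on its own.
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\nAda,Citt"), as.raw(0xE0),
           charToRaw("\nZia,Roma\n")), tmp)

lines <- readLines(tmp, warn = FALSE)
bad <- which(!validUTF8(lines))   # line numbers that are not valid UTF-8
print(bad)

# Assuming the stray bytes are Latin-1, convert the lines and re-check.
fixed <- iconv(lines, from = "latin1", to = "UTF-8")
print(all(validUTF8(fixed)))
```

This only helps for files that are meant to be UTF-8 or a single-byte encoding. A UTF-16 file needs fileEncoding instead.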

Save the file in the "CSV UTF-8" format. It worked for me.

Not sure if this helps, but I ran into a similar problem and found it was because my "csv" file had a .csv extension but was actually an .xls file!
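A quick way to check for a mislabeled spreadsheet is to look at the first bytes of the file. Legacy .xls files are OLE2 containers that start with D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives that start with 50 4B 03 04. The helper name sniff_file_kind and its return values are invented for this sketch, and the demo file is a fake header, not a real workbook.

```r
# Hypothetical helper (not part of base R): classify a file by its first bytes.
sniff_file_kind <- function(path) {
  head_bytes <- readBin(path, what = "raw", n = 8L)
  has_prefix <- function(sig) {
    sig <- as.raw(sig)
    length(head_bytes) >= length(sig) && all(head_bytes[seq_along(sig)] == sig)
  }
  if (has_prefix(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))) return("ole2 (e.g. xls)")
  if (has_prefix(c(0x50, 0x4B, 0x03, 0x04))) return("zip (e.g. xlsx)")
  "no spreadsheet signature"
}

# Demo: a file named .csv that begins with the OLE2 signature.
fake <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)), fake)
kind <- sniff_file_kind(fake)
print(kind)
```

If a signature is found, the file needs a spreadsheet reader or a fresh export to real CSV, not read.csv.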

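Not sure if this helps, but I just had a similar problem, which I solved by removing the " characters from the csv I was trying to import. The first row of the database had the column names written as "colname", "colname2", "etc". I removed all the " characters, and then the csv read into R just fine.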

I solved the problem by removing all diacritics (i.e. accent marks) from the text. My headers were written in Spanish and contained some accent marks. I replaced them with unaccented spellings (México = Mexico) and the problem was solved.

I know this is an old post, but I just wanted to say to those with a non-English locale that if you use "," as the decimal separator,
