
R read.csv "More columns than column names" error

I have a problem when importing a .csv file into R. With my code:

t <- read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
          header=T, stringsAsFactors=FALSE,check.names=F)

R reports an error and does not do what I want:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

I guess the problem is that my data is not well formatted. I only need the data in columns [,1:32]; all other columns should be dropped.

Data can be downloaded from: https://drive.google.com/file/d/0B86_a8ltyoL3VXJYM3NVdmNPMUU/edit?usp=sharing

Thanks so much!

Open the .csv as a text file (for example, with TextEdit on a Mac) and check whether the columns are actually separated by commas.

CSV stands for "comma-separated values". For some reason, when Excel saves my CSVs it uses semicolons instead.

In that case, when opening your csv use:

read.csv("file_name.csv",sep=";")

The semicolon is just an example, but as someone else previously suggested, don't assume that your csv is well formed just because it looks good in Excel.
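The delimiter can also be checked from within R. The following is only a sketch on a made-up, semicolon-delimited temporary file (the file contents and the variable names are assumptions for illustration, not the asker's data). It counts the fields per line under each candidate separator with count.fields:

```r
# Hypothetical sample file standing in for the real CSV (semicolon-delimited)
path <- tempfile(fileext = ".csv")
writeLines(c("id;name;value", "1;a;3.5", "2;b;4.0"), path)

# Look at the raw text first
first_lines <- readLines(path, n = 3)
print(first_lines)

# Count fields per line under each candidate separator
n_comma <- count.fields(path, sep = ",")
n_semi  <- count.fields(path, sep = ";")
print(n_comma)
print(n_semi)

# Read with the separator that gives a consistent count greater than 1
df <- read.csv(path, sep = ";")
str(df)
```

If the counts differ from line to line under every separator, the file probably has ragged rows or repeated headers rather than just an unexpected delimiter.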

That's one wonky CSV file. Multiple headers are tossed about (try pasting it into CSV Fingerprint to see what I mean).

Since I don't know the data, it's impossible to be sure the following produces accurate results for you, but it involves using readLines and other R functions to pre-process the text:

# use readLines to get the data
dat <- readLines("N0_07312014.CSV")

# i had to do this to fix grep errors
Sys.setlocale('LC_ALL','C')

# filter out the repeating, and wonky headers
dat_2 <- grep("Node Name,RTC_date", dat, invert=TRUE, value=TRUE)

# turn that vector into a text connection for read.csv
dat_3 <- read.csv(textConnection(paste0(dat_2, collapse="\n")),
                  header=FALSE, stringsAsFactors=FALSE)

str(dat_3)
## 'data.frame':    308 obs. of  37 variables:
##  $ V1 : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ V2 : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ V3 : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ V4 : chr  "" "" "" "" ...
## .. more
##  $ V36: chr  "" "" "" "" ...
##  $ V37: chr  "0" "0" "0" "0" ...

# grab the headers
headers <- strsplit(dat[1], ",")[[1]]

# how many of them are there?
length(headers)
## [1] 32

# limit it to the 32 columns you want (Which matches)
dat_4 <- dat_3[,1:32]

# and add the headers
colnames(dat_4) <- headers

str(dat_4)
## 'data.frame':    308 obs. of  32 variables:
##  $ Node Name         : chr  "Node 0" "Node 0" "Node 0" "Node 0" ...
##  $ RTC_date          : chr  "07/31/2014" "07/31/2014" "07/31/2014" "07/31/2014" ...
##  $ RTC_time          : chr  "08:58:18" "08:59:22" "08:59:37" "09:00:06" ...
##  $ N1 Bat (VDC)      : chr  "" "" "" "" ...
##  $ N1 Shinyei (ug/m3): chr  "" "" "0.23" "null" ...
##  $ N1 CC (ppb)       : chr  "" "" "null" "null" ...
##  $ N1 Aeroq (ppm)    : chr  "" "" "null" "null" ...
## ... continues

If you only need the first 32 columns, and you know how many columns there are, you can set the classes of the other columns to "NULL".

read.csv("C:\\N0_07312014.CSV", na.string=c("","null","NaN","X"),
      header=T, stringsAsFactors=FALSE,
      colClasses=c(rep("character",32),rep("NULL",10)))

If you do not want to code up each colClass and you like the guesses read.csv makes, then just save the result as a csv and open it again.

Alternatively, you can skip the header, name the columns yourself, and remove the misbehaving rows.

A<-data.frame(read.csv("N0_07312014.CSV",
                        header=F,stringsAsFactors=FALSE,
                        colClasses=c(rep("character",32),rep("NULL",5)),
                        na.string=c("","null","NaN","X")))
Yournames<-as.character(A[1,])
names(A)<-Yournames
yourdata<-unique(A)[-1,]

The code above assumes you do not want any duplicate rows. Alternatively, you can remove the rows whose first entry equals the first column name, but I'll leave that to you.
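That last variant might look roughly like the sketch below. It runs on a small made-up file in which the header line is repeated mid-file; the file contents are assumptions for illustration, not the asker's data. Unlike unique(), it keeps genuinely duplicated data rows:

```r
# Hypothetical file with the header line repeated in the middle
path <- tempfile(fileext = ".csv")
writeLines(c("Node Name,RTC_date,value",
             "Node 0,07/31/2014,1",
             "Node Name,RTC_date,value",
             "Node 0,07/31/2014,2",
             "Node 0,07/31/2014,2"), path)

# Read everything as text, without treating any line as a header
A <- read.csv(path, header = FALSE, stringsAsFactors = FALSE,
              colClasses = "character")

# Use the first row as the column names
names(A) <- as.character(unlist(A[1, ]))

# Drop every row whose first field repeats the first column name
yourdata <- A[A[[1]] != names(A)[1], , drop = FALSE]
rownames(yourdata) <- NULL
str(yourdata)
```

Because every column is read as character here, numeric columns would still need converting afterwards, for example with type.convert().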

Try read.table() instead of read.csv().

I was also facing the same issue. It is now solved.

Just use header = FALSE

read.csv("data.csv", header = FALSE) -> mydata

I had the same problem. I opened my data in a text editor and found that the double (decimal) values were separated by semicolons; you should replace them with a period.

I was getting this error because of multiple rows of metadata at the top of the file. I was able to use read.csv by passing skip= to skip those rows.

data <- read.csv('/blah.csv',skip=3)
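If the number of metadata rows is not known in advance, the header line can be located first. This is a sketch only: the sample file and the assumption that the real header starts with "id," are made up for illustration.

```r
# Hypothetical file with three lines of metadata before the real header
path <- tempfile(fileext = ".csv")
writeLines(c("Exported by: some logger",
             "Site: hypothetical",
             "",
             "id,value",
             "1,10",
             "2,20"), path)

# Find the first line that looks like the real header
raw <- readLines(path)
header_line <- grep("^id,", raw)[1]

# Skip everything before it; read.csv then treats that line as the header
data <- read.csv(path, skip = header_line - 1)
str(data)
```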

For me, the solution was to use csv2 instead of csv (that is, read.csv2 rather than read.csv).
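A likely reason this helps: read.csv2 defaults to sep = ";" and dec = ",", the format Excel writes in many European locales. With read.csv, a header such as "id;price" counts as one field while a data line such as "1;3,5" splits into two at the decimal comma, so there are more columns than column names. A minimal sketch on a made-up file (the contents are an assumption for illustration):

```r
# Hypothetical semicolon-delimited file with decimal commas
path <- tempfile(fileext = ".csv")
writeLines(c("id;price", "1;3,5", "2;4,25"), path)

# read.csv2 uses sep = ";" and dec = "," by default
df <- read.csv2(path)
str(df)
```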

read.csv("file_name.csv", header=F)

Setting header to FALSE will do the job for you.
