Cannot read text file using csv?

I have text data separated by commas, i.e. ",". A sample of the data is given below (the first row gives the column names):

userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2

I am using the following syntax:

appsession <- read.table("C:/.../AppSession.txt", sep = ",", 
  col.names = c("userID","appName","startTime","endTime","endResult"), 
  fill = FALSE, strip.white = TRUE)

I am getting this error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 1 did not have 5 elements

I think you need to use skip = 2 if you have a blank line and are planning on using 'col.names' without using header=TRUE. At the moment your code works (well, sort of works anyway: the header line is read in as a data row) with a simple text read:

> txt <- "userID,appName,startTime,endTime,endResult
+ chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
+ chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
+ chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
+ chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
+ chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
+ chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
+ chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
+ chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
+ chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
+ chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2
+ "
> appsession <- read.table(text=txt, sep = ",", 
+   col.names = c("userID","appName","startTime","endTime","endResult"), 
+   fill = FALSE, strip.white = TRUE)
> 
> appsession
    userID      appName           startTime             endTime endResult
1   userID      appName           startTime             endTime endResult
2  chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46         1
3  chhieut gms.mos.test 2012-07-01 03:11:46 2012-07-01 03:12:25         2
4  chhieut gms.mos.test 2012-07-01 03:13:36 2012-07-01 03:14:03         2
5  chhieut gms.mos.test 2012-07-01 03:18:26 2012-07-01 03:18:58         2
6  chhieut gms.mos.test 2012-07-01 04:10:36 2012-07-01 04:10:54         2
7  chhieut gms.mos.test 2012-07-01 04:38:26 2012-07-01 04:38:48         2
8  chhieut gms.mos.test 2012-07-01 04:48:56 2012-07-01 04:49:04         3
9  chhieut gms.mos.test 2012-07-01 05:49:46 2012-07-01 05:50:14         2
10 chhieut gms.mos.test 2012-07-01 06:19:07 2012-07-01 06:19:25         2
11 chhieut gms.mos.test 2012-07-01 07:09:17 2012-07-01 07:09:47         2
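In the output above the header line ends up as row 1 of the data. Here is a minimal sketch of two ways to avoid that, using a shortened copy of the sample data. The object names `with_header` and `with_skip` are just illustrative, and `skip = 1` assumes there is no blank line before the header.

```r
# Same sample data as above, shortened to three rows for brevity.
txt <- "userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
"

# Option 1: header = TRUE, so the first line supplies the column names.
with_header <- read.table(text = txt, sep = ",", header = TRUE,
                          strip.white = TRUE)

# Option 2: skip the header line and name the columns by hand.
with_skip <- read.table(text = txt, sep = ",", skip = 1,
                        col.names = c("userID", "appName", "startTime",
                                      "endTime", "endResult"),
                        strip.white = TRUE)

nrow(with_header)                                  # 3 data rows, no header row
identical(names(with_header), names(with_skip))    # TRUE
```

Either way, endResult should now come through as a number rather than as text mixed with the header string.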

You should either use header = TRUE or skip the header row (plus any blank rows). One way to see how many rows are blank is to look at the output of count.fields(..., sep=","). Another way to see what the read.* and scan functions are "seeing" is to execute this code (with suitable replacement of the ellipsis):

appLines <- readLines("C:/.../AppSession.txt")
appLines[1:5] # will display the first 5 lines from that file 
              # with no attempt to deal with any separators.
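To make both checks concrete, here is a small sketch run against a temporary file standing in for AppSession.txt. The file contents (one blank line at the top and one in the middle) are an assumption for illustration, not your actual data.

```r
# Hypothetical stand-in for AppSession.txt: a blank first line and a blank
# line in the middle of the data.
tmp <- tempfile(fileext = ".txt")
writeLines(c("",
             "userID,appName,startTime,endTime,endResult",
             "chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1",
             "",
             "chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2"),
           tmp)

# Raw lines, exactly as stored in the file.
appLines <- readLines(tmp)

# Positions of lines that are empty or contain only whitespace.
blank_idx <- which(!nzchar(trimws(appLines)))
blank_idx      # 1 4

# Fields per line; blank lines are skipped by default (blank.lines.skip = TRUE).
n_fields <- count.fields(tmp, sep = ",")
n_fields       # 5 5 5

unlink(tmp)    # remove the temporary file
```

If every entry of n_fields is 5 but read.table still complains, the blank or malformed line is probably in the part that blank_idx points at.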

You will need to provide a link to your actual data set, since the data you have provided works fine:

d = read.csv(textConnection("userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2"), header=TRUE)

Quick check:

R> head(d, 1)
   userID      appName           startTime             endTime endResult
1 chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46         1
R> dim(d)
[1] 10  5

Make sure you don't have blank lines in your actual file - this will really stuff things up.
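If editing the file by hand is awkward, one possible approach is to drop the blank lines in R before parsing. This is only a sketch: `raw_lines` is a hypothetical in-memory stand-in for what readLines() would return from your file.

```r
# Hypothetical stand-in for readLines("C:/.../AppSession.txt").
raw_lines <- c("",
               "userID,appName,startTime,endTime,endResult",
               "chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1",
               "",
               "chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2")

# Keep only lines with some non-whitespace content.
clean_lines <- raw_lines[nzchar(trimws(raw_lines))]

# Parse the cleaned lines; each element of the vector is treated as one line.
d2 <- read.csv(text = clean_lines, header = TRUE)
dim(d2)    # 2 5
```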

Using a suitably edited version of your data (i.e. removing all the blank lines!), this can be loaded into R easily via read.csv(). Note that here I'm using a text connection containing the data to avoid writing your data to a file. Just replace con with your file name in the read.csv() call.

con <- textConnection("userID,appName,startTime,endTime,endResult
chhieut,gms.mos.test,2012-07-01 02:47:16,2012-07-01 02:47:46,1
chhieut,gms.mos.test,2012-07-01 03:11:46,2012-07-01 03:12:25,2
chhieut,gms.mos.test,2012-07-01 03:13:36,2012-07-01 03:14:03,2
chhieut,gms.mos.test,2012-07-01 03:18:26,2012-07-01 03:18:58,2
chhieut,gms.mos.test,2012-07-01 04:10:36,2012-07-01 04:10:54,2
chhieut,gms.mos.test,2012-07-01 04:38:26,2012-07-01 04:38:48,2
chhieut,gms.mos.test,2012-07-01 04:48:56,2012-07-01 04:49:04,3
chhieut,gms.mos.test,2012-07-01 05:49:46,2012-07-01 05:50:14,2
chhieut,gms.mos.test,2012-07-01 06:19:07,2012-07-01 06:19:25,2
chhieut,gms.mos.test,2012-07-01 07:09:17,2012-07-01 07:09:47,2
")

dat <- read.csv(con,
                colClasses = c(rep("character", 2), rep("POSIXct", 2),
                               "numeric"))
close(con) ## closing connection, not needed with a file

Also note that by specifying the colClasses argument we tell R what the data are before reading them in, which saves some formatting later, especially with the DateTime data. We can do this here because you have the DateTime variables stored in the correct format.

R> head(dat)
   userID      appName           startTime             endTime endResult
1 chhieut gms.mos.test 2012-07-01 02:47:16 2012-07-01 02:47:46         1
2 chhieut gms.mos.test 2012-07-01 03:11:46 2012-07-01 03:12:25         2
3 chhieut gms.mos.test 2012-07-01 03:13:36 2012-07-01 03:14:03         2
4 chhieut gms.mos.test 2012-07-01 03:18:26 2012-07-01 03:18:58         2
5 chhieut gms.mos.test 2012-07-01 04:10:36 2012-07-01 04:10:54         2
6 chhieut gms.mos.test 2012-07-01 04:38:26 2012-07-01 04:38:48         2
R> str(dat)
'data.frame':   10 obs. of  5 variables:
 $ userID   : chr  "chhieut" "chhieut" "chhieut" "chhieut" ...
 $ appName  : chr  "gms.mos.test" "gms.mos.test" "gms.mos.test" "gms.mos.test" ...
 $ startTime: POSIXct, format: "2012-07-01 02:47:16" "2012-07-01 03:11:46" ...
 $ endTime  : POSIXct, format: "2012-07-01 02:47:46" "2012-07-01 03:12:25" ...
 $ endResult: num  1 2 2 2 2 2 3 2 2 2
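For contrast, if the timestamps were not in the default "YYYY-MM-DD HH:MM:SS" layout, one option would be to read them as character and convert afterwards. The sketch below uses a made-up day-first format and assumes UTC as the time zone; neither comes from your data.

```r
# Hypothetical data with day/month/year timestamps (not the format in the question).
txt <- "userID,startTime
chhieut,01/07/2012 02:47:16
chhieut,01/07/2012 03:11:46
"

# Read everything as character first, so nothing is converted implicitly.
dat2 <- read.csv(text = txt, colClasses = "character")

# Convert explicitly, spelling out the layout and time zone (UTC is an assumption).
dat2$startTime <- as.POSIXct(dat2$startTime,
                             format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

format(dat2$startTime[1], "%Y-%m-%d %H:%M:%S")    # "2012-07-01 02:47:16"
```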
