Reading a table from a rough text file in R

Question

I have a text file whose first few lines I don't need, followed by a table like this:

----------------------------------------
| col1 | col2 | col3 col4 col5 |
----------------------------------------
| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |
| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |

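col1 and col2 are separated by |, but col3, col4 and col5 are separated only by spaces.
The data types should be kept: col2 as a date-time, and col3, col4 and col5 as numbers.
In row 2, col4 is a dot, so it should be read as NA.
Lines of hyphens (---) appear at the start and end.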

Questions:
1. I could use scan, but how do I avoid reading the "|" and "-" characters?
2. I can skip the first few lines, but how do I also skip, say, line 50?
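Both points can also be handled by filtering the raw lines before any parsing. This is a minimal sketch, not taken from either answer: it assumes the file has already been read with readLines(), and the sample lines, `n_skip` and the hypothetical `drop_line` (standing in for "line 50"; here it happens to remove the column-name line) are made up for illustration.

```r
# Hypothetical raw lines; on the real file: lines <- readLines("a.txt")
lines <- c("some header text",
           "----------------------------------------",
           "| col1 | col2 | col3 col4 col5 |",
           "----------------------------------------",
           "| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |",
           "| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |")

n_skip <- 1      # leading lines to ignore (assumption)
drop_line <- 3   # one extra line to drop, by original line number (hypothetical)

idx <- seq_along(lines)
keep <- idx > n_skip &
  idx != drop_line &
  !grepl("^\\s*-+\\s*$", lines)   # drop hyphen-only separator lines
body <- lines[keep]

# Remove the outer pipes, split on the inner ones, and trim the padding
body <- sub("^\\s*\\|", "", sub("\\|\\s*$", "", body))
fields <- lapply(strsplit(body, "|", fixed = TRUE), trimws)
fields
```

Each element of `fields` holds one row as three strings; the third string ("0 0 1") still has to be split on spaces, as in the answers below.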

Answer 1

You can read it in as a table as-is, then split the combined column and bind the pieces back together.

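## sample data standing in for the file (the hyphen lines are left out)
txt <- "| col1 | col2 | col3 col4 col5 |
| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |
| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |"

## split every line on "|"; the leading and trailing pipes produce empty edge columns
x <- read.table(text = txt, sep = "|", header = TRUE, stringsAsFactors = FALSE)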

## drop unnecessary columns from the original sep split
x <- x[,-c(1,ncol(x))]

## split the desired column by the spaces, result is a character matrix
## including an unnecessary first column
split.col3 <- do.call("rbind", strsplit(x[,3], " "))

## bind to the original, dropping the unneeded columns
cbind(x[,-3], split.col3[,-1])
  col1                   col2 1 2 3
1    1  2:24:21 PM 3/22/2012  0 0 1
2    2  2:24:21 PM 3/22/2012  1 . 0

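I avoided referring to the original column names because you said you want to skip those lines. Just add header = FALSE and skip = 50 to the read.table call, then assign whatever column names make sense.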
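A sketch of that variant on the same sample text. Here `skip = 1` only skips the single column-name line of the sample (on the real file it would be `skip = 50`), and the names assigned at the end are hypothetical.

```r
txt <- "| col1 | col2 | col3 col4 col5 |
| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |
| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |"

## header = FALSE plus skip: the column-name line is skipped, not parsed
x <- read.table(text = txt, sep = "|", header = FALSE, skip = 1,
                stringsAsFactors = FALSE)
x <- x[, -c(1, ncol(x))]                   # drop the empty edge columns

## split the combined column on spaces; its first piece is always ""
split.col3 <- do.call("rbind", strsplit(x[, 3], " "))
y <- cbind(x[, -3], split.col3[, -1])

## hypothetical names, since the original header line was skipped
names(y) <- c("col1", "col2", "col3", "col4", "col5")
y
```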

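Also, where necessary, you can remove the "." from the columns and convert them to date-time or numeric format. If you know the column classes in advance, pass them to read.table via colClasses. I would rather break the job into several steps like this than try to do everything with a single read function.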
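A minimal sketch of those conversions, assuming a data frame shaped like the split result with columns already named col1 to col5 (the data frame below is a hand-built stand-in). The format string assumes the "2:24:21 PM 3/22/2012" layout from the question, and `tz = "UTC"` is an arbitrary choice.

```r
# Stand-in for the split result: everything is still character here
y <- data.frame(col1 = c(" 1 ", " 2 "),
                col2 = c(" 2:24:21 PM 3/22/2012 ", " 2:24:21 PM 3/22/2012 "),
                col3 = c("0", "1"), col4 = c("0", "."), col5 = c("1", "0"),
                stringsAsFactors = FALSE)

# Date-time: trim padding, then parse 12-hour clock + AM/PM + month/day/year.
# %p only matches "AM"/"PM" in an English-language (or C) locale.
y$col2 <- as.POSIXct(trimws(y$col2), format = "%I:%M:%S %p %m/%d/%Y",
                     tz = "UTC")

# Numeric columns: treat "." as missing, then convert
num_cols <- c("col1", "col3", "col4", "col5")
y[num_cols] <- lapply(y[num_cols], function(v) {
  v <- trimws(v)
  v[v == "."] <- NA
  as.numeric(v)
})
str(y)
```

When reading straight from a file, `na.strings = "."` in read.table achieves the same "." to NA mapping at read time.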

Answer 2

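This can be done in three steps: (1) read the whole file using "|" as the separator, (2) write out a new file containing only the three columns (which were read in as one), and (3) read those columns back in using a space as the separator. The following code should get you most of the way there. Things you may need to change: the file name, the V4 column name, and the working directory (getwd/setwd).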

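# read the whole file with "|" as the separator; strip.white trims the padding around fields
a <- read.delim("a.txt", header=FALSE, sep="|", strip.white=TRUE)
# write the combined col3-col5 column (V4) to a new file
write.table(a$V4, file="b.txt", quote=FALSE, row.names=FALSE, col.names=FALSE)
# read it back with a space separator; "." becomes NA
b <- read.delim("b.txt", header=FALSE, sep=" ", na.strings=".")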

Then combine the corresponding columns of a and b, and you're done.
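The same three steps, plus the final combination, can be sketched without the temporary b.txt by re-reading the split column from memory. This is an assumption-laden variant rather than the answer's exact code: the sample `txt`, `skip = 1` and the final column names are illustrative; `strip.white = TRUE` trims the padding around each field and `na.strings = "."` turns the dot into NA.

```r
txt <- "| col1 | col2 | col3 col4 col5 |
| 1 | 2:24:21 PM 3/22/2012 | 0 0 1 |
| 2 | 2:24:21 PM 3/22/2012 | 1 . 0 |"

# (1) split on "|"; skip = 1 drops the column-name line in this sample
a <- read.delim(text = txt, header = FALSE, sep = "|", skip = 1,
                strip.white = TRUE, stringsAsFactors = FALSE)

# (2) + (3) re-read the combined V4 column on spaces, in memory
b <- read.table(text = a$V4, header = FALSE, sep = " ", na.strings = ".")

# combine the corresponding columns of a and b
result <- cbind(a[, c("V2", "V3")], b)
names(result) <- c("col1", "col2", "col3", "col4", "col5")
result
```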

Answer 1: score 2, accepted, 2012-03-28 21:49:46

Answer 2: score 1, 2012-03-28 21:51:04
