Read ASCII data into R by specified column lengths

Question

I have a large ASCII file that looks something like this when opened in a text editor:

11112223423 4434 555534 5533 54534 5354 5532 434 4 43434 23424234 34 4534 34453 345345345 345344 344 43423453453 43444 99098 234090 4354550 345399 43453 9900 4

I have been given a mapping of the columns. For example, the first variable sits in columns 1-9, the second variable sits in columns 104-105, and so on.

Is there an easy way to read this type of data into R so that I end up with a data.frame?

Thanks for the help!

Answer 1

I've used the standard read.fwf() from base R for this kind of thing.

I also like read_fwf() from the readr package. For example:

library(readr)

#create some dummy fixed-width-field data
fixed_width_data <- "line1  field1 datafield2 dataetc\nline2  field1 datafield2 dataetc\n"

#specify the data columns
field_info <- fwf_widths(c(7, 11, 11, 3), col_names=c("line_number", "field1", "field2", "fieldn"))

#read it in
parsed <- read_fwf(fixed_width_data,  field_info)

To specify start/end positions for the columns of data, you can use fwf_positions() instead of fwf_widths():

#create some dummy fixed-width-field data
fixed_width_data2 <- "line1  field1 datafield2 dataTEXT TO SKIPetc\nline2  field1 datafield2 dataTEXT TO SKIPetc\n"

#specify the data columns using start and end positions
field_info2 <- fwf_positions(start=c(1, 8, 19, 42), end=c(5, 18, 29, 44), col_names=c("line_number", "field1", "field2", "fieldn"))

#read it in
parsed2 <- read_fwf(fixed_width_data2,  field_info2)
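
Applied to the mapping in the question (columns 1-9 and 104-105), the same fwf_positions() approach might look like the sketch below. The column names var1 and var2 are placeholders, since the question does not name the variables, and the record is the first 112 characters of the sample line, passed as literal data rather than read from a file. Both columns are read as character here so that no digits are reinterpreted.

```r
library(readr)

# First 112 characters of the sample record from the question,
# with a trailing newline so readr treats the string as literal data
record <- "11112223423 4434 555534 5533 54534 5354 5532 434 4 43434 23424234 34 4534 34453 345345345 345344 344 43423453453\n"

# Placeholder names: the question does not say what the variables are called
cols <- fwf_positions(start = c(1, 104), end = c(9, 105),
                      col_names = c("var1", "var2"))

# Read both fields as character ("cc"); everything between them is skipped
parsed3 <- read_fwf(record, cols, col_types = "cc")
```

For a real file, the literal string would be replaced by the file path, and col_types can be adjusted or left out to let readr guess the types.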

Answer 2

You can do this in base R using read.fwf() (fixed-width fields). I wrote a file containing your single line of input and got:

FullFile = read.fwf("Test.txt", widths=c(9,94,2))
Interesting = FullFile[,c(1,3)]
Interesting
         V1 V3
1 111122234 42

Note that I am reading the columns to skip into a variable and then just discarding that variable.
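
As an alternative to reading and then discarding the filler column, read.fwf() also accepts negative widths, which mark columns to skip. The sketch below assumes the mapping arrives as start/end positions, as in the question. positions_to_widths() is a hypothetical helper written for this example, not part of base R, and var1/var2 are placeholder names.

```r
# Hypothetical helper (not part of base R): convert start/end positions
# into a widths vector for read.fwf(), with negative widths for the gaps.
positions_to_widths <- function(start, end) {
  stopifnot(length(start) == length(end), all(end >= start))
  prev_end <- c(0, head(end, -1))       # end of the previous field (0 before the first)
  gaps <- start - prev_end - 1          # number of skipped columns before each field
  stopifnot(all(gaps >= 0))             # fields must be sorted and non-overlapping
  widths <- as.vector(rbind(-gaps, end - start + 1))  # interleave gap, field, gap, field
  widths[widths != 0]                   # drop zero-length gaps
}

# First 112 characters of the sample record from the question
record <- "11112223423 4434 555534 5533 54534 5354 5532 434 4 43434 23424234 34 4534 34453 345345345 345344 344 43423453453"

widths <- positions_to_widths(start = c(1, 104), end = c(9, 105))

# Read from a text connection here; pass a file name instead for a real file
con <- textConnection(record)
parsed <- read.fwf(con, widths = widths, col.names = c("var1", "var2"))
close(con)
```

With this mapping the helper yields the widths 9, -94, 2, so only the two wanted fields end up in the data.frame and no throwaway column is created.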
