Reading data with different column lengths and spaces in R

Question

I have data that looks something like this:

category 2011 2012 2013 2104
word word word 15,000.11 1,000.15 13,001.50 20,000,001.52
word 2,000.120 400,000.00 57,000.523 402,000,111
word word 4,000.120 455,000.02 57,600.87 403,000,111.18
word 2,056.120 678,000.00 670,000.523 402,009,111.65

It is in a .csv file. I want to read it in so that it separates into columns, but the rows are all different lengths, so I am not sure how. I know I can separate by spaces, but some of the entries in the first column have spaces between the words.

category         2011      2012         2013         2104
word word word  15,000.11  1,000.15    13,001.50    20,000,001.52
word            2,000.120  400,000.00  57,000.523   402,000,111
word word       4,000.120  455,000.02  57,600.87    403,000,111.18
word            2,056.120  678,000.00  670,000.523  402,009,111.65

I apologize if I am not asking this correctly. Thanks for your help!

Answer 1

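After reading the file with readLines (see Data below), we can use sub to wrap the leading text column of each data row in single quotes. read.table then treats the quoted text as one field and splits the rest on whitespace.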

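# Quote the leading run of letters and spaces that precedes the first number
lines[-1] <- sub("^([A-Za-z ]+)(?=\\s[0-9])", "'\\1'", lines[-1], perl = TRUE)
# check.names = FALSE keeps the year column names (2011, 2012, ...) unchanged
read.table(textConnection(lines), header = TRUE, check.names = FALSE)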
#      category      2011       2012        2013           2104
#1 word word word 15,000.11   1,000.15   13,001.50  20,000,001.52
#2           word 2,000.120 400,000.00  57,000.523    402,000,111
#3      word word 4,000.120 455,000.02   57,600.87 403,000,111.18
#4           word 2,056.120 678,000.00 670,000.523 402,009,111.65
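Note that the numeric columns in the output above still contain thousands separators, so read.table keeps them as text. If numeric values are needed, one possible follow-up is to strip the commas and convert. This is only a sketch: the sample lines are copied from the question as a stand-in for the file, and the name df is just an illustrative choice.

# Sample lines from the question (stand-in for readLines('file.csv'))
lines <- c(
  "category 2011 2012 2013 2104",
  "word word word 15,000.11 1,000.15 13,001.50 20,000,001.52",
  "word 2,000.120 400,000.00 57,000.523 402,000,111",
  "word word 4,000.120 455,000.02 57,600.87 403,000,111.18",
  "word 2,056.120 678,000.00 670,000.523 402,009,111.65"
)

# Quote the leading text column, as in the answer
lines[-1] <- sub("^([A-Za-z ]+)(?=\\s[0-9])", "'\\1'", lines[-1], perl = TRUE)

# Read the quoted lines; keep text columns as character rather than factor
df <- read.table(text = lines, header = TRUE, check.names = FALSE,
                 stringsAsFactors = FALSE)

# Remove the commas and convert every column except the first to numeric
df[-1] <- lapply(df[-1], function(x) as.numeric(gsub(",", "", x, fixed = TRUE)))

str(df)

This assumes every value after the first column is a comma-formatted number; anything else would become NA with a warning.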

Data

lines <- readLines('file.csv')
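To try this without the original file, the sample from the question can be written to a temporary file first. This is a minimal sketch; sample_text and path are hypothetical names used only for illustration.

# The question's sample rows, one string per line
sample_text <- c(
  "category 2011 2012 2013 2104",
  "word word word 15,000.11 1,000.15 13,001.50 20,000,001.52",
  "word 2,000.120 400,000.00 57,000.523 402,000,111",
  "word word 4,000.120 455,000.02 57,600.87 403,000,111.18",
  "word 2,056.120 678,000.00 670,000.523 402,009,111.65"
)

# Write to a temporary .csv file that stands in for 'file.csv'
path <- tempfile(fileext = ".csv")
writeLines(sample_text, path)

# Read it back exactly as in the Data step
lines <- readLines(path)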

Solution 1
0 2020-03-27 23:03:18