Reading a table into R

Question

I want to read a text file into R, but the first column comes out wrong: each row's name is fused with the first number of that row.

Data text file:

revenues       4118000000.0, 4315000000.0, 4512000000.0, 4709000000.0, 4906000000.0, 5103000000.0
cost_of_revenue-1595852945.4985902, -1651829192.2662954, -1705945706.6237037, -1758202488.5708148, -1808599538.1076286, -1857136855.234145
gross_profit   2522147054.5014095, 2663170807.7337046, 2806054293.376296, 2950797511.429185, 3097400461.892371, 3245863144.765855

R code:

data.predicted_values = read.table("predicted_values.txt", sep=",")

Output:

                                  V1          V2          V3          V4          V5          V6
1        revenues       4118000000.0  4315000000  4512000000  4709000000  4906000000  5103000000
2 cost_of_revenue-1595852945.4985902 -1651829192 -1705945707 -1758202489 -1808599538 -1857136855
3  gross_profit   2522147054.5014095  2663170808  2806054293  2950797511  3097400462  3245863145

How can I split the first column into two parts? I want the first column, V1, to hold revenues, cost_of_revenue, gross_profit, and V2 to hold 4118000000.0, -1595852945.4985902, 2522147054.5014095, and so on for the remaining columns.

Answer 1

Since there are no commas between the row names and the values, you need to add them back in:

txt <- "revenues       4118000000.0, 4315000000.0, 4512000000.0, 4709000000.0, 4906000000.0, 5103000000.0
cost_of_revenue-1595852945.4985902, -1651829192.2662954, -1705945706.6237037, -1758202488.5708148, -1808599538.1076286, -1857136855.234145
gross_profit   2522147054.5014095, 2663170807.7337046, 2806054293.376296, 2950797511.429185, 3097400461.892371, 3245863144.765855"

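Lines <- readLines(textConnection(txt))
# To read from disk instead, use: Lines <- readLines("predicted_values.txt")

# Insert a comma right after the leading label. Matching only letters and
# underscores keeps the minus sign of a negative first value.
res <- read.csv(text = sub("^([[:alpha:]_]+)\\s*", "\\1,", Lines),
                header = FALSE, row.names = 1)
res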

The decimal fractions may not be printed, but they are there.
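
One way to check this, as a minimal sketch: the snippet below builds a small stand-in data frame from two values in the question (the name `res_demo` is made up for illustration and is not part of the answer's code) and prints it with more significant digits than the default.

```r
# Hypothetical stand-in for `res`: two values copied from the question's data.
res_demo <- data.frame(V2 = c(4118000000.0, -1595852945.4985902),
                       row.names = c("revenues", "cost_of_revenue"))

# Default printing uses 7 significant digits, so the fraction is rounded away.
print(res_demo)

# Asking for more digits shows that the stored value still has its fraction.
print(res_demo, digits = 17)

# The fractional part can also be checked numerically.
frac <- abs(res_demo["cost_of_revenue", "V2"]) %% 1
frac > 0.49 && frac < 0.50
```

Setting `options(digits = ...)` would change the default for the whole session; passing `digits` to `print()` only affects that one call.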

Answer 2

This follows the same line of thinking as @DWin's answer, but accounts for the negative values in the second row.

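TEXT <- readLines("predicted_values.txt")
# Locate the label (letters and underscores) in every line.
A <- gregexpr("[A-Za-z_]+", TEXT)
# invert = TRUE keeps everything except the label; unlist() collects all lines,
# and read.table() skips the empty strings left in front of each label.
B <- read.table(text = unlist(regmatches(TEXT, A, invert = TRUE)), sep = ",")
# Put the labels back as the first column.
C <- cbind(FirstCol = unlist(regmatches(TEXT, A)), B)
C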
#          FirstCol          V1          V2          V3          V4          V5          V6
# 1        revenues  4118000000  4315000000  4512000000  4709000000  4906000000  5103000000
# 2 cost_of_revenue -1595852945 -1651829192 -1705945707 -1758202489 -1808599538 -1857136855
# 3    gross_profit  2522147055  2663170808  2806054293  2950797511  3097400462  3245863145
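
For readers without the file at hand, the same idea can be tried on in-memory text. This is a variation rather than the answer's exact code: it assumes the label always sits at the start of the line, uses `regexpr()` plus `substring()` instead of `gregexpr()` with `invert = TRUE`, and the names `lines_demo`, `labels`, `values` and `parsed` are made up for illustration.

```r
# The three lines from the question, held in memory instead of a file.
lines_demo <- c(
  "revenues       4118000000.0, 4315000000.0, 4512000000.0, 4709000000.0, 4906000000.0, 5103000000.0",
  "cost_of_revenue-1595852945.4985902, -1651829192.2662954, -1705945706.6237037, -1758202488.5708148, -1808599538.1076286, -1857136855.234145",
  "gross_profit   2522147054.5014095, 2663170807.7337046, 2806054293.376296, 2950797511.429185, 3097400461.892371, 3245863144.765855"
)

# regexpr() finds the run of letters/underscores at the start of each line.
m <- regexpr("^[A-Za-z_]+", lines_demo)
labels <- regmatches(lines_demo, m)

# Everything after the label is the comma-separated numeric part,
# including a leading minus sign if there is one.
values <- substring(lines_demo, attr(m, "match.length") + 1)

parsed <- cbind(FirstCol = labels, read.table(text = values, sep = ","))
parsed
```

Because the label pattern excludes `-`, the minus sign stays with the number on the `cost_of_revenue` line.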

Answer 3

You want the row.names argument of read.table. Then you can simply transpose your data:

data.predicted_values = read.table("predicted_values.txt", sep=",", row.names=1)
data.predicted_values <- t(data.predicted_values)
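
As a sketch of what `row.names = 1` followed by `t()` produces, the snippet below assumes the label and the first value are already separated by a comma, which is not the case in the raw file from the question. The two-row sample `csv_demo` and the names `by_row` and `by_col` are made up for illustration.

```r
# Two rows from the question, with a comma already placed after each label
# (an assumption: the raw file does not have this comma).
csv_demo <- "revenues,4118000000.0,4315000000.0
cost_of_revenue,-1595852945.4985902,-1651829192.2662954"

# The first field of each row becomes the row name.
by_row <- read.table(text = csv_demo, sep = ",", row.names = 1)

# t() returns a matrix; as.data.frame() turns it back into a data frame
# whose columns are named after the original row names.
by_col <- as.data.frame(t(by_row))
by_col$revenues
```

After transposing, each financial line item is a column, which is usually the more convenient shape for plotting or modelling.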

3 answers

Answer 1: score 1, 2013-11-21 03:27:10
Answer 2: score 1, accepted, 2013-11-21 04:37:34
Answer 3: score 0, 2013-11-21 03:27:33
