R: Reading a dataset where each observation is split across 2 lines?

Question

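I am trying to read a space-delimited file in which each observation is broken across lines by a newline. Is there a way to make read.table or fread keep scanning values until a full row has been filled?

The header and the first two rows of the dataset look like this: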

   tsales sales margin nown nfull npart naux hoursw hourspw inv1 inv2 ssize start
       750000   4411.765         41          1          1          1     1.5357
           76   16.75596   17166.67   27177.04        170         41
      1926395   4280.878         39          2          2          3     1.5357
          192   22.49376   17166.67   27177.04        450         39

Answer 1

Since each row of the final data is split across exactly 2 lines in the input, you can try this:

# read the file as raw lines
txt <- readLines("test.txt")

# extract the header (split on runs of whitespace, ignoring leading spaces)
# and remove it from the data
df_header <- strsplit(trimws(txt[1]), split = "\\s+")[[1]]
txt <- txt[-1]

# merge every 2 subsequent lines into one to form a row of the final data frame
idx <- seq(1, length(txt), by = 2)
txt[idx] <- paste(txt[idx], txt[idx + 1])
txt <- txt[-(idx + 1)]

# final data
df <- read.table(text = txt, col.names = df_header)

The output is:

   tsales    sales margin nown nfull npart   naux hoursw  hourspw     inv1     inv2 ssize start
1  750000 4411.765     41    1     1     1 1.5357     76 16.75596 17166.67 27177.04   170    41
2 1926395 4280.878     39    2     2     3 1.5357    192 22.49376 17166.67 27177.04   450    39

Sample data: test.txt contains

tsales sales margin nown nfull npart naux hoursw hourspw inv1 inv2 ssize start
750000   4411.765         41          1          1          1     1.5357
76   16.75596   17166.67   27177.04        170         41
1926395   4280.878         39          2          2          3     1.5357
192   22.49376   17166.67   27177.04        450         39
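
As a possible alternative that is closer to what the question literally asks for, base R's scan() reads whitespace-separated values and does not care where the line breaks fall, so the values can be read as one long vector and reshaped into rows. The sketch below assumes every column is numeric (true for the sample, but an assumption about the full file) and recreates the sample in a temporary file; the names path, header, values and df_scan are only illustrative.

```r
# Recreate the sample file from the question in a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c(
  "tsales sales margin nown nfull npart naux hoursw hourspw inv1 inv2 ssize start",
  "750000   4411.765         41          1          1          1     1.5357",
  "76   16.75596   17166.67   27177.04        170         41",
  "1926395   4280.878         39          2          2          3     1.5357",
  "192   22.49376   17166.67   27177.04        450         39"
), path)

# Header: the whitespace-separated names on the first line
header <- scan(path, what = "", nlines = 1, quiet = TRUE)

# Body: scan() reads numeric values and ignores where the line breaks are
values <- scan(path, skip = 1, quiet = TRUE)

# Sanity check: the values must fill whole rows
stopifnot(length(values) %% length(header) == 0)

# Reshape row by row; assumes all columns are numeric
df_scan <- as.data.frame(
  matrix(values, ncol = length(header), byrow = TRUE,
         dimnames = list(NULL, header))
)
```

This avoids pasting lines together, at the cost of handling only files whose columns are all of one type.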

Answer 2

When I read your sample data, it looks like this:

   tsales      sales   margin     nown nfull npart   naux hoursw hourspw inv1 inv2 ssize start
1  750000 4411.76500    41.00     1.00     1     1 1.5357     NA      NA   NA   NA    NA    NA
2      76   16.75596 17166.67 27177.04   170    41     NA     NA      NA   NA   NA    NA    NA
3 1926395 4280.87800    39.00     2.00     2     3 1.5357     NA      NA   NA   NA    NA    NA
4     192   22.49376 17166.67 27177.04   450    39     NA     NA      NA   NA   NA    NA    NA

Because the two kinds of rows alternate and the number of columns is small, we can easily code this by hand:

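# read the data; the short lines are assumed to come in padded with NA, as shown above
Data <- read.csv("mydata.csv")
# odd rows (first half of each observation) have a value in naux
firstData <- Data[!is.na(Data$naux), ]
# even rows (second half of each observation) have NA in naux
secondData <- Data[is.na(Data$naux), ]
# copy the second-half values into the columns they actually belong to
firstData$hoursw <- secondData$tsales
firstData$hourspw <- secondData$sales
firstData$inv1 <- secondData$margin
firstData$inv2 <- secondData$nown
firstData$ssize <- secondData$nfull
firstData$start <- secondData$npart
Data <- firstData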

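The data is split into two parts: the odd rows and the even rows. The NA columns of the odd rows are then filled with the correct values taken from the even rows. Hope this helps!

The final output is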

> firstData
   tsales    sales margin nown nfull npart   naux hoursw  hourspw     inv1     inv2 ssize start
1  750000 4411.765     41    1     1     1 1.5357     76 16.75596 17166.67 27177.04   170    41
3 1926395 4280.878     39    2     2     3 1.5357    192 22.49376 17166.67 27177.04   450    39

> secondData
  tsales    sales   margin     nown nfull npart naux hoursw hourspw inv1 inv2 ssize start
2     76 16.75596 17166.67 27177.04   170    41   NA     NA      NA   NA   NA    NA    NA
4    192 22.49376 17166.67 27177.04   450    39   NA     NA      NA   NA   NA    NA    NA

> Data
   tsales    sales margin nown nfull npart   naux hoursw  hourspw     inv1     inv2 ssize start
1  750000 4411.765     41    1     1     1 1.5357     76 16.75596 17166.67 27177.04   170    41
3 1926395 4280.878     39    2     2     3 1.5357    192 22.49376 17166.67 27177.04   450    39
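
The same odd/even idea can be sketched without naming each column, by splitting on row position and moving columns by position. This is a minimal sketch, not the answer's original code: it assumes the first physical line of every observation always holds 7 values (the hypothetical n_first below), and it reads the sample inline with read.table(fill = TRUE) rather than from mydata.csv.

```r
# Sample data from the question, supplied inline
txt <- c(
  "tsales sales margin nown nfull npart naux hoursw hourspw inv1 inv2 ssize start",
  "750000   4411.765         41          1          1          1     1.5357",
  "76   16.75596   17166.67   27177.04        170         41",
  "1926395   4280.878         39          2          2          3     1.5357",
  "192   22.49376   17166.67   27177.04        450         39"
)

# fill = TRUE pads the short lines with NA so every row has 13 columns
Data <- read.table(text = txt, header = TRUE, fill = TRUE)

n_first  <- 7                       # values on the first line (assumption)
n_second <- ncol(Data) - n_first    # values on the second line

odd  <- Data[seq(1, nrow(Data), by = 2), ]   # first halves
even <- Data[seq(2, nrow(Data), by = 2), ]   # second halves

# Keep the first n_first columns of the odd rows and append the first
# n_second columns of the even rows under the remaining column names
fixed <- cbind(
  odd[seq_len(n_first)],
  setNames(even[seq_len(n_second)], names(Data)[-seq_len(n_first)])
)
rownames(fixed) <- NULL
```

Splitting by position rather than by is.na(naux) also avoids misclassifying a row if naux is ever genuinely missing.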

Answer 3

Here is a data.table solution (I copied your example into the file dfTest.txt). See the comments for an explanation:

library(data.table)
#fill=TRUE fills empty cols due to irregular structure with NAs
dt=fread("dfTest.txt",header = TRUE,sep=" ",fill=TRUE)
#cols to fix
selCols=c("hoursw","hourspw","inv1","inv2","ssize","start")
#cols from which to read
otherCols=colnames(dt)[seq_along(selCols)]
#fill missing cols from leading rows and select every 2nd row afterwards
dt[,c(selCols):=shift(.SD,n=1L,type="lead"),
    .SDcols=otherCols][seq(1,nrow(dt),2),]

Answer 1
1, accepted, 2018-05-17 08:02:59

Answer 2
1, 2018-05-17 08:17:54

Answer 3
1, 2018-05-17 09:36:34
