
R loading in specific part of a dataset

I used to load my data into separate data frames with the following code:

filenames = list.files(path=path, pattern=".txt") # builds a vector 'filenames' of all .txt files in the given folder

## 2.2 Load data

colnamesfull = c("time","v","a","t1","t2","t3","t4","t5","t6","t7","t8")
for(i in filenames){
  filepath = file.path(path, paste(i, sep=""))
  assign(i, read.table(filepath,
                       skip= 20, 
                       col.names= colnamesfull, 
                       sep=","))}

I have since found that loading these txt files directly into a list is more efficient, and I have managed to do that with the following code:

filenames = list.files(path=path, pattern=".txt")
fn <- paste(path,filenames,sep="/")
mylist <- lapply(fn, read.table, stringsAsFactors=FALSE)

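However, I still want to apply the modifications my previous approach made, namely:

  • Remove the first 20 lines, which are text rather than data
  • Add names to the columns, as in colnamesfull

How can I do that? I tried filling in the read.table arguments inside lapply, but that did not work.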

When loaded into mylist, the data frames look like this:

c("description=Peilopzet_Twentekanaal", "reference_heater_voltage=11", 
"pulsetime=25.0", "measuretime=300", "filebasename=SD07_TWK_", 
"measureinterval=1800", "nosleep=0", "sampletime=1", "IP=?.?.?.?", 
"Battery=15.86V", "Firmware=91", "Bootreason=BootReason(poweron=False,", 
"user=False,", "rtc=True,", "timeout=False)", "Sensorfirmware=Rev:", 
"98", "Datetime=2019-08-22", "00:30:04", "DatetimeFromNTP=True", 
"Heatervoltage=15.860000000000001", "Heaterduty=0.4810375781785452", 
"Compass=-317", "639", "42", "ApplicationVersion=trunk-r47", 
"ApplicationDate=2017-12-06", "11:17:33", "+0100", "2.0,14.820,1.500,14.61,14.63,14.63,14.65,14.65,14.63,14.64,14.60", 
"3.9,14.804,1.476,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.60", 
"5.8,14.820,1.500,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.60"
)

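I have managed to use skip, but that does not help, because afterwards I cannot get col.names to work. With skip applied as follows, the dataset looks like this: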

filenames = list.files(path=path, pattern=".txt")
fn <- paste(path,filenames,sep="/")
mylist <- lapply(fn, read.table, stringsAsFactors=FALSE, skip=21)

Printed, the first element of the list looks like this:

> mylist[[1]]
                                                                    V1
1     2.0,14.820,1.500,14.61,14.63,14.63,14.65,14.65,14.63,14.64,14.60
2     3.9,14.804,1.476,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.60
3     5.8,14.820,1.500,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.60

When I pass the dataset through dput(), V1 does not appear:

> dput(test[0:5,])
c("2.0,14.808,1.478,14.63,14.64,14.65,14.66,14.67,14.65,14.65,14.62", 
"3.9,14.808,1.472,14.63,14.64,14.65,14.66,14.67,14.65,14.65,14.62", 
"5.9,14.816,1.491,14.63,14.64,14.65,14.66,14.67,14.65,14.65,14.62", 
"7.8,14.816,1.490,14.63,14.64,14.65,14.66,14.67,14.65,14.65,14.62", 
"9.7,14.808,1.470,14.62,14.64,14.65,14.66,14.67,14.65,14.65,14.62"
)

How do I get rid of the "V1" that is messing all of this up?
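
A small sketch of where V1 comes from, using one data line from the question as inline input (the variable names line, one_col and split_cols are only for illustration). By default read.table splits on whitespace, so a comma-separated line is read as a single character field, and R gives that lone column the automatic name V1. Indexing a one-column data frame as [rows, ] drops it to a plain vector, which would explain why dput() shows no V1.

```r
# One data line from the question, supplied as inline text.
line <- "2.0,14.820,1.500,14.61,14.63,14.63,14.65,14.65,14.63,14.64,14.60"

# Default sep = "" means "split on whitespace": the whole line is one field.
one_col <- read.table(text = line, stringsAsFactors = FALSE)
dim(one_col)     # one row, one column
names(one_col)   # the automatic column name

# Row-indexing a one-column data frame drops it to a bare vector,
# so the column name is lost when the result is passed to dput().
dropped <- one_col[1, ]
is.character(dropped)

# With sep = "," the same line is split into eleven numeric columns.
split_cols <- read.table(text = line, sep = ",")
dim(split_cols)
```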

As Roland said, pass the same read.table arguments through lapply:

mylist <- lapply(fn, read.table, stringsAsFactors=FALSE, skip= 20, col.names= colnamesfull, sep=",")
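
For anyone who wants to try this without the original files, here is a minimal, self-contained sketch. The temporary folder, the two file names and the placeholder header lines are made up for illustration; only colnamesfull and the data rows come from the question. Using full.names = TRUE, the stricter pattern and naming the list by file are optional additions, not part of Roland's suggestion.

```r
# Assumption: two small temporary files stand in for the real logger output,
# each with 20 lines of header text followed by comma-separated data rows.
path <- file.path(tempdir(), "loader_demo")    # hypothetical folder
dir.create(path, showWarnings = FALSE)

header <- paste0("meta", 1:20, "=x")           # 20 placeholder text lines
rows <- c("2.0,14.820,1.500,14.61,14.63,14.63,14.65,14.65,14.63,14.64,14.60",
          "3.9,14.804,1.476,14.61,14.62,14.63,14.65,14.65,14.63,14.64,14.60")
for (f in c("a.txt", "b.txt")) {
  writeLines(c(header, rows), file.path(path, f))
}

colnamesfull <- c("time","v","a","t1","t2","t3","t4","t5","t6","t7","t8")

# full.names = TRUE returns complete paths, so no paste() is needed;
# "\\.txt$" only matches names that end in a literal ".txt".
fn <- list.files(path = path, pattern = "\\.txt$", full.names = TRUE)

# Extra arguments after the function name are forwarded to read.table.
mylist <- lapply(fn, read.table,
                 skip = 20, col.names = colnamesfull, sep = ",",
                 stringsAsFactors = FALSE)

# Name each list element after its file so it can be looked up by name.
names(mylist) <- basename(fn)

str(mylist[["a.txt"]])
```

Each element of mylist is then a data frame with the eleven named columns, and mylist[["a.txt"]]$time gives the time column of that file.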
