R: iterating through a 50k-row data frame takes a long time
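I am writing a simple program that should parse one .tsv file into multiple .csv files. The problem is that it takes very long (I think 9 minutes for roughly 50k rows is poor performance). Could someone please look at my code and tell me what I am doing wrong?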
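I have a table containing the name of the participant, the name of the media, a timestamp, and some coordinate data. My data can contain one or more participants, and each participant works with 2 media files. I want to create a csv file for each media file used by each participant.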
For example, I have 2 participants, P1 and P2, and each of them works with media files M1 and M2. So I want to create P1_M1.csv, P1_M2.csv, P2_M1.csv, and P2_M2.csv.
The data looks like this:
P1 | M1 | data...
P1 | M1 | data...
...
P1 | M2 | data...
...
P2 | M1 | data...
...
...
Here is my code:
# load data from tsv
data = read.table("./data.tsv", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# function for creating csv file
writeData = function(filename, d){
  filename = paste("./", filename, ".csv", sep = "")
  write.csv(d, file = filename, row.names = FALSE)
}

# initialize auxiliary variables
participantName = ""
mediaName = ""

# initialize empty dataframe
subdata <- data.frame(TimeStamp = numeric(), GazeLeftX = integer(), GazeLeftY = integer(), GazeRightX = integer(), GazeRightY = integer())

# for each row in original data...
for(r in 1:nrow(data))
{
  # check if last participant is the same as the participant on the current row
  if(participantName != data[r, 'ParticipantName']){
    # check if last participant is not empty (empty means no participant was processed yet)
    if(participantName != ""){
      # if it is not, then the participant and also his work on the media file ended, so write data to csv
      writeData(filename = paste(participantName, "_", mediaName, sep = ""), d = subdata)
      # empty auxiliary dataframe and also mediaName
      subdata = subdata[0,]
      mediaName = ""
    }
    # we detected a new participant so record it into the last participant variable
    participantName = data[r, 'ParticipantName']
  }

  # do the same checks for the media file, because only the media file can change while the participant stays the same
  if(mediaName != data[r, 'MediaName']){
    if(mediaName != ""){
      writeData(filename = paste(participantName, "_", mediaName, sep = ""), d = subdata)
      subdata = subdata[0,]
    }
    mediaName = data[r, 'MediaName']
  }

  # in every iteration append the current row to the auxiliary dataframe
  subdata = rbind(subdata,
                  data.frame(TimeStamp = data[r, 'EyeTrackerTimestamp'],
                             GazeLeftX = data[r, 'GazeLeftX'],
                             GazeLeftY = data[r, 'GazeLeftY'],
                             GazeRightX = data[r, 'GazeRightX'],
                             GazeRightY = data[r, 'GazeRightY']))
}

# if there are any data left in the auxiliary dataframe, save it to csv
if(nrow(subdata) != 0){
  writeData(filename = paste(participantName, "_", mediaName, sep = ""), d = subdata)
}
You are looking for ?split. For example, try:
split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)
This creates a list containing one data.frame for each ParticipantName-MediaName pair. If you want to write each data frame to a different file, you can try the following:
res<-split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)
Map(writeData,names(res),res)
where writeData is the function you defined.
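To see the split approach end to end, here is a minimal sketch on toy data. The column names come from the question, but the values are made up, and the use of `interaction(..., sep = "_")`, the column selection, and `tempdir()` are my own assumptions rather than part of the answer. With a plain `split()` on two columns the list names are joined with a dot (`P1.M1`), so building the grouping factor with an underscore is one way to get the `P1_M1.csv` names asked for in the question.

```r
# Toy data standing in for data.tsv (values are invented for illustration).
data <- data.frame(
  ParticipantName     = c("P1", "P1", "P1", "P2", "P2"),
  MediaName           = c("M1", "M1", "M2", "M1", "M2"),
  EyeTrackerTimestamp = c(1, 2, 3, 4, 5),
  GazeLeftX  = 1:5,   GazeLeftY  = 6:10,
  GazeRightX = 11:15, GazeRightY = 16:20,
  stringsAsFactors = FALSE
)

# Keep only the columns the question writes out, renaming the timestamp column.
cols <- c("EyeTrackerTimestamp", "GazeLeftX", "GazeLeftY", "GazeRightX", "GazeRightY")
out <- data[, cols]
names(out)[1] <- "TimeStamp"

# One group per participant/media pair; sep = "_" gives names such as "P1_M1".
groups <- interaction(data$ParticipantName, data$MediaName, sep = "_", drop = TRUE)
res <- split(out, groups)

# Write each piece to its own file. tempdir() is used only to keep the sketch
# from touching the working directory; replace it with "." for real use.
out_dir <- tempdir()
for (nm in names(res)) {
  write.csv(res[[nm]], file = file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

The loop in the question is slow mainly because `rbind` copies the accumulated data frame on every iteration, whereas `split` partitions the rows in a single pass. One behavioural difference to be aware of: `split` gathers all rows of a pair into one file even if they are not adjacent in the input, while the loop starts a new file whenever the participant or media changes.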