R: iterating through a 50k-row data frame takes a long time

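I'm writing a simple program that should parse one .tsv file into multiple .csv files. The problem is that it takes a very long time (about 9 minutes for roughly 50k rows, which I think is poor performance). Could someone take a look at my code and tell me what I'm doing wrong?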

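I have a table containing the name of the participant, the name of the media file, a timestamp, and some coordinate data. My data can contain one or more participants, and each participant works with 2 media files. I want to create a csv file for each media file used by each participant.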

For example, I have 2 participants, P1 and P2, and each of them works with media files M1 and M2, so I want to create P1_M1.csv, P1_M2.csv, P2_M1.csv and P2_M2.csv.

The data looks like this:

P1 | M1 | data...
P1 | M1 | data...
...
P1 | M2 | data...
...
P2 | M1 | data...
...
...

Here is my code:

data = read.table("./data.tsv", header = T, sep = "\t", stringsAsFactors = F) # load data from tsv

# function for creating csv file
writeData = function(filename, d){
  filename = paste("./", filename, ".csv", sep = "")
  write.csv(d, file = filename, row.names = F)
}

# initialize auxiliary variables
participantName = ""
mediaName = ""
# initialize empty dataframe
subdata <- data.frame(TimeStamp = numeric(), GazeLeftX = integer(), GazeLeftY = integer(), GazeRightX = integer(), GazeRightY = integer())

# for each row in original data...
for(r in 1:nrow(data))
{
  # check whether the participant on the current row differs from the last participant
  if(participantName != data[r, 'ParticipantName']){
    # check if last participant is not empty (like no participant was processed yet)
    if(participantName != ""){
      # if it is not, then that participant's work on the media file has ended, so write the data to csv
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      # empty auxiliary dataframe and also mediaName
      subdata = subdata[0,]
      mediaName = ""
    }
    # we detected new participant so record it into last participant variable
    participantName = data[r, 'ParticipantName']
  }
  # do same checks for media file because there can also change only mediafile and participant can be the same
  if(mediaName != data[r, 'MediaName']){
    if(mediaName != ""){
      writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
      subdata = subdata[0,]
    }
    mediaName = data[r, 'MediaName']  
  }
  # in every iteration append the current row to the auxiliary dataframe
  subdata = rbind(subdata,
                  data.frame(TimeStamp = data[r, 'EyeTrackerTimestamp'],
                             GazeLeftX = data[r, 'GazeLeftX'],
                             GazeLeftY = data[r, 'GazeLeftY'],
                             GazeRightX = data[r, 'GazeRightX'],
                             GazeRightY = data[r, 'GazeRightY']))
}
# if there are any data left in auxiliary dataframe, save it to csv
if(nrow(subdata) != 0){
  writeData(filename = paste(participantName,"_",mediaName, sep = ""), d = subdata)
}

You are looking for ?split. For example, try:

split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)

This creates a list containing one data.frame for each ParticipantName - MediaName pair. If you want to write each data frame to a different file, you could try the following:

res<-split(data,data[,c("ParticipantName","MediaName")],drop=TRUE)
Map(writeData,names(res),res)

where writeData is the function you defined.
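
For context, the loop in the question is slow mainly because rbind builds a new, larger data frame on every iteration, so the same rows get copied over and over. Splitting once avoids that. Below is a minimal, self-contained sketch of the same idea; the toy values, the tempdir() output location, and the helper names outDir, keep and key are assumptions made for illustration and are not from the question. Note also that split on two columns names the groups like "P1.M1", so this sketch builds an explicit key with "_" to get the file names the question asks for, and it keeps only the five columns the question writes out.

```r
# Toy data using the column names from the question (the values are made up).
data <- data.frame(
  ParticipantName     = c("P1", "P1", "P1", "P2", "P2"),
  MediaName           = c("M1", "M1", "M2", "M1", "M2"),
  EyeTrackerTimestamp = c(1, 2, 3, 4, 5),
  GazeLeftX  = c(10L, 11L, 12L, 13L, 14L),
  GazeLeftY  = c(20L, 21L, 22L, 23L, 24L),
  GazeRightX = c(30L, 31L, 32L, 33L, 34L),
  GazeRightY = c(40L, 41L, 42L, 43L, 44L),
  stringsAsFactors = FALSE
)

# Assumption: write into a temporary directory instead of "./".
outDir <- tempdir()

# Columns that the question's loop copies into each output file.
keep <- c("EyeTrackerTimestamp", "GazeLeftX", "GazeLeftY",
          "GazeRightX", "GazeRightY")

# One grouping key per row, e.g. "P1_M1".
key <- paste(data$ParticipantName, data$MediaName, sep = "_")

# Split the selected columns once; res is a named list of data frames.
res <- split(data[, keep], key)

# Write each group to its own csv file, renaming the timestamp column
# to TimeStamp as in the question.
for (nm in names(res)) {
  d <- res[[nm]]
  names(d)[names(d) == "EyeTrackerTimestamp"] <- "TimeStamp"
  write.csv(d, file = file.path(outDir, paste0(nm, ".csv")), row.names = FALSE)
}
```

Unlike the loop in the question, this groups rows by key regardless of their order in the file, so a participant/media pair that appears in two separate runs ends up in a single csv rather than the second run overwriting the first.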
