
Python - Iterate each line in two CSV files and compare timestamp values

I have the following two CSV files:

CSV File1:

Range1,2018-05-17 01:50:17+0000,2018-05-17 02:00:17+0000
Range2,2018-05-17 01:50:17+0000,2018-05-17 04:00:17+0000
Range3,2018-05-17 01:50:17+0000,2018-05-17 08:00:17+0000

CSV File2:

TimeStamp1,2018-05-17 01:59:17+0000
TimeStamp2,2018-05-17 03:59:17+0000
TimeStamp3,2018-05-17 07:59:17+0000

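I want to iterate over each Range in File1 and determine which TimeStamps fall within the Range being compared. For example, the output of my Python script would show:

Output: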

TimeStamp1 falls within Range1
TimeStamp1, TimeStamp2 falls within Range2
TimeStamp1, TimeStamp2, TimeStamp3 falls within Range3

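I started writing something like this, but I'm having trouble getting the output, and getting the if statement to first iterate over all lines in File2 for one line of File1, then move to the next line in File1 and iterate over all lines in File2 again. Thanks in advance.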

    import csv 

    with open('File1', 'rb') as range, open('File2', 'rb') as timeStamp: 

    range_reader = csv.reader(range, quotechar='"')
    timeStamp_reader = csv.reader(timeStamp, quotechar='"')
    for range_row in range_reader:
      print range_row[2]
      print range_row[3]
      for timeStamp_row in timeStamp_reader:
        print timeStamp_row[2]
        if range_row[2] <= timeStamp_row[2] and range_row[3] >= timeStamp_row[2]
          print " %s falls within %s "% (timeStamp_row[1], range_row[1])

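There are a few mistakes in your code. First, you mixed up the indexes. Indexes start at 0, so subtract 1 from every index.

You can't read a file repeatedly as-is: once the reader hits the end of the file, it returns nothing more. So for the inner loop, you need to reset the file position to the start before each pass. This is easily done with seek.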

import csv 
with open('File1', 'r') as ranges, open('File2', 'r') as timeStamp: 
  range_reader = csv.reader(ranges, quotechar='"')
  timeStamp_reader = csv.reader(timeStamp, quotechar='"')
  rangeArray = {}
  for range_row in range_reader:
    print("%s / %s" % ( range_row[1], range_row[2])) # This looks better, and gives more info than just printing both timestamps on each line
    timeStamp.seek(0) # This will set position of cursor in timeStamp back to start, so it can iterate repeatedly
    rangeArray[range_row[0]] = []
    for timeStamp_row in timeStamp_reader:
      if range_row[1] <= timeStamp_row[1] and range_row[2] >= timeStamp_row[1]:
        rangeArray[range_row[0]].append(timeStamp_row[0])
        print (" %s falls within %s " % (timeStamp_row[0], range_row[0]))

print("\n\n")

# Desired Output:
for key in rangeArray:
  print("%s falls within %s" % (', '.join([str(x) for x in rangeArray[key]]), key))

This gives output like this:

2018-05-17 01:50:17+0000 / 2018-05-17 02:00:17+0000
 TimeStamp1 falls within Range1
2018-05-17 01:50:17+0000 / 2018-05-17 04:00:17+0000
 TimeStamp1 falls within Range2
 TimeStamp2 falls within Range2
2018-05-17 01:50:17+0000 / 2018-05-17 08:00:17+0000
 TimeStamp1 falls within Range3
 TimeStamp2 falls within Range3
 TimeStamp3 falls within Range3



TimeStamp1 falls within Range1
TimeStamp1, TimeStamp2 falls within Range2
TimeStamp1, TimeStamp2, TimeStamp3 falls within Range3
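
The exhaustion behaviour this answer describes can be seen in isolation. The following is only a minimal sketch: it uses an in-memory `io.StringIO` object as a stand-in for File2 (an assumption made so the snippet runs without any files on disk), and the names `first_pass`, `second_pass` and `third_pass` are made up for illustration.

```python
import csv
import io

# In-memory stand-in for File2 (same rows as in the question)
data = io.StringIO(
    "TimeStamp1,2018-05-17 01:59:17+0000\n"
    "TimeStamp2,2018-05-17 03:59:17+0000\n"
    "TimeStamp3,2018-05-17 07:59:17+0000\n"
)
reader = csv.reader(data)

first_pass = [row[0] for row in reader]   # consumes every row
second_pass = [row[0] for row in reader]  # underlying file is at its end: no rows
data.seek(0)                              # rewind the underlying file object
third_pass = [row[0] for row in reader]   # rows are available again

print(first_pass)
print(second_pass)
print(third_pass)
```

The second pass comes back empty, which is why the inner loop in the question only ever ran for the first Range.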

import csv

with open('File1.csv', 'r') as ranger, open('File2.csv', 'r') as timeStamp:

    # Read both files into lists once, so the inner loop can be repeated freely
    range_rows = list(csv.reader(ranger, quotechar='"'))
    timeStamp_rows = list(csv.reader(timeStamp, quotechar='"'))
    for range_row in range_rows:
        temp = []  # names of the timestamps that fall within this range
        for timeStamp_row in timeStamp_rows:
            if range_row[1] <= timeStamp_row[1] and range_row[2] >= timeStamp_row[1]:
                temp.append(timeStamp_row[0])
        if temp:
            print("%s falls within %s" % (','.join(temp), range_row[0]))

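Lukasas's answer is good, but if your dataset is large, seeking on every pass of the for loop may not be a good idea. Just copy the rows into lists at the start. Also, to get the output in the form you want, you need to collect the matches in a list that is reset at the start of each outer-loop iteration.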

TimeStamp1 falls within Range1
TimeStamp1,TimeStamp2 falls within Range2
TimeStamp1,TimeStamp2,TimeStamp3 falls within Range3
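
If the dataset really is large, even the nested loop over in-memory lists compares every timestamp against every range. One possible alternative, not part of any answer here, is to sort the timestamps once and use the standard `bisect` module to find the slice that lies inside each range. This is only a sketch under assumptions: the rows are hard-coded from the question instead of being read with `csv.reader`, and `parse`, `names_in_range` and `results` are hypothetical names.

```python
import bisect
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"  # matches e.g. 2018-05-17 01:59:17+0000

def parse(text):
    """Turn a timestamp string from the CSV into a timezone-aware datetime."""
    return datetime.strptime(text, TIME_FORMAT)

# Sample rows from the question; in practice these would come from csv.reader
stamp_rows = [
    ["TimeStamp1", "2018-05-17 01:59:17+0000"],
    ["TimeStamp2", "2018-05-17 03:59:17+0000"],
    ["TimeStamp3", "2018-05-17 07:59:17+0000"],
]
range_rows = [
    ["Range1", "2018-05-17 01:50:17+0000", "2018-05-17 02:00:17+0000"],
    ["Range2", "2018-05-17 01:50:17+0000", "2018-05-17 04:00:17+0000"],
    ["Range3", "2018-05-17 01:50:17+0000", "2018-05-17 08:00:17+0000"],
]

# Sort once by time, then keep times and names in parallel lists
stamps = sorted((parse(text), name) for name, text in stamp_rows)
times = [t for t, _ in stamps]
names = [name for _, name in stamps]

def names_in_range(start, end):
    """Names of all timestamps t with start <= t <= end, via binary search."""
    lo = bisect.bisect_left(times, start)   # first index with time >= start
    hi = bisect.bisect_right(times, end)    # first index with time > end
    return names[lo:hi]

results = {
    name: names_in_range(parse(start), parse(end))
    for name, start, end in range_rows
}
for range_name, matched in results.items():
    if matched:
        print("%s falls within %s" % (",".join(matched), range_name))
```

Each lookup is then two binary searches plus a slice, rather than a full scan of the timestamp list. For files as small as the ones in the question, the plain nested loop is simpler and perfectly adequate.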

As you'll see, I made quite a few changes, since I wrote the code in Python 3. Are you using Python 2?

Anyway, happy to answer the question. I think this basically does what you want:

import csv
import datetime

# Matches timestamps such as 2018-05-17 01:50:17+0000 (%z parses the offset)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"

# 'ranges' rather than 'range', so the built-in range() is not shadowed
with open('File1', 'r') as ranges, open('File2', 'r') as timeStamp:

    range_rows = list(csv.reader(ranges, quotechar='"'))
    timeStamp_rows = list(csv.reader(timeStamp, quotechar='"'))
    # Parse the strings into datetime objects so the comparison is chronological
    range_list = []
    for row in range_rows:
        parsed = [row[0],
                  datetime.datetime.strptime(row[1], TIME_FORMAT),
                  datetime.datetime.strptime(row[2], TIME_FORMAT)]
        range_list.append(parsed)
    timeStamp_list = []
    for row in timeStamp_rows:
        parsed = [row[0], datetime.datetime.strptime(row[1], TIME_FORMAT)]
        timeStamp_list.append(parsed)
    for i in range_list:
        for e in timeStamp_list:
            if i[1] <= e[1] and i[2] >= e[1]:
                print(" %s falls within %s "% (e[0], i[0]))

Output:

 TimeStamp1 falls within Range1 
 TimeStamp1 falls within Range2 
 TimeStamp2 falls within Range2 
 TimeStamp1 falls within Range3 
 TimeStamp2 falls within Range3 
 TimeStamp3 falls within Range3 
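
This answer prints one line per match, while the question asked for the matches grouped per Range. A rough sketch of combining the datetime comparison with the grouped output could look like the following. It assumes the timestamps always carry a numeric offset such as `+0000`, uses `io.StringIO` objects in place of the real files so that it runs on its own, and `group_by_range` and `output_lines` are hypothetical names.

```python
import csv
import io
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"  # e.g. 2018-05-17 01:50:17+0000

def group_by_range(range_file, stamp_file):
    """Return one 'A, B falls within RangeN' line per range that has matches."""
    # Read and parse all timestamps once; skip blank lines in the CSV
    stamps = [(row[0], datetime.strptime(row[1], TIME_FORMAT))
              for row in csv.reader(stamp_file) if row]
    lines = []
    for row in csv.reader(range_file):
        if not row:
            continue
        name = row[0]
        start = datetime.strptime(row[1], TIME_FORMAT)
        end = datetime.strptime(row[2], TIME_FORMAT)
        # Collect every timestamp name whose time lies inside [start, end]
        matched = [stamp_name for stamp_name, t in stamps if start <= t <= end]
        if matched:
            lines.append("%s falls within %s" % (", ".join(matched), name))
    return lines

# In-memory stand-ins for File1 and File2 (rows copied from the question)
ranges = io.StringIO(
    "Range1,2018-05-17 01:50:17+0000,2018-05-17 02:00:17+0000\n"
    "Range2,2018-05-17 01:50:17+0000,2018-05-17 04:00:17+0000\n"
    "Range3,2018-05-17 01:50:17+0000,2018-05-17 08:00:17+0000\n"
)
stamps = io.StringIO(
    "TimeStamp1,2018-05-17 01:59:17+0000\n"
    "TimeStamp2,2018-05-17 03:59:17+0000\n"
    "TimeStamp3,2018-05-17 07:59:17+0000\n"
)

output_lines = group_by_range(ranges, stamps)
for line in output_lines:
    print(line)
```

With real files, the two `io.StringIO` objects would be replaced by file objects opened with `open('File1', newline='')` and `open('File2', newline='')`.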

