
How to filter rows inside a csv file based on their date?

I have a file named aa_20200907.txt, which looks like this:

#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes

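I have code that filters the rows based on 2 conditions.

  1. Condition_1: I only want the rows where index[2] is a number.
  2. Condition_2: I only want the rows where index[0] (the date) is the same as the date in the name of the file being processed. The dates from the filenames are stored in a list named missing_dates.

The code below works perfectly for condition_1; the problem is that condition_2 does not work the way I want. Note that I normally run this code on multiple files, which means missing_dates contains more values.

Here is my code: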

import csv
import datetime 
from pathlib import Path

root=Path(r'c:\data\PPE\Desktop\test_folder')

def filter_row(r, date):  
    condition_1 = r[2].isdigit()  #<-- select only the rows if index 2 is numbers. 
    condition_2 = date != missing_date #<-- select only the rows of that specific day.
    
    return condition_1 and condition_2

missing_dates = ['20200907']

output_list = []
for missing_date in missing_dates:
    # print(f"processing {missing_date}")
    files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
    for file in files:      
        with open(file, 'r') as log_file:
            reader = csv.reader(log_file, delimiter = ',')
            next(reader) # skip header
            for row in reader:
                if filter_row(row, missing_date):
                    output_list.append(row)
                    
print(output_list) 

This is my current output:

[]

This is the desired output:

['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes']
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford']
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']

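*Please note that I don't want to write completely new code. I only want to fix condition_2 and keep the current code, because I am comfortable with it.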

Here you go, buddy:

Input:

#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes

Code:

import csv
import datetime 
from pathlib import Path
import os

os.chdir('/home/chandanmalla/Desktop/')

def filter_row(r, date):  
    condition_1 = r[2].isdigit()  #<-- select only the rows if index 2 is numbers. 
    condition_2 = r[0].split('T')[0] == date #<-- select only the rows of that specific day.
    return condition_1 and condition_2

missing_dates = ['2020-09-07']
file_end_name = ['20200907']

output_list = []


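# Pair each matching file with its own date so the pairing does not depend
# on the order in which os.listdir() returns the files.
files = []
for f in os.listdir():
    for suffix, m_d in zip(file_end_name, missing_dates):
        if f.endswith(suffix + '.txt'):
            files.append((f, m_d))
for file, m_d in files: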
    with open(file, 'r') as log_file:
        reader = csv.reader(log_file, delimiter = ',')
        next(reader) # skip header
        for row in reader:
            if filter_row(row, m_d):
                output_list.append(row)
                
print(output_list) 


Output


[['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes'],
 ['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford'],
 ['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']]

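There is a problem in your code in condition_2: inside filter_row, date and missing_date hold the same value, so date != missing_date is always False and no row is kept. There was also a problem with the line of code below: when I ran it, it matched zero files.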

    files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
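
If you want to keep the question's loop and the '20200907' format in missing_dates, condition_2 can convert that value before comparing. The following is only a minimal sketch: the helper name to_iso_date and the inline sample rows are assumptions made for illustration, not part of the original code.

```python
import datetime

def to_iso_date(compact_date):
    """Convert a filename date such as '20200907' to '2020-09-07'."""
    return datetime.datetime.strptime(compact_date, "%Y%m%d").strftime("%Y-%m-%d")

def filter_row(r, date):
    condition_1 = r[2].isdigit()                      # index 2 must be numeric
    condition_2 = r[0].startswith(to_iso_date(date))  # timestamp must fall on that day
    return condition_1 and condition_2

# Sample rows copied from the question's file (header already skipped).
rows = [
    ["2020-09-07T00:00:00.285+02:00", "New-York", "XX", "Audi"],
    ["2020-09-07T00:01:00.385+02:00", "London", "100", "Mercedes"],
    ["2020-09-08T00:00:58.444+02:00", "New-York", "12", "BMW"],
]

kept = [r for r in rows if filter_row(r, "20200907")]
print(kept)
```

With a filter_row like this, the call filter_row(row, missing_date) and the rest of the question's loop should be able to stay as they are.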


Here is one approach.

Example:

import os
import re
import csv
import datetime

filePath = "aa_20200907.txt"  # assumed location of the input file; adjust as needed

with open(filePath) as infile:
    filename = os.path.basename(filePath)
    print(filename)  # -> aa_20200907.txt
    # Keep only the digits of the filename and reformat them as YYYY-MM-DD
    dateToCheck = datetime.datetime.strptime(re.sub(r"[^\d]", "", filename), "%Y%m%d").strftime("%Y-%m-%d")  # Get Date
    reader = csv.reader(infile)
    for line in reader:
        if line[0].startswith(dateToCheck) and line[2].isdigit():  # Conditions
            print(line)

Output:

['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes']
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford']
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']
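
The check above compares text prefixes. If you would rather compare real dates, one option is to parse both values. This is a sketch only: it assumes Python 3.7 or newer (where datetime.fromisoformat exists), and the helper name row_matches is hypothetical.

```python
import datetime

def row_matches(row, file_date):
    """Return True if row[2] is numeric and row[0] falls on file_date."""
    try:
        stamp = datetime.datetime.fromisoformat(row[0])
    except (ValueError, IndexError):
        return False  # header lines, empty rows or malformed timestamps are skipped
    return len(row) > 2 and row[2].isdigit() and stamp.date() == file_date

# Date taken from the filename suffix, parsed into a datetime.date
file_date = datetime.datetime.strptime("20200907", "%Y%m%d").date()

rows = [
    ["#DATA:DD", "CARS_INTERNATIONAL:VERSION01.1"],
    ["2020-09-07T00:00:00.285+02:00", "New-York", "XX", "Audi"],
    ["2020-09-07T00:01:00.385+02:00", "London", "100", "Mercedes"],
    ["2020-09-08T00:00:58.444+02:00", "New-York", "12", "BMW"],
]

matching = [r for r in rows if row_matches(r, file_date)]
print(matching)
```

Note that stamp.date() is the calendar date in the timestamp's own UTC offset (+02:00 here), so it agrees with the date text at the start of each line.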

