How to filter rows inside a csv file based on their date?
I have a file named aa_20200907.txt, which looks like this:
#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes
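I have code that filters rows based on two conditions.
Condition_1: I only want rows where index[2] is a number.
Condition_2: I only want rows whose date at index[0] matches the date in the name of the file being processed. The file-name dates are stored in a list called missing_dates.
The code below works perfectly for condition_1; the problem is that condition_2 does not work the way I want. Note that I usually run this code over multiple files, which means missing_dates contains more values.
Here is my code: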
import csv
import datetime
from pathlib import Path

root=Path(r'c:\data\PPE\Desktop\test_folder')

def filter_row(r, date):
    condition_1 = r[2].isdigit() #<-- select only the rows if index 2 is numbers.
    condition_2 = date != missing_date #<-- select only the rows of that specific day.
    return condition_1 and condition_2

missing_dates = ['20200907']
output_list = []
for missing_date in missing_dates:
    # print(f"processing {missing_date}")
    files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
    for file in files:
        with open(file, 'r') as log_file:
            reader = csv.reader(log_file, delimiter = ',')
            next(reader) # skip header
            for row in reader:
                if filter_row(row, missing_date):
                    output_list.append(row)
print(output_list)
This is my current output:
[]
This is the desired output:
['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes']
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford']
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']
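*Please note that I don't want to write completely new code. I only want to fix condition_2 and keep the current code, because I am comfortable with it.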
Here you go, mate:
Input:
#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes
Code:
import csv
import datetime
from pathlib import Path
import os

os.chdir('/home/chandanmalla/Desktop/')

def filter_row(r, date):
    condition_1 = r[2].isdigit()  # keep only rows whose index 2 is numeric
    condition_2 = r[0].split('T')[0] == date  # keep only rows of that specific day
    return condition_1 and condition_2

missing_dates = ['2020-09-07']  # date as it appears inside the rows
file_end_name = ['20200907']    # the same date as it appears in the file name
output_list = []
files = []
for f in os.listdir():
    for m_d, suffix in zip(missing_dates, file_end_name):
        if f.endswith(suffix + '.txt'):
            files.append((f, m_d))  # remember which date belongs to which file
for file, m_d in files:
    with open(file, 'r') as log_file:
        reader = csv.reader(log_file, delimiter=',')
        next(reader)  # skip header
        for row in reader:
            if filter_row(row, m_d):
                output_list.append(row)
print(output_list)
Output
[['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes'],
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford'],
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']]
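If you would rather keep a single missing_dates list and the original filter_row layout, one option is to derive the row-style date from the file-name date inside the function. The following is only a sketch: the helper name to_row_date is made up for illustration, and the rows are read from an inline string (copied from the question) instead of a real file.

```python
import csv
import io
from datetime import datetime

def to_row_date(file_date):
    """Hypothetical helper: turn '20200907' into '2020-09-07'."""
    return datetime.strptime(file_date, "%Y%m%d").strftime("%Y-%m-%d")

def filter_row(r, file_date):
    condition_1 = r[2].isdigit()                           # index 2 must be numeric
    condition_2 = r[0].startswith(to_row_date(file_date))  # row belongs to that day
    return condition_1 and condition_2

# Sample rows copied from the question; in practice they come from the opened file.
sample = """#DATA:DD,CARS_INTERNATIONAL:VERSION01.1
2020-09-07T00:00:00.285+02:00,New-York,XX,Audi
2020-09-07T00:01:00.385+02:00,London,100,Mercedes
2020-09-07T00:02:00.255+02:00,New-York,90,Ford
2020-09-07T00:03:00.523+02:00,New-York,91,BMW
2020-09-08T00:00:58.444+02:00,New-York,12,BMW
2020-09-08T00:01:55.336+02:00,New-York,11,Mercedes
"""

reader = csv.reader(io.StringIO(sample), delimiter=",")
next(reader)  # skip header
output_list = [row for row in reader if filter_row(row, "20200907")]
for row in output_list:
    print(row)
```

With this variant the file-name date is the only value you maintain, so the glob pattern and the row filter cannot drift apart.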
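Your code has two problems. First, condition_2 compares date with the global missing_date, but both hold the same value, so date != missing_date is always False and no row is ever kept. Second, the line below returned zero files when I ran it: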
files=[fn for fn in (e for e in root.glob(f"**/*_{missing_date}.txt") if e.is_file())]
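Whether that line returns any files depends on what root points to. A quick way to check is to run the same pattern against a throwaway folder. The sketch below assumes a temporary directory with made-up file names; it is not the asker's real data.

```python
import tempfile
from pathlib import Path

missing_date = "20200907"

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the real data folder
    (root / "sub").mkdir()
    (root / f"aa_{missing_date}.txt").write_text("header\n")
    (root / "sub" / f"bb_{missing_date}.txt").write_text("header\n")
    (root / "aa_20200908.txt").write_text("header\n")  # different date, should be skipped

    # Same pattern as in the question, without the redundant outer comprehension.
    files = sorted(p for p in root.glob(f"**/*_{missing_date}.txt") if p.is_file())
    found = [p.name for p in files]
    print(found)
```

If the pattern finds files here but not in your real folder, the folder path or the file names are a more likely cause than the glob itself.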
Here is one approach.
Ex:
import os
import re
import csv
import datetime

filePath = "aa_20200907.txt"  # assumed path to the input file; adjust as needed
with open(filePath) as infile:
    filename = os.path.basename(filePath)
    print(filename)  # -> aa_20200907.txt
    # Strip everything except digits from the file name, then reformat as YYYY-MM-DD
    dateToCheck = datetime.datetime.strptime(re.sub(r"[^\d]", "", filename), "%Y%m%d").strftime("%Y-%m-%d")  # Get Date
    reader = csv.reader(infile)
    for line in reader:
        if line[0].startswith(dateToCheck) and line[2].isdigit():  # Conditions
            print(line)
Output:
['2020-09-07T00:01:00.385+02:00', 'London', '100', 'Mercedes']
['2020-09-07T00:02:00.255+02:00', 'New-York', '90', 'Ford']
['2020-09-07T00:03:00.523+02:00', 'New-York', '91', 'BMW']
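Both answers compare dates as strings, which works because every row starts with YYYY-MM-DD. If you prefer to compare real date objects, a possible variant parses the timestamp with datetime.fromisoformat (available since Python 3.7). The function name row_matches is hypothetical, and the comparison uses the date as written in the row, not a UTC-converted one.

```python
from datetime import datetime

def row_matches(row, wanted):
    """Return True if row[2] is numeric and row[0] falls on the `wanted` date."""
    if len(row) < 3 or not row[2].isdigit():
        return False
    try:
        stamp = datetime.fromisoformat(row[0])
    except ValueError:
        return False  # header line or malformed timestamp
    return stamp.date() == wanted

# Date taken from the file name aa_20200907.txt
wanted = datetime.strptime("20200907", "%Y%m%d").date()

# A few rows from the question, already split into fields
rows = [
    ["2020-09-07T00:00:00.285+02:00", "New-York", "XX", "Audi"],
    ["2020-09-07T00:01:00.385+02:00", "London", "100", "Mercedes"],
    ["2020-09-08T00:00:58.444+02:00", "New-York", "12", "BMW"],
]
kept = [r for r in rows if row_matches(r, wanted)]
print(kept)
```

Parsing costs a little more than startswith, but it rejects malformed timestamps instead of silently matching on a prefix.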