How to compare two csv files in Python
I have two csv files. One is called "Standard reg.csv" and the other is called "Driver Details.csv".
In "Standard reg.csv", the first two rows are:
['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit']
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']
The first two rows of "Driver Details.csv" are:
['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration']
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']
My code is this:
import csv
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',')
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',')
for row in file_1:
    reg = row[3]
    avgspeed = row[7]
for row in file_2:
    firstname = row[0]
    lastname = row[1]
    address = row[2]
    city = row[3]
    region = row[4]
    reg2 = row[7]
if reg == reg2:
    print('Match found')
else:
    print('No match found')
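This is a work in progress, but I can't seem to get the code to compare more than just the last row.
With print(reg) placed after the line reg2 = row[7], the output shows that the whole column has been read. The whole column is also printed when I put print(reg2) after reg2 = row[7].
But at if reg == reg2: only the last row of each column is read and compared, and I'm not sure how to fix this.
Thank you in advance.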
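The test if reg == reg2 sits outside both loops (the one over file_1 and the one over file_2). That is why only the last row of each file is tested.
Another problem is that both for loops use the same loop variable, row.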
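To see why only the last rows get compared, here is a minimal sketch that uses made-up in-memory lists in place of the two files (every plate other than NU16REG is hypothetical). A variable assigned inside a for loop keeps only its final value once the loop has finished:

```python
# Hypothetical stand-ins for the registration columns of the two files.
plates_1 = ['NU16REG', 'AB12CDE', 'XY65ZZZ']
plates_2 = ['NU16REG', 'AB12CDE', 'QQ11QQQ']

for plate in plates_1:
    reg = plate    # overwritten on every iteration
for plate in plates_2:
    reg2 = plate   # overwritten on every iteration

# Both loops have finished, so only the last values survive.
last_pair = (reg, reg2)
print(last_pair)

# Comparing inside the loops looks at every pair instead.
matches = [p1 for p1 in plates_1 for p2 in plates_2 if p1 == p2]
print(matches)
```

The nested comparison is only there to show the difference. It checks every combination, so it gets slow for large files.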
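I would suggest you first load all of the details from Driver Details.csv into a dictionary, using the registration number as the key. That way you can easily look up a given entry without having to keep re-reading all of the rows from the file: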
import csv
driver_details = {}
with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)  # skip the header
    for row in csv_driver_details:
        driver_details[row[7]] = row
with open('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)  # skip the header
    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError as e:
            print('No match found')
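The code you have iterates through file_2 and leaves the file pointer either at the end (if no match is found) or at the position of a match (so matches for the next entry that sit earlier in the file may be missed). For your approach to work, you would have to start reading the file from the beginning for each loop, which would be very slow.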
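The point about the file pointer can be checked with a small sketch. It uses io.StringIO and made-up rows as a stand-in for Driver Details.csv (the second driver is hypothetical). A csv.reader can be iterated only once unless the underlying file is rewound:

```python
import csv
import io

# Hypothetical in-memory stand-in for 'Driver Details.csv'.
data = "FirstName,Registration\nViolet,NU16REG\nJohn,AB12CDE\n"
f = io.StringIO(data)
reader = csv.reader(f)

first_pass = list(reader)    # consumes every row, header included
second_pass = list(reader)   # the reader is exhausted, nothing is left

f.seek(0)                    # rewind the underlying file object
third_pass = list(csv.reader(f))

print(len(first_pass), len(second_pass), len(third_pass))
```

Rewinding works, but doing it once per row of the other file means re-reading the whole file each time. Loading the rows into a dictionary once avoids that.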
To add an output csv and display the full address, you could do something like the following:
import csv
speed = 74.3
fine = 35
driver_details = {}
with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)  # skip the header
    for row in csv_driver_details:
        driver_details[row[7]] = row
with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)  # skip the header
    csv_output = csv.writer(f_output)
    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError as e:
            print('No match found')
This would print the following:
Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE
And produce an output file containing:
Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35
Try using csv.DictReader to eliminate most of the lines of code:
import csv
from collections import defaultdict

Violations = defaultdict(list)
# Read in the violations, there are probably fewer violations than drivers (I hope!)
with open('Standard reg.csv') as violations:
    for v in csv.DictReader(violations):
        # append, so a plate with several violations keeps all of them
        Violations[v['Reg Plate']].append(v)
with open('Driver Details.csv') as drivers:
    for d in csv.DictReader(drivers):
        # d is a dict, so the format string needs item access, not attribute access
        fullname = "{driver[FirstName]} {driver[LastName]}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))
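To try the DictReader idea without the real files, here is a rough sketch that runs on in-memory data. The function name count_violations, the second violation row, the second driver and the shortened driver columns are all hypothetical and only there for illustration:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data using the column names from the question.
standard_reg = (
    "Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit\n"
    "1,1,2016,NU16REG,1,1,1,5816.1667859699355\n"
    "2,1,2016,NU16REG,2,2,2,12.5\n"
)
driver_details = (
    "FirstName,LastName,Registration\n"
    "Violet,Kirby,NU16REG\n"
    "Sam,Example,ZZ99ZZZ\n"
)

def count_violations(reg_file, driver_file):
    """Return {full name: number of violations} for every driver."""
    by_plate = defaultdict(list)
    for v in csv.DictReader(reg_file):
        by_plate[v['Reg Plate']].append(v)
    counts = {}
    for d in csv.DictReader(driver_file):
        name = "{} {}".format(d['FirstName'], d['LastName'])
        # .get() avoids creating empty entries for drivers with no violations
        counts[name] = len(by_plate.get(d['Registration'], []))
    return counts

result = count_violations(io.StringIO(standard_reg), io.StringIO(driver_details))
print(result)
```

Passing file objects rather than file names means the same function could be called with open('Standard reg.csv') and open('Driver Details.csv') once the sample strings are no longer needed.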