How to read a CSV which has grouped data where each group has different columns?
I have just started learning Python. As part of that, I have been given an assignment to parse a CSV file and compare it with another file in the same format.
The CSV looks like this:
"first-report","10/01/2019 at 18:54:55"
"Tags Company","B2 603, Belcastel","MV Street, (near Orbis School - 2)","Pune","Maharashtra","India","1"
"James Kooney","sants_rn","Manager"
"Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
"null","NONE","0","0","scans","N/A","ALL","NONE"
"Total Vulnerabilities","Avg Risk","Business Risk"
"17","2.8","14/100"
"IP","Network","Total Vulnerabilities","Security Risk"
"10.10.10.10","Global Default Network","17","2.8"
by Status
"Status","Confirmed","Potential","Total"
"New","1","3","4"
"Active","0","0","0"
"Re-Opened","0","0","0"
"Total","1","3","4"
"Fixed","0","0","0"
"Changed","1","3","4"
As the sample data shows, the CSV does not have fixed columns; the data is split into groups, each with its own header row. I want to compare the following keys (header rows) of those groups against the other CSV and write the differences to a summary file wherever a key's value does not match, e.g. Difference found at line 14, Expected "New" found "Active".
The keys to compare are:
"Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
"Total Vulnerabilities","Avg Risk","Business Risk"
"IP","Network","Total Vulnerabilities","Security Risk"
"Status","Confirmed","Potential","Total"
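One possible way to handle the variable column count, sketched here under the assumption that the four header rows listed above are the only rows that start a group, is to scan the file with csv.reader, start a new group whenever a row matches one of those headers, and zip each following row of the same width with that header. The names KNOWN_HEADERS, split_into_groups and compare_groups are hypothetical, not part of any library.

```python
import csv

# Hypothetical: the header rows (taken from the question) that start a group.
KNOWN_HEADERS = [
    ["Groups", "IPs", "Hosts", "Hosts Matching Filters", "Analysis", "Date Range", "Network", "Tags"],
    ["Total Vulnerabilities", "Avg Risk", "Business Risk"],
    ["IP", "Network", "Total Vulnerabilities", "Security Risk"],
    ["Status", "Confirmed", "Potential", "Total"],
]


def split_into_groups(lines):
    """Map each known header (as a tuple) to a list of (line_number, row_dict) pairs.

    `lines` is any iterable of CSV lines, e.g. a file opened with newline="".
    Line numbers assume no quoted field spans several physical lines.
    """
    groups = {}
    current = None  # header of the group being read, or None outside a group
    for line_number, row in enumerate(csv.reader(lines), start=1):
        if row in KNOWN_HEADERS:
            current = tuple(row)
            groups[current] = []
        elif current is not None and len(row) == len(current):
            groups[current].append((line_number, dict(zip(current, row))))
        else:
            current = None  # a row of another width (e.g. "by Status") ends the group
    return groups


def compare_groups(actual, expected):
    """Return one message per value that differs between matching groups."""
    messages = []
    for header, expected_rows in expected.items():
        actual_rows = actual.get(header, [])
        # zip() stops at the shorter group; extra or missing rows are not reported here
        for (line_number, exp_row), (_, act_row) in zip(expected_rows, actual_rows):
            for key in header:
                if exp_row[key] != act_row[key]:
                    messages.append(
                        f'Difference found at line {line_number}, column "{key}": '
                        f'Expected "{exp_row[key]}" found "{act_row[key]}"'
                    )
    return messages
```

With real files, you would open both with open(path, newline=""), pass each to split_into_groups, and write every message returned by compare_groups to the summary file. Because rows are keyed by their own group's header, a column like "Total" is only ever compared with the "Total" of the same group.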
Can someone please guide me towards the best solution?
I have tried several options but had no luck so far. My approach was to use csv.DictReader to compare each key; however, because the column count varies, I am running into indexing issues.
Here is the sample code I have written:
```python
import csv

# summary, actualoutput and expectedoutput are file paths defined elsewhere;
# fieldnames is the header list given in the PS below.
with open(actualoutput, newline='') as actual_file, \
        open(expectedoutput, newline='') as expected_file, \
        open(summary, 'w') as summary_file:
    actualcsvrows = list(csv.DictReader(actual_file, fieldnames=fieldnames))
    expectedcsvrows = list(csv.DictReader(expected_file, fieldnames=fieldnames))
    print(len(actualcsvrows))
    for line in range(len(actualcsvrows)):
        if actualcsvrows[line] != expectedcsvrows[line]:
            # fieldnames is passed explicitly, so no header row is consumed: index 0 is line 1
            summary_file.write(f"\nMismatch found at line number {line + 1}\n")
            for key1 in actualcsvrows[line]:
                if actualcsvrows[line][key1] != expectedcsvrows[line][key1]:
                    summary_file.write(
                        f"For {key1} column, Expected value was [ {expectedcsvrows[line][key1]} ] "
                        f"Found [ {actualcsvrows[line][key1]} ]\n")
```
PS: in this case, fieldnames is:
"Status","Confirmed","Potential","Total"
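Part of the trouble likely comes from giving DictReader one fixed fieldnames list for rows of every width. Per the csv module's documented behavior, a shorter row gets its missing keys filled with restval (None by default), and a longer row gets its extra values collected in a list under the restkey key (also None by default). A small demonstration with three rows from the sample:

```python
import csv
import io

fieldnames = ["Status", "Confirmed", "Potential", "Total"]

# Three rows of different widths, copied from the sample CSV in the question
sample = io.StringIO(
    '"17","2.8","14/100"\n'
    '"New","1","3","4"\n'
    '"null","NONE","0","0","scans","N/A","ALL","NONE"\n'
)

rows = list(csv.DictReader(sample, fieldnames=fieldnames))
for row in rows:
    print(row)
# {'Status': '17', 'Confirmed': '2.8', 'Potential': '14/100', 'Total': None}
# {'Status': 'New', 'Confirmed': '1', 'Potential': '3', 'Total': '4'}
# {'Status': 'null', 'Confirmed': 'NONE', 'Potential': '0', 'Total': '0', None: ['scans', 'N/A', 'ALL', 'NONE']}
```

So "17" is labelled as a Status, and values from unrelated groups end up under the same keys. Separately, if the expected file has fewer rows than the actual one, expectedcsvrows[line] raises IndexError.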
For your specific case, you don't need the DictReader class; the plain reader class is enough.
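csv.reader makes no assumption about width: each record comes back as a plain list containing however many fields that line has, so rows from different groups can simply be compared position by position. For example, with three lines from the sample:

```python
import csv
import io

# Lines of three different widths, copied from the sample CSV in the question
sample = io.StringIO(
    '"17","2.8","14/100"\n'
    '"New","1","3","4"\n'
    'by Status\n'
)

rows = list(csv.reader(sample))
for row in rows:
    print(len(row), row)
# 3 ['17', '2.8', '14/100']
# 4 ['New', '1', '3', '4']
# 1 ['by Status']
```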
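```python
import csv
from itertools import zip_longest

# summary, actualoutput and expectedoutput are file paths defined elsewhere
with open(actualoutput, newline='') as actual_file, \
        open(expectedoutput, newline='') as expected_file, \
        open(summary, 'w') as summary_file:
    actualrows = list(csv.reader(actual_file))
    expectedrows = list(csv.reader(expected_file))
    # zip_longest pads the shorter side, so extra rows or fields are reported too
    for line, (actrow, exprow) in enumerate(zip_longest(actualrows, expectedrows, fillvalue=[])):
        if actrow != exprow:
            # enumerate starts at 0, so line + 1 is the 1-based line number
            summary_file.write(f"\nMismatch found at line number {line + 1}\n")
            for act, exp in zip_longest(actrow, exprow):
                if act != exp:
                    summary_file.write(f"Expected {exp}, got {act}\n")
```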
But to be honest, depending on your exact needs, the difflib library might solve your problem.
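As a rough sketch of the difflib route: difflib.unified_diff takes two sequences of lines and yields diff-formatted lines, so reading both files as text and splitting them into lines is enough. The function name diff_csv_text and the file labels are illustrative only.

```python
import difflib


def diff_csv_text(expected_text, actual_text):
    """Return the unified diff between two CSV texts as a list of lines."""
    return list(difflib.unified_diff(
        expected_text.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile="expected.csv",  # labels used in the diff header only
        tofile="actual.csv",
    ))


expected = '"Status","Confirmed","Potential","Total"\n"New","1","3","4"\n'
actual = '"Status","Confirmed","Potential","Total"\n"New","1","2","3"\n'
print("".join(diff_csv_text(expected, actual)))
# --- expected.csv
# +++ actual.csv
# @@ -1,2 +1,2 @@
#  "Status","Confirmed","Potential","Total"
# -"New","1","3","4"
# +"New","1","2","3"
```

Unlike a row-by-row loop, this also copes with inserted or missing lines, although it reports whole changed lines (with their positions in the @@ hunk headers) rather than per-column messages.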