How do I read a CSV file that has grouped data where each group has different columns?

I have just started learning Python. As part of that, I have been given an assignment to parse a CSV file and compare it with another file in the same format.

The CSV looks like this:

    "first-report","10/01/2019 at  18:54:55"
    "Tags Company","B2 603, Belcastel","MV Street, (near Orbis School - 2)","Pune","Maharashtra","India","1"
    "James Kooney","sants_rn","Manager"
    "Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
    "null","NONE","0","0","scans","N/A","ALL","NONE"

    "Total Vulnerabilities","Avg Risk","Business Risk"
    "17","2.8","14/100"

    "IP","Network","Total Vulnerabilities","Security Risk"
    "10.10.10.10","Global Default Network","17","2.8"

    by Status
    "Status","Confirmed","Potential","Total"
    "New","1","3","4"
    "Active","0","0","0"
    "Re-Opened","0","0","0"
    "Total","1","3","4"
    "Fixed","0","0","0"
    "Changed","1","3","4"

As the sample data shows, the CSV does not have fixed columns: the data is split into groups, each with its own header row. I want to compare the groups with the following header rows between the two CSV files and write the differences to a summary file wherever a key-value pair does not match, e.g. Difference found at line 14, Expected "New" found "Active".

    "Groups","IPs","Hosts","Hosts Matching Filters","Analysis","Date Range","Network","Tags"
    "Total Vulnerabilities","Avg Risk","Business Risk"
    "IP","Network","Total Vulnerabilities","Security Risk"
    "Status","Confirmed","Potential","Total"
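One way to approach this comparison, sketched here under stated assumptions rather than taken from the original post: treat a row that exactly matches one of the header rows above as the start of a group, collect data rows until the next blank line, and then compare the two files group by group and column by column. The names GROUP_HEADERS, read_groups and compare_groups are hypothetical. The sketch assumes each group appears at most once per file and that no quoted field spans several lines, so that reader.line_num equals the file line number.

```python
import csv

# Header rows of the groups to compare, copied from the question.
GROUP_HEADERS = [
    ["Groups", "IPs", "Hosts", "Hosts Matching Filters", "Analysis", "Date Range", "Network", "Tags"],
    ["Total Vulnerabilities", "Avg Risk", "Business Risk"],
    ["IP", "Network", "Total Vulnerabilities", "Security Risk"],
    ["Status", "Confirmed", "Potential", "Total"],
]


def read_groups(lines):
    """Map each known group (keyed by its first header name) to a list of
    (line_number, {column: value}) tuples.

    A group starts at a row equal to one of GROUP_HEADERS and ends at the next
    blank row; every other row (report title, "by Status", ...) is ignored.
    """
    groups = {}
    header = None
    reader = csv.reader(lines)
    for row in reader:
        if row in GROUP_HEADERS:
            header = row
            groups[header[0]] = []
        elif not row:
            header = None  # csv.reader yields [] for a blank line: close the group
        elif header is not None:
            groups[header[0]].append((reader.line_num, dict(zip(header, row))))
    return groups


def compare_groups(expected_lines, actual_lines):
    """Return one message per mismatching value, using actual-file line numbers."""
    expected, actual = read_groups(expected_lines), read_groups(actual_lines)
    messages = []
    for name, expected_rows in expected.items():
        actual_rows = actual.get(name, [])
        if len(actual_rows) != len(expected_rows):
            messages.append(
                f'Group "{name}": expected {len(expected_rows)} rows, found {len(actual_rows)}'
            )
        # zip() stops at the shorter group; the row-count message above covers the rest.
        for (_, exp_row), (line_no, act_row) in zip(expected_rows, actual_rows):
            for column, exp_value in exp_row.items():
                act_value = act_row.get(column)
                if act_value != exp_value:
                    messages.append(
                        f'Difference found at line {line_no}, column "{column}": '
                        f'expected "{exp_value}" found "{act_value}"'
                    )
    return messages
```

With real files, pass file objects opened with newline='' (for example open(expectedoutput, newline='')) and write each returned message to the summary file. Keying groups by their header row rather than by position is what keeps rows from different groups from being compared against each other.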

Can someone please guide me toward the best solution?

I have been trying different options, but with no luck so far. My approach was to use csv.DictReader to compare each key; however, because the number of columns varies, I am running into indexing issues.

Here is the sample code I have written:

    import csv

    # fieldnames is one group's header row (see the PS below)
    with open(summary, 'w') as summary_file, \
            open(actualoutput, newline='') as actual_file, \
            open(expectedoutput, newline='') as expected_file:
        actualcsvrows = list(csv.DictReader(actual_file, fieldnames=fieldnames))
        expectedcsvrows = list(csv.DictReader(expected_file, fieldnames=fieldnames))
        print(len(actualcsvrows))
        for line in range(len(actualcsvrows)):
            if actualcsvrows[line] != expectedcsvrows[line]:
                summary_file.write(f"\nMismatch found at line number {line + 2}\n")
                for key1 in actualcsvrows[line]:
                    if actualcsvrows[line][key1] != expectedcsvrows[line][key1]:
                        summary_file.write(
                            f"For {key1} column, Expected value was [ {expectedcsvrows[line][key1]} ] "
                            f"Found [ {actualcsvrows[line][key1]} ]\n")

PS: fieldnames in this case is:

    fieldnames = ["Status", "Confirmed", "Potential", "Total"]
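To illustrate where the indexing issues come from: with fixed fieldnames, csv.DictReader pairs every row with the same header, regardless of which group the row belongs to. The short sketch below uses two data rows from the sample above, held in an in-memory string (SAMPLE, a hypothetical name) purely for illustration.

```python
import csv
import io

# Fixed fieldnames taken from the "Status" group header, as in the question.
FIELDNAMES = ["Status", "Confirmed", "Potential", "Total"]

# Two data rows from other groups in the sample, separated by a blank line.
SAMPLE = (
    '"17","2.8","14/100"\n'
    "\n"
    '"null","NONE","0","0","scans","N/A","ALL","NONE"\n'
)

rows = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=FIELDNAMES))

# DictReader skips the blank line, so only two rows come back.
print(len(rows))  # 2
# A row with fewer values than fieldnames: missing keys get restval (None by default).
print(rows[0])  # {'Status': '17', 'Confirmed': '2.8', 'Potential': '14/100', 'Total': None}
# A row with more values: the surplus is stored as a list under restkey (None by default).
print(rows[1])  # {'Status': 'null', 'Confirmed': 'NONE', 'Potential': '0', 'Total': '0', None: ['scans', 'N/A', 'ALL', 'NONE']}
```

So a key such as "Total" refers to different things depending on which group a row came from. Because blank lines are also skipped, list indices no longer match file line numbers, which makes a row-by-row comparison across groups misleading.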

Answer: For your specific case, you don't need the DictReader class; the plain csv.reader is enough.

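    import csv
    from itertools import zip_longest

    with open(summary, 'w') as summary_file, \
            open(actualoutput, newline='') as actual_file, \
            open(expectedoutput, newline='') as expected_file:
        actualrows = list(csv.reader(actual_file))
        expectedrows = list(csv.reader(expected_file))
        # csv.reader keeps blank lines as empty rows, so row index + 1 is the line number
        # (as long as no quoted field spans several lines).
        for line, (actual, expected) in enumerate(
                zip_longest(actualrows, expectedrows, fillvalue=[]), start=1):
            if actual != expected:
                summary_file.write(f"\nMismatch found at line number {line}\n")
                # zip_longest pads the shorter row with None instead of silently truncating
                for act, exp in zip_longest(actual, expected):
                    if act != exp:
                        summary_file.write(f"Expected {exp}, got {act}\n")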

That said, depending on your exact needs, the standard-library difflib module may solve your problem directly.
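As a minimal sketch of the difflib route (the helper name diff_csv_text and the "expected"/"actual" labels are hypothetical), difflib.unified_diff can compare the two files line by line. Its hunk headers ("@@ -start,count +start,count @@") give the line ranges where the files differ, and changed lines are prefixed with "-" (expected) and "+" (actual).

```python
import difflib


def diff_csv_text(expected_text, actual_text):
    """Return unified-diff lines showing how actual_text differs from expected_text."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",  # lines from splitlines() carry no newline of their own
        )
    )


expected = '"Status","Confirmed","Potential","Total"\n"New","1","3","4"\n"Active","0","0","0"\n'
actual = '"Status","Confirmed","Potential","Total"\n"New","1","3","4"\n"Active","2","0","2"\n'

for diff_line in diff_csv_text(expected, actual):
    print(diff_line)
```

This compares whole lines rather than individual cells, so it suits a quick summary. For per-column messages, a csv-based comparison like the one above is still needed.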


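Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.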

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM