Parsing CSV with Python

Question

I have the following CSV file with three fields: Vulnerability Title, Vulnerability Severity Level, and Asset IP Address. They give the name of each vulnerability, its severity level, and the IP address of the asset that has it. I am trying to print a report that lists each vulnerability in the first column, its severity next to it, and in the last column the list of IP addresses that have that vulnerability.

Vulnerability Title   Vulnerability Severity Level   Asset IP Address
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.164
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.50.82
TLS/SSL Server Supports Weak Cipher Algorithms  6   10.103.65.164
Weak Cryptographic Key  3   10.103.64.10
Unencrypted Telnet Service Available    4   10.10.30.81
Unencrypted Telnet Service Available    4   10.10.50.82
TLS/SSL Server Supports Anonymous Cipher Suites with no Key Authentication  6   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.100
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.10.30.81

I would like to create a new CSV file that uses the Vulnerability Title column as the key, has a second column called Vulnerability Severity Level, and has a last column containing all the IP addresses that have the vulnerability. My attempt so far:

import csv
from pprint import pprint
from collections import defaultdict
import glob
x= glob.glob("/root/*.csv")

d = defaultdict()
n = defaultdict()
for items in x:
        with open(items, 'rb') as f:
                reader = csv.DictReader(f, delimiter=',')
                for row in reader:
                        a = row["Vulnerability Title"]
                        b = row["Vulnerability Severity Level"], row["Asset IP Address"]
                        c = row["Asset IP Address"]
        #               d = row["Vulnerability Proof"]
                        d.setdefault(a, []).append(b)
        f.close()
pprint(d)
with open('results/ipaddress.csv', 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for key, value in d.items():
                for x,y in value:
                        n.setdefault(y, []).append(x)
#                       print x
                        writer.writerow([key,n])

with open('results/ipaddress2.csv', 'wb') as csv2_file:
        writer = csv.writer(csv2_file)
        for key, value in d.items():
             n.setdefault(value, []).append(key)
             writer.writerow([key,n])

Since I can't explain it very well, let me try to simplify.

Let's say I have the following CSV:

Car     model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James
Chevy   Green   Tom

I am trying to turn it into the following CSV:

Car     model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James,Tom

Both of the solutions are correct. Here is my final script as well:

import pandas as pd

df = pd.read_csv('test.csv', names=['Vulnerability Title', 'Vulnerability Severity Level', 'Asset IP Address'])
grouped = df.groupby(['Vulnerability Title', 'Vulnerability Severity Level'])

groups = grouped.groups
# One row per (title, severity) pair, with the list of its IP addresses appended
new_data = [k + (v['Asset IP Address'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['Vulnerability Title', 'Vulnerability Severity Level', 'Asset IP Address'])

print(new_df)
new_df.to_csv('final.csv')

Thank you.

Answer 1

This answer uses your car example. Essentially, I am creating a dictionary that has the car brand as the key and a two-element tuple as the value. The first element of the tuple is the color and the second is a list of owners:

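import csv

car_dict = {}
# newline='' lets the csv module handle line endings itself (Python 3)
with open('<file_to_read>', newline='') as fi:
    reader = csv.reader(fi)
    for row in reader:
        brand, color, owner = row[0], row[1], row[2]
        if brand in car_dict:
            # brand already seen: add this owner to its list
            car_dict[brand][1].append(owner)
        else:
            # first occurrence: store (color, [owners])
            car_dict[brand] = (color, [owner])

with open('<file_to_write>', 'w') as ou:
    for k in car_dict:
        out_string = '{}\t{}\t{}\n'.format(k, car_dict[k][0], ','.join(car_dict[k][1]))
        ou.write(out_string)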
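
The sample data in the question lists some IP addresses more than once for the same vulnerability, so the joined column can end up with repeats. Below is a minimal sketch of the same grouping idea that skips repeats and writes through csv.writer. The function name group_rows and the in-memory sample rows are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io


def group_rows(rows):
    """Group (key, attribute, value) rows by key, keeping each value once."""
    grouped = {}
    for key, attribute, value in rows:
        # First time a key is seen, remember its attribute and start a list.
        _, values = grouped.setdefault(key, (attribute, []))
        if value not in values:  # skip repeats, keep first-seen order
            values.append(value)
    return grouped


# Hypothetical sample rows mirroring the car example in the question.
rows = [
    ("Honda", "Blue", "James"),
    ("Toyota", "Blue", "Tom"),
    ("Chevy", "Green", "James"),
    ("Chevy", "Green", "Tom"),
    ("Chevy", "Green", "Tom"),  # duplicate on purpose
]

grouped = group_rows(rows)

# Write to an in-memory buffer; a real script would open a file with newline="".
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for key, (attribute, values) in grouped.items():
    writer.writerow([key, attribute, ",".join(values)])

print(buffer.getvalue())
```

Because the last field contains commas, csv.writer quotes it, so the file can still be read back as three columns.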

Answer 2

When manipulating structured data, especially large data sets, I would suggest using pandas.

For your problem, I will give an example that uses the pandas groupby feature as a solution. Suppose you have the data:

data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'], 
        ['vt2', 4, '10.0.10.10']]

Working with this data in pandas is pretty neat:

import pandas as pd

df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])
grouped = df.groupby(['title', 'level'])

Then

groups = grouped.groups

will be a dict that is almost what you need.

print(groups)
{('vt1', 3): [0, 1], ('vt2', 4): [2]}

[0, 1] are the row labels. You can iterate over these groups to apply any operation you want. For example, if you want to save them into a CSV file:

new_data = [k + (v['ip'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['title', 'level', 'ips'])

Let's see what new_df is now:

  title  level                   ips
0   vt1      3  [10.0.0.1, 10.0.0.2]
1   vt2      4          [10.0.10.10]

That's what you need. Finally, save it to a file:

new_df.to_csv(filename)
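
The question asks for the addresses as a single comma-separated value rather than a Python list. One possible way to get that is sketched below, using the same sample data with groupby and agg. The repeated row is an assumption added only to show duplicates being dropped, and the names joined and csv_text are illustrative.

```python
import pandas as pd

data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'],
        ['vt1', 3, '10.0.0.1'],  # repeated row, added for illustration
        ['vt2', 4, '10.0.10.10']]

df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])

joined = (
    df.drop_duplicates()              # drop fully repeated rows
      .groupby(['title', 'level'])['ip']
      .agg(','.join)                  # one comma-joined string per group
      .reset_index(name='ips')        # turn the group keys back into columns
)

# index=False leaves the row labels out of the CSV text.
csv_text = joined.to_csv(index=False)
print(csv_text)
```

Passing a file name to to_csv instead of calling it without a path would write the same text to disk.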

I strongly suggest that you learn data manipulation with pandas. You may find it much easier and cleaner.

Answer 1
1 2016-09-21 21:03:22

Answer 2
1 Accepted 2016-09-21 21:12:22
