Python Dynamic Dictionary, Multi-Value Per Key Counting

I am writing a Python script that parses thousands of Rancid files containing the configuration information, model, software type, software version, and so forth for the routers on a network. So far it loops through all the files correctly and prints, router by router, the hostname, the software type (e.g. IOS, IOS XR, JUNOSe, JUNOS), and the software version (e.g. 12.3R7, 15.2(2)T1, 12.1.1p0.1).

The problem is that the networking team could at any point add a new router model or upgrade the software to a version the script has not seen before. Handling that by adding a variable for every router type, software version, and so on would require constant maintenance of the script, which I would prefer to avoid, so I made all the variables dynamic. The script loops through the files and finds the software type, version, and model (each vendor normally doesn't change how these are presented from version to version). It then assigns them to the variables 'model', 'type', and 'version' and prints them as it goes.

I want to add an argparse option to my code so that, when needed, I get just a summary with counts instead of the entire list. To that end, on each pass of the loop the script adds the values it finds to a dictionary that holds multiple values per key.
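A minimal sketch of what such an argparse switch might look like. The flag name --summary and the function name build_parser are assumptions made for illustration; they do not come from the original script.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --summary switch."""
    parser = argparse.ArgumentParser(
        description="Report router model, software type and version."
    )
    # store_true makes the flag default to False and become True when given.
    parser.add_argument(
        "-s", "--summary",
        action="store_true",
        help="print counts per (type, model, version) instead of the full list",
    )
    return parser
```

In the script itself, args = build_parser().parse_args() would then let the loop check args.summary to decide whether to print each router or only collect values for the summary.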

Here is how the dictionary is built and how it gets printed out.

I set the key to the filename, which is based on the hostname (so less parsing of the raw data is needed):

key = file
mydict.setdefault(key, [])
mydict[key].append(model)
mydict[key].append(type)
mydict[key].append(version)

# Here is an example of what the dictionary looks like
print(mydict)

{'router1': ['model1', 'JUNOS', '12.3R7'], 'router2': ['model1', 'JUNOS', 
'13.3R4'], 'router3': ['model2', 'IOS', '15.2'], 'router4': ['model3', 
'JUNOS', '11.4R1'], 'router5': ['model2', 'IOS', '15.3'], 'router6': 
['model4', 'JUNOSe', '12.1.1p0.1'], 'router7': ['model1', 'JUNOS', 
'12.3R7'], 'router8': ['model1', 
'JUNOS', '13.3R4'], 'router9': ['model2', 'IOS', '15.2'], 'router10': 
['model3', 'JUNOS', '11.4R1'], 'router11': ['model2', 'IOS', '15.3'], 
'router12': ['model5', 'JUNOS', '12.3R7']}

What I would like is a way to match all the entries where all 3 values are the same, count them, and then print them in a nicely formatted list like this (ignoring the key, because printing it doesn't matter for this exercise):

JUNOS    model1 12.3R7 2
JUNOS    model1 13.3R4 2
JUNOS    model3 11.4R1 2
JUNOS    model5 12.3R7 1
IOS      model2 15.2 2
IOS      model2 15.3 2
JUNOSe   model4 12.1.1p0.1 1
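For reference, a flat summary like this can be sketched with only the standard library, assuming every value in mydict is a [model, type, version] list as shown earlier. The helper name format_summary and the column widths are illustrative choices, not part of the original script.

```python
from collections import Counter

mydict = {
    'router1': ['model1', 'JUNOS', '12.3R7'],
    'router2': ['model1', 'JUNOS', '13.3R4'],
    'router3': ['model2', 'IOS', '15.2'],
    'router4': ['model3', 'JUNOS', '11.4R1'],
    'router5': ['model2', 'IOS', '15.3'],
    'router6': ['model4', 'JUNOSe', '12.1.1p0.1'],
    'router7': ['model1', 'JUNOS', '12.3R7'],
    'router8': ['model1', 'JUNOS', '13.3R4'],
    'router9': ['model2', 'IOS', '15.2'],
    'router10': ['model3', 'JUNOS', '11.4R1'],
    'router11': ['model2', 'IOS', '15.3'],
    'router12': ['model5', 'JUNOS', '12.3R7'],
}

# Lists are not hashable, so each value is turned into a tuple before counting.
counts = Counter(tuple(values) for values in mydict.values())


def format_summary(counts):
    """Return one line per unique (model, type, version), sorted by type, model, version."""
    ordered = sorted(counts.items(), key=lambda kv: (kv[0][1], kv[0][0], kv[0][2]))
    return [
        "{:<8} {:<8} {:<12} {}".format(sw_type, model, version, n)
        for (model, sw_type, version), n in ordered
    ]


for line in format_summary(counts):
    print(line)
```

Because the router name is only the dictionary key, it never enters the count; only the three values are compared.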

Or, even more preferable (but probably much more difficult), would be:

JUNOS

model1   12.3R7 2
         13.3R4 2
model3   11.4R1 2
model5   12.3R7 1

JUNOSe

model4   12.1.1p0.1 1

IOS

model2   15.2 2
         15.3 2
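The grouped layout is not much harder than the flat one. One possible sketch, again assuming [model, type, version] lists as values, nests a Counter inside two levels of defaultdict. The names group_counts and format_grouped, and the small sample dict, are made up for this example.

```python
from collections import Counter, defaultdict


def group_counts(mydict):
    """Return a nested mapping: {type: {model: Counter({version: count})}}."""
    grouped = defaultdict(lambda: defaultdict(Counter))
    for model, sw_type, version in mydict.values():
        grouped[sw_type][model][version] += 1
    return grouped


def format_grouped(grouped):
    """Return report lines: a heading per type, then model/version/count rows."""
    lines = []
    for sw_type in sorted(grouped):
        lines.append(sw_type)
        lines.append("")
        for model in sorted(grouped[sw_type]):
            label = model
            for version in sorted(grouped[sw_type][model]):
                count = grouped[sw_type][model][version]
                lines.append("{:<8} {} {}".format(label, version, count))
                label = ""  # show the model name only on its first row
        lines.append("")
    return lines


# Small sample in the same shape as the dictionary above.
sample = {
    'router1': ['model1', 'JUNOS', '12.3R7'],
    'router2': ['model1', 'JUNOS', '13.3R4'],
    'router3': ['model2', 'IOS', '15.2'],
    'router7': ['model1', 'JUNOS', '12.3R7'],
}

for line in format_grouped(group_counts(sample)):
    print(line)
```

Versions are sorted as plain strings here, which is good enough for grouping but does not order version numbers numerically.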

Maybe the pandas library can help you achieve this: https://pandas.pydata.org/index.html
You can convert mydict into a pandas DataFrame, then use the groupby() method to get all the groups. Finally, use size() to count.

You can do:

import pandas as pd
mydict = {}  # your dict here
# orient='index' turns each key into a row label and each value list into a row
df = pd.DataFrame.from_dict(mydict, orient='index')
df.columns = ['model', 'type', 'version']  # assign column names to the DataFrame
print(df.groupby(['type', 'model', 'version']).size())

That gives you:

type    model   version   
IOS     model2  15.2          2
                15.3          2
JUNOS   model1  12.3R7        2
                13.3R4        2
        model3  11.4R1        2
        model5  12.3R7        1
JUNOSe  model4  12.1.1p0.1    1

When you assign column names with df.columns = ..., make sure they match your dict values: you must have as many column names as there are items in each value list.

Another example of groupby followed by size can be found here: Duplicate rows in pandas DF

Edit - Dict structure
In my opinion, a more descriptive dict that uses dicts as values instead of lists would be better, like:

{'router1': {'bar': None,
  'foo': None,
  'model': 'model1',
  'type': 'JUNOS',
  'version': '12.3R7'},
 'router2': {'bar': None,
  'foo': None,
  'model': 'model1',
  'type': 'JUNOS',
  'version': '13.3R4'},...}

This way pd.DataFrame.from_dict will set the column names automatically. The previous code becomes:

import pandas as pd
mydict = {} #your dict of dicts here
df = pd.DataFrame.from_dict(mydict,'index')
print(df.groupby(['type','model','version']).size())
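As a self-contained illustration of the dict-of-dicts variant, here is a small sketch with made-up sample data. The name routers, the extra 'foo' key, and the 'count' column name are assumptions for the example; reset_index(name='count') is used to turn the grouped sizes into an ordinary table.

```python
import pandas as pd

# Hypothetical sample data in the dict-of-dicts shape described above.
routers = {
    'router1': {'model': 'model1', 'type': 'JUNOS', 'version': '12.3R7', 'foo': None},
    'router2': {'model': 'model1', 'type': 'JUNOS', 'version': '13.3R4', 'foo': None},
    'router3': {'model': 'model2', 'type': 'IOS', 'version': '15.2', 'foo': None},
    'router7': {'model': 'model1', 'type': 'JUNOS', 'version': '12.3R7', 'foo': None},
}

# With orient='index' the inner dict keys become the column names.
df = pd.DataFrame.from_dict(routers, orient='index')

# Group on the three columns of interest; extra columns such as 'foo' are ignored.
summary = (
    df.groupby(['type', 'model', 'version'])
      .size()
      .reset_index(name='count')
)
print(summary.to_string(index=False))
```

Only the columns named in groupby() take part in the count, so extra keys in the inner dicts do not need to be removed first.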
