
Process .txt file into dictionary (Python v2.7)

I am currently looking to process and parse out information from this .txt file. The file appears to be tab delimited. I am looking to parse out the base 16 value (e.g. 000000) as the dictionary key and the company name (e.g. Xerox Corporation) as the dictionary value. So if, for example, I look up the key 000001 in my dictionary, Xerox Corporation would be returned as the respective value.

I've tried parsing the .txt file as a CSV, reading the entry on every nth line, but unfortunately there is no pattern and n varies.

Is there any way to capture the value preceding the term "base 16", for example, and then the term that follows it, to make a dictionary entry?

Many thanks

result = dict()
for lig in open('oui.txt'):
    if 'base 16' in lig:
        # Split around the marker: key on the left, company name on the right
        num, sep, txt = lig.strip().partition('(base 16)')
        result[num.strip()] = txt.strip()
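
To see what the partition step does without needing oui.txt on disk, here is a minimal sketch that runs the same logic over a few in-memory lines. The sample lines are an assumption modeled on the layout described in the question, not copied from the real file, and `sample_lines` is a made-up name.

```python
# Sample lines modeled on the layout described in the question
# (an assumption, not copied from the real file).
sample_lines = [
    "00-00-01   (hex)\t\tXEROX CORPORATION\n",
    "000001     (base 16)\t\tXEROX CORPORATION\n",
    "\n",
]

result = {}
for line in sample_lines:
    if '(base 16)' in line:
        # partition() returns (text before, separator, text after),
        # split at the first occurrence of the separator
        num, sep, txt = line.strip().partition('(base 16)')
        result[num.strip()] = txt.strip()

print(result['000001'])
```

Only the "(base 16)" line contributes an entry; the "(hex)" line and the blank line are skipped by the `in` test.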

Entries are separated by two newlines. The second line of each entry is always the base 16 one. The text before the first tab holds the base 16 key, and the text after the last tab is the company name.

import urllib

inputfile = urllib.urlopen("http://standards.ieee.org/develop/regauth/oui/oui.txt")
data = inputfile.read()

entries = data.split("\n\n")[1:-1] #ignore first and last entries, they're not real entries

d = {}
for entry in entries:
    parts = entry.split("\n")[1].split("\t")
    company_id = parts[0].split()[0]
    company_name = parts[-1]
    d[company_id] = company_name

Some of the results:

40F52E: Leica Microsystems (Schweiz) AG
3831AC: WEG
00B0F0: CALY NETWORKS
9CC077: PrintCounts, LLC
000099: MTX, INC.
000098: CROSSCOMM CORPORATION
000095: SONY TEKTRONIX CORP.
000094: ASANTE TECHNOLOGIES
000097: EMC Corporation
000096: MARCONI ELECTRONICS LTD.
000091: ANRITSU CORPORATION
000090: MICROCOM
000093: PROTEON INC.
000092: COGENT DATA TECHNOLOGIES
002192: Baoding Galaxy Electronic Technology  Co.,Ltd
90004E: Hon Hai Precision Ind. Co.,Ltd.
002193: Videofon MV
00A0D4: RADIOLAN,  INC.
E0F379: Vaddio
002190: Goliath Solutions
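
If you want to try the blank-line splitting idea without downloading the file, a rough sketch on an in-memory string could look like the following. `SAMPLE` and `parse_entries` are hypothetical names, the sample text (including the placeholder address lines) is made up to mimic the layout described above, and instead of slicing off the first and last blocks it skips any block whose second line lacks "(base 16)".

```python
# Made-up sample mimicking the described layout; the address lines are
# placeholders, not real data.
SAMPLE = (
    "header line (placeholder)\n"
    "\n"
    "00-00-00   (hex)\t\tXEROX CORPORATION\n"
    "000000     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tPLACEHOLDER STREET\n"
    "\n"
    "00-00-01   (hex)\t\tXEROX CORPORATION\n"
    "000001     (base 16)\t\tXEROX CORPORATION\n"
    "\t\t\t\tPLACEHOLDER STREET\n"
    "\n"
    "trailing text\n"
)


def parse_entries(text):
    """Map each base 16 key to its company name (hypothetical helper)."""
    d = {}
    for entry in text.split("\n\n"):
        lines = entry.split("\n")
        # Skip blocks that are not real entries (header, trailer)
        if len(lines) < 2 or "(base 16)" not in lines[1]:
            continue
        parts = lines[1].split("\t")
        company_id = parts[0].split()[0]   # first token before the first tab
        company_name = parts[-1].strip()   # text after the last tab
        d[company_id] = company_name
    return d


print(parse_entries(SAMPLE))
```

Checking the second line explicitly is a design choice for the sketch: it avoids depending on exactly how many non-entry blocks sit at the start and end of the file.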

def oui_parse(fn='oui.txt'):
    with open(fn) as ouif:
        content = ouif.read()
    for block in content.split('\n\n'):
        lines = block.split('\n')

        if not lines or '(hex)' not in lines[0]:  # Not an entry, e.g. the first block
            continue

        assert '(base 16)' in lines[1]
        d = {}
        d['oui'] = lines[1].split()[0]
        d['company'] = lines[1].split('\t')[-1]
        if len(lines) == 6:
            d['division'] = lines[2].strip()
        d['street'] = lines[-3].strip()
        d['city'] = lines[-2].strip()
        d['country'] = lines[-1].strip()
        yield d

oui_info = list(oui_parse())
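
The generator yields one dict per entry rather than the key/value mapping the question asks for. A small sketch of building that lookup from such records follows. The `records` list here is hand-written stand-in data with the same 'oui' and 'company' keys, so the snippet runs without the file, and `lookup` is a hypothetical name.

```python
# Stand-in records with the same keys the generator produces
# (hand-written for illustration, not parsed from the real file).
records = [
    {'oui': '000000', 'company': 'XEROX CORPORATION'},
    {'oui': '000001', 'company': 'XEROX CORPORATION'},
]

# Build the OUI -> company mapping the question asks for
lookup = dict((rec['oui'], rec['company']) for rec in records)

print(lookup.get('000001'))
# .get() with a default avoids a KeyError for keys that are not present
print(lookup.get('ZZZZZZ', 'unknown'))
```

With the real file you would presumably pass `oui_info` in place of `records`.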

>>> import urllib
>>> f = urllib.urlopen('http://standards.ieee.org/develop/regauth/oui/oui.txt')
>>> d = dict([(s[:6], s[22:].strip()) for s in f if 'base 16' in s])
>>> print d['000001']
XEROX CORPORATION
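
Fixed slice offsets such as `s[:6]` and `s[22:]` depend on the exact column layout of the file. If that layout is a concern, one possible alternative is a regular expression that captures the value before "(base 16)" and the text after it. This is only a sketch: `BASE16_LINE` and `parse_lines` are hypothetical names, and the sample lines are assumptions about the format rather than data copied from the file.

```python
import re

# Six hex digits, the literal "(base 16)", then the company name.
# Leading whitespace is tolerated in case the line is indented.
BASE16_LINE = re.compile(r'^\s*([0-9A-Fa-f]{6})\s+\(base 16\)\s+(.*\S)\s*$')


def parse_lines(lines):
    """Return {key: company} for every line matching the pattern."""
    result = {}
    for line in lines:
        m = BASE16_LINE.match(line)
        if m:
            result[m.group(1).upper()] = m.group(2)
    return result


# Assumed sample lines, with and without leading indentation
sample = [
    "00-00-01   (hex)\t\tXEROX CORPORATION\n",
    "000001     (base 16)\t\tXEROX CORPORATION\n",
    "  000093     (base 16)\t\tPROTEON INC.\n",
]
print(parse_lines(sample))
```

Any iterable of lines works as input, so an open file object or a URL response iterated line by line could be passed in the same way (on Python 3 a URL response yields bytes, which would need decoding first).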
