
How to convert semicolon delimited file to nested dict?

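I'm trying to convert a semicolon-delimited file into a nested dictionary. I've been working on this all morning and suspect I'm overlooking something simple: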

Input (sample)

The real file is about 200 lines long. This is just a small sample.

key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}

Current output

[['key',
  'name',
  'desc',
  'category',
  'type',
  'action',
  'range',
  'duration',
  'skill',
  'strain_mod',
  'apt_bonus'],
 ['ambiencesense',
  'Ambience Sense',
  'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
  'psi-chi',
  'passive',
  'automatic',
  'self',
  'constant',
  '',
  '0',
  ''],
 ['cogboost',
  'Cognitive Boost',
  'The async can temporarily elevate their cognitive performance.',
  'psi-chi',
  'active',
  'quick',
  'self',
  'temp',
  '',
  '-1',
  "{'COG': 5}"]]

Desired output

blahblah = {
     'ambiencesense': {
         'name': 'Ambience Sense',
         'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
         'category': 'psi-chi',
         'type': 'passive',
         'action': 'automatic',
         'range': 'self',
         'duration': 'constant',
         'skill': '',
         'strain_mod': '0',
         'apt_bonus': '',
         },     
     'cogboost': {
         'name': 'Cognitive Boost',
         'desc': 'The async can temporarily elevate their cognitive performance.',
         'category': 'psi-chi',
         'type': 'active',
         'action': 'quick',
         'range': 'self',
         'duration': 'temp',
         'skill': '',
         'strain_mod': '-1',
         'apt_bonus': {'COG': 5},
         },
         ...

Script (non-functional)

#!/usr/bin/env python
# Usage: ./csvdict.py <filename to convert to dict> <file to output>

import csv
import sys
import pprint

def parse(filename):
    with open(filename, 'rb') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(), delimiters=';')
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        dict_list = []

        for line in reader:
            dict_list.append(line)
        return dict_list

        new_dict = {}

        for item in dict_list:
            key = item.pop('key')
            new_dict[key] = item

output = parse(sys.argv[1])

with open(sys.argv[2], 'wt') as out:
    pprint.pprint(output, stream=out)
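For reference, two things stop the script above from producing the nested dict: `parse` returns `dict_list` before `new_dict` is ever built, and `item.pop('key')` is called on a list, which raises a TypeError. A minimal sketch of one possible fix, assuming Python 3 and using `csv.DictReader` so that each row is already keyed by the header names (the shortened `sample` data is only for illustration):

```python
import csv
import io

def parse(csvfile):
    """Build {key: {column: value}} from a semicolon-delimited file object."""
    # DictReader takes the first row as the field names
    reader = csv.DictReader(csvfile, delimiter=';')
    nested = {}
    for row in reader:
        # Remove the 'key' column from the row and use it as the outer key
        key = row.pop('key')
        nested[key] = dict(row)
    return nested

# Shortened sample standing in for the real file
sample = io.StringIO(
    "key;name;strain_mod;apt_bonus\n"
    "ambiencesense;Ambience Sense;0;\n"
    "cogboost;Cognitive Boost;-1;{'COG': 5}\n"
)
result = parse(sample)
print(result['cogboost']['name'])
```

With a real file, the same function can be called as `parse(open(filename, newline=''))` inside a `with` block.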

Working script

#!/usr/bin/env python
# Usage: ./csvdict.py <input filename> <output filename>

import sys
import pprint

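data = {}
error = 'Incorrect number of arguments.\nUsage: ./csvdict.py <input filename> <output filename>'

if len(sys.argv) != 3:
    print(error)
else:
    # Read the file name only after the argument count has been checked;
    # indexing sys.argv before the check raises IndexError when no arguments are given.
    file_name = sys.argv[1]

    with open(file_name, 'r') as test_fh: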
        header_line = next(test_fh)
        header_line = header_line.strip()
        headers = header_line.split(';')

        index_headers = {index:header for index, header in enumerate(headers)}

        for line in test_fh:
            line = line.strip()
            values = line.split(';')
            index_vals = {index:val for index, val in enumerate(values)}
            data[index_vals[0]] = {index_headers[key]:value for key, value in index_vals.items() if key != 0}

    with open(sys.argv[2], 'wt') as out:
        pprint.pprint(data, stream=out)

The only thing it doesn't handle well is the embedded dictionary. Any ideas on how to clean that up? (See apt_bonus.)

 'cogboost': {'action': 'quick',
              'apt_bonus': "{'COG': 5}",
              'category': 'psi-chi',
              'desc': 'The async can temporarily elevate their cognitive performance.',
              'duration': 'temp',
              'name': 'Cognitive Boost',
              'range': 'self',
              'skill': '',
              'strain_mod': '-1',
              'type': 'active'},
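One possible way to clean up the embedded dictionary is `ast.literal_eval` from the standard library, which safely evaluates a string containing a Python literal. The sketch below is only an illustration: the helper name `parse_bonus` is hypothetical, the `data` dict is a shortened stand-in for the real output, and turning an empty field into `{}` is an assumption about what is wanted.

```python
import ast

def parse_bonus(text):
    """Turn a string such as "{'COG': 5}" into a dict."""
    # Assumption: an empty field means "no bonus", represented as {}
    if not text:
        return {}
    # literal_eval only accepts Python literals, so it does not run arbitrary code
    return ast.literal_eval(text)

# Shortened stand-in for the dict built by the script
data = {
    'ambiencesense': {'strain_mod': '0', 'apt_bonus': ''},
    'cogboost': {'strain_mod': '-1', 'apt_bonus': "{'COG': 5}"},
}

for entry in data.values():
    entry['apt_bonus'] = parse_bonus(entry['apt_bonus'])

print(data['cogboost']['apt_bonus'])
```

`ast.literal_eval` raises ValueError or SyntaxError on malformed input, so a real script may want to catch those for rows with unexpected content.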

Here is another version, with no dependencies.

file_name = "<path>/test.txt"

data = {}
with open(file_name, 'r') as test_fh:
    header_line = next(test_fh)
    header_line = header_line.strip()
    headers = header_line.split(';')

    index_headers = {index:header for index, header in enumerate(headers)}

    for line in test_fh:
        line = line.strip()
        values = line.split(';')
        index_vals = {index:val for index, val in enumerate(values)}
        data[index_vals[0]] = {index_headers[key]:value for key, value in index_vals.items() if key != 0}

print(data)

pandas makes this easy:

In [7]: import pandas as pd

In [8]: pd.read_clipboard(sep=";", index_col=0).T.to_dict()
Out[8]:
{'ambiencesense': {'action': 'automatic',
  'apt_bonus': nan,
  'category': 'psi-chi',
  'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.',
  'duration': 'constant',
  'name': 'Ambience Sense',
  'range': 'self',
  'skill': nan,
  'strain_mod': 0,
  'type': 'passive'},
 'cogboost': {'action': 'quick',
  'apt_bonus': "{'COG': 5}",
  'category': 'psi-chi',
  'desc': 'The async can temporarily elevate their cognitive performance.',
  'duration': 'temp',
  'name': 'Cognitive Boost',
  'range': 'self',
  'skill': nan,
  'strain_mod': -1,
  'type': 'active'}}

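In your case you would use pd.read_csv() instead of .read_clipboard(), but it would look roughly the same. You may also need to tweak it a bit if you want to parse the apt_bonus column as a dictionary.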
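A rough sketch of what that `pd.read_csv()` variant might look like, assuming pandas is installed. The `converters` argument is used here to parse `apt_bonus` while reading; the helper `parse_bonus`, the shortened `sample` data, and the choice of `{}` for empty fields are all assumptions made for illustration.

```python
import ast
import io

import pandas as pd

# Shortened sample standing in for the real file
sample = """key;name;strain_mod;apt_bonus
ambiencesense;Ambience Sense;0;
cogboost;Cognitive Boost;-1;{'COG': 5}
"""

def parse_bonus(text):
    # Assumption: an empty field becomes an empty dict
    return ast.literal_eval(text) if text else {}

df = pd.read_csv(
    io.StringIO(sample),                    # replace with the file path
    sep=";",
    index_col="key",                        # the 'key' column becomes the outer key
    converters={"apt_bonus": parse_bonus},  # parse the column while reading
)

# orient="index" gives {row label: {column: value}}
nested = df.to_dict(orient="index")
print(nested["cogboost"]["apt_bonus"])
```

Note that pandas infers column types, so `strain_mod` comes back as a number rather than a string, as in the output above.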

Without using any libraries, try this Pythonic approach:

s = '''key;name;desc;category;type;action;range;duration;skill;strain_mod;apt_bonus
ambiencesense;Ambience Sense;This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.;psi-chi;passive;automatic;self;constant;;0;
cogboost;Cognitive Boost;The async can temporarily elevate their cognitive performance.;psi-chi;active;quick;self;temp;;-1;{'COG': 5}'''

lists = [delim.split(';') for delim in s.split('\n')]
keyIndex = lists[0].index('key')
nested = {lst[keyIndex]:{lists[0][i]:lst[i] for i in range(len(lists[0])) if i != keyIndex} for lst in lists[1:]}

The result is:

{
    'cogboost': {
        'category': 'psi-chi',
        'name': 'Cognitive Boost',
        'strain_mod': '-1',
        'duration': 'temp',
        'range': 'self',
        'apt_bonus': "{'COG': 5}",
        'action': 'quick',
        'skill': '',
        'type': 'active',
        'desc': 'The async can temporarily elevate their cognitive performance.'
    },
    'ambiencesense': {
        'category': 'psi-chi',
        'name': 'Ambience Sense',
        'strain_mod': '0',
        'duration': 'constant',
        'range': 'self',
        'apt_bonus': '',
        'action': 'automatic',
        'skill': '',
        'type': 'passive',
        'desc': 'This sleight provides the async with an instinctive sense about an area and any potential threats nearby. The async receives a +10 modifier to all Investigation, Perception, Scrounging, and Surprise Tests.'
    }
}
