Trying to create a dictionary from a text file

Question

fieldict(filename) reads a file in DOT format and returns a dictionary with the DOT CMPLID, converted to an integer, as the key, and a tuple as the corresponding value for that key. The format of the tuple is: (manufacturer, date, crash, city, state)

fieldict("DOT500.txt")[416]
  ('DAIMLERCHRYSLER  CORPORATION', datetime.date(1995, 1, 9), False, 'ARCADIA',

So far, I have tried:

from collections import defaultdict
import datetime

def fieldict(filename):
    with open(filename) as f:
        x=[line.split('\t')[0].strip() for line in f] #list of complaint numbers
        y= line.split('\t') #list of full complaints
        d={}
        for j in x:
            Y= True
            N= False
            d[j] = tuple(y[2],datetime.date(y[7]), y[6], y[12], y[13])   #dict with number of complaint as key and tuple with index as values
        return d

No luck... I think I am close. Any help is greatly appreciated.

EDIT: each complaint is formatted like this:

'11\t958128\tDAIMLERCHRYSLER CORPORATION\tDODGE\tSHADOW\t1990\tY\t19941117\tN\t0\t0\tENGINE AND ENGINE COOLING:ENGINE\tWILMINGTON  \tDE\t1B3XT44KXLN\t19950103\t19950103\t\t1\tENGINE MOTOR MOUNTS FAILED, RESULTING IN ENGINE NOISE. *AK\tEVOQ\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tV\t\r\n'

The same entry without the escape characters showing:

11  958128  DAIMLERCHRYSLER CORPORATION DODGE   SHADOW  1990    Y   19941117    N   0   0   ENGINE AND ENGINE COOLING:ENGINE    WILMINGTON      DE  1B3XT44KXLN 19950103    19950103        1   ENGINE MOTOR MOUNTS FAILED, RESULTING IN ENGINE NOISE.  *AK EVOQ

Answer 1

Note: Trimming the newline is left up to the reader.

A clean way of accomplishing this is to use dict(zip(headers, data_list)).

Presuming your sample data looks like:

joe\tSan Francisco\tapple
frank\tNew York City\torange
tim\tHawaii\tpineapple

You could do something like:

results = []
headers = ['person', 'place', 'fruit']

# The with block closes the file even if an error occurs.
with open('datafile.txt') as f:
    for line in f:
        record = line.split('\t')
        results.append(dict(zip(headers, record)))

This builds a dict for each line and appends it to the end of results.

The result looks like:

[{'fruit': 'apple\n', 'person': 'joe', 'place': 'San Francisco'},
 {'fruit': 'orange\n', 'person': 'frank', 'place': 'New York City'},
 {'fruit': 'pineapple\n', 'person': 'tim', 'place': 'Hawaii'}]
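
The newline trimming that the note leaves open can be handled with str.rstrip before splitting. A minimal sketch, assuming the same three headers and using an in-memory list of lines as a stand-in for datafile.txt:

```python
headers = ['person', 'place', 'fruit']

# Stand-in for the lines of datafile.txt (the sample data shown above).
lines = [
    'joe\tSan Francisco\tapple\n',
    'frank\tNew York City\torange\n',
    'tim\tHawaii\tpineapple\n',
]

results = []
for line in lines:
    # Strip only the line terminator, so empty trailing fields survive.
    record = line.rstrip('\r\n').split('\t')
    results.append(dict(zip(headers, record)))

print(results[0])
```

Stripping only '\r\n' rather than calling strip() with no arguments matters for tab-delimited data, because a bare strip() would also remove trailing tabs and silently drop empty fields at the end of a record.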

Answer 2

Looks like you want to make friends with the csv module, as this looks like tab-delimited CSV text. The reader object returned by csv.reader() is an iterator: a for loop calls its next method (.next() in Python 2, __next__() in Python 3) for you, so you can go through the file line by line.

As a general tip, read PEP 8 and use understandable variable names. With Python, if something starts to feel hard, that is usually a sign that there is a better way.

import csv
import datetime

def _build_datetime(line):
    # Index taken from the sample record in the question, where field 7
    # holds the date as a YYYYMMDD string such as '19941117'.
    date_idx = 7
    raw_date = line[date_idx].strip()

    result_datetime = None
    if raw_date:  # check that the expected value is populated
        result_datetime = datetime.datetime.strptime(raw_date, "%Y%m%d").date()
    return result_datetime

def format2dict(filename):
    complaints = {}
    # Python 3: open in text mode with newline="" as the csv module expects
    # (Python 2 would use mode "rb" instead).
    with open(filename, newline="") as in_f:
        reader = csv.reader(in_f, delimiter='\t')
        complaint_id_idx = 0
        manufacturer_idx = 2
        crash_idx = 6  # 'Y' or 'N' in the sample record
        city_idx = 12
        state_idx = 13

        for line in reader:
            complaint_id = int(line[complaint_id_idx])
            data = (
                line[manufacturer_idx],
                _build_datetime(line),
                line[crash_idx],
                line[city_idx],
                line[state_idx],
            )

            complaints[complaint_id] = data
    return complaints


if __name__ == "__main__":
    formatted_data = format2dict("DOT500.txt")
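
To illustrate how a for loop drives the reader, here is a small sketch that calls next() by hand once and then lets a loop consume the rest. It assumes Python 3 and uses a made-up in-memory sample (io.StringIO) instead of a real DOT file:

```python
import csv
import io

# In-memory stand-in for a tab-delimited file (hypothetical sample data).
sample = io.StringIO('1\tACME\tY\n2\tGLOBEX\tN\n', newline='')

reader = csv.reader(sample, delimiter='\t')

# Calling next() manually is what a for loop does behind the scenes.
first = next(reader)

# The loop picks up where the manual call left off, at the second row.
rest = [row for row in reader]

print(first)
print(rest)
```

Each row comes back as a list of strings, which is why format2dict converts the complaint ID with int() itself.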

Answer 3

You're on the right track with line.split('\t') to break up the text into pieces. Try something like this to build up the tuple from the split pieces.

import datetime

a = '11\t958128\tDAIMLERCHRYSLER CORPORATION\tDODGE\tSHADOW\t1990\tY\t19941117\tN\t0\t0\tENGINE AND ENGINE COOLING:ENGINE\tWILMINGTON  \tDE\t1B3XT44KXLN\t19950103\t19950103\t\t1\tENGINE MOTOR MOUNTS FAILED, RESULTING IN ENGINE NOISE. *AK\tEVOQ\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tV\t'

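fields = a.split('\t')
recordNum = fields[0]
mfr = fields[2]
# fields[7] holds the date as YYYYMMDD ('19941117' in this record)
recDate = datetime.datetime.strptime(fields[7], '%Y%m%d').date()
make = fields[3]  # 'DODGE'; fields[4] is the model ('SHADOW')
DOTrecord = recordNum, mfr, recDate, make
print(DOTrecord)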
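
Putting the split-based approach together for the tuple format in the question, a sketch might look like the following. The field positions (0 for CMPLID, 2 for manufacturer, 6 for crash, 7 for date, 12 for city, 13 for state) are assumptions read off the sample record and the asker's own attempt, not taken from a DOT format specification. The temporary file and the shortened sample line exist only to make the example runnable.

```python
import datetime
import os
import tempfile

def fieldict(filename):
    """Map integer CMPLID -> (manufacturer, date, crash, city, state)."""
    complaints = {}
    with open(filename) as f:
        for line in f:
            fields = line.rstrip('\r\n').split('\t')
            if len(fields) < 14:  # skip blank or truncated lines
                continue
            raw_date = fields[7].strip()
            # Assumed YYYYMMDD format; None when the field is empty.
            date = (datetime.datetime.strptime(raw_date, '%Y%m%d').date()
                    if raw_date else None)
            complaints[int(fields[0])] = (
                fields[2],
                date,
                fields[6] == 'Y',    # 'Y'/'N' flag converted to bool
                fields[12].strip(),  # sample city has trailing spaces
                fields[13],
            )
    return complaints

# Usage with a shortened copy of the sample record from the question.
sample = ('11\t958128\tDAIMLERCHRYSLER CORPORATION\tDODGE\tSHADOW\t1990\tY\t'
          '19941117\tN\t0\t0\tENGINE AND ENGINE COOLING:ENGINE\t'
          'WILMINGTON  \tDE\t1B3XT44KXLN\n')
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
result = fieldict(tmp.name)
os.remove(tmp.name)
print(result[11])
```

Unlike the asker's attempt, every line is split inside the loop, so each key is paired with the fields from its own record rather than from whichever line happened to be read last.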

Answer 1
2 2012-10-31 05:14:32

Answer 2
2 accepted 2012-10-31 08:56:38

Answer 3
1 2012-10-31 06:37:10
