Read text file into a list of dictionaries in Python

I have searched for a while but haven't found a simple enough answer to this.

I have a well-structured txt file with many records like this:

product/productId: B000GKXY4S
review/userId: A1QA985ULVCQOB
review/profileName: Carleen M. Amadio "Lady Dragonfly"
review/helpfulness: 2/2
review/score: 5.0
review/time: 1314057600
review/summary: Fun for adults too!
review/text: I really enjoy these scissors for my inspiration books that I am making (like collage, but in books) and using these different textures these give is just wonderful, makes a great statement with the pictures and sayings. Want more, perfect for any need you have even for gifts as well. Pretty cool!

product/productId: B000GKXY4S
review/userId: ALCX2ELNHLQA7
review/profileName: Barbara
review/helpfulness: 0/0
review/score: 5.0
review/time: 1328659200
review/summary: Making the cut!
review/text: Looked all over in art supply and other stores for "crazy cutting" scissors for my 4-year old grandson.  These are exactly what I was looking for - fun, very well made, metal rather than plastic blades (so they actually do a good job of cutting paper), safe ("blunt") ends, etc.  (These really are for age 4 and up, not younger.)  Very high quality.  Very pleased with the product.

product/productId: B000140KIW
review/userId: A2M2M4R1KG5WOL
review/profileName: L. Heminway
review/helpfulness: 1/1
review/score: 5.0
review/time: 1156636800
review/summary: Fiskars Softouch Multi-Purpose Scissors, 10"
review/text: These are the BEST scissors I have ever owned.  I am left-handed and take note that either a left or right-handed person can use these equally well.               If you have arthritis, as I do, these scissors are amazing as well.  Well worth the price.  I now own three pairs of these and have convinced many other people in my quilting group that they NEED a pair as well!             They cut through muli layers and difficult to cut items really well.            Do buy them, you won't regret it!

Each block would be one dictionary, and I want a list of such dictionaries. What is the simplest way to do it? I tried csv, but it does not seem to be correct:

import csv

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
         "review/score", "review/time", "review/summary", "review/text")

reader = csv.DictReader(open('../Arts.txt'), fieldnames=field)

Could someone help me with this novice problem? Thanks!

In this case you just want to read each line, split on ': ' to get the key and value, and then add that pair to the current dictionary. Since your file is well-structured, you can detect when a new block starts by the field name:

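data = []
current = {}
with open('../Arts.txt') as f:
    for line in f:
        # drop the trailing newline, then split on the first ': ' only,
        # so colons inside the value are left alone
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:
            if pair[0] == 'product/productId' and current:
                # start of a new block
                data.append(current)
                current = {}
            current[pair[0]] = pair[1]
    # the last block has no following productId line, so add it here
    if current:
        data.append(current)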
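
The same logic can be wrapped in a function that accepts any iterable of lines, which makes it easy to try on a small in-memory sample. This is only a sketch: the name parse_records and its first_key parameter are made up for illustration, and it assumes every record starts with the product/productId line, as in the question.

```python
def parse_records(lines, first_key='product/productId'):
    """Group 'key: value' lines into a list of dicts (hypothetical helper)."""
    records = []
    current = {}
    for line in lines:
        key, sep, value = line.rstrip('\n').partition(': ')
        if not sep:
            continue  # blank line, or a line without 'key: value'
        if key == first_key and current:
            # a new record begins, so store the finished one
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


# Shortened sample built from values in the question
sample = """\
product/productId: B000GKXY4S
review/userId: ALCX2ELNHLQA7
review/score: 5.0

product/productId: B000140KIW
review/userId: A2M2M4R1KG5WOL
review/score: 5.0
"""

records = parse_records(sample.splitlines())
print(len(records))
print(records[1]['review/userId'])
```

Because the function only needs an iterable of strings, an open file object can be passed in place of sample.splitlines().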

You would use csv if you had a file with multiple columns. For example, a csv file with the same data might look something like the following:

product/productId,review/userId,review/profileName,...
B000GKXY4S,A1QA985ULVCQOB,Carleen M. Amadio "Lady Dragonfly",...
B000GKXY4S,ALCX2ELNHLQA7,Barbara,...
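
For comparison, here is a rough sketch of how csv.DictReader would handle data in that column layout. The text is held in a string instead of a real file, and only three of the eight columns are included to keep it short; both choices are assumptions made for the example.

```python
import csv
import io

# Stand-in for a real CSV file; only three of the eight columns are shown.
csv_text = """\
product/productId,review/userId,review/profileName
B000GKXY4S,ALCX2ELNHLQA7,Barbara
B000140KIW,A2M2M4R1KG5WOL,L. Heminway
"""

# With no fieldnames argument, DictReader takes the keys from the header row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

print(rows[0]['review/profileName'])
```

Each row becomes one mapping keyed by the header names, which is why DictReader suits column-oriented files and not the one-field-per-line layout in the question.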

I'm surprised that the csv reader did not work; perhaps something in how you used it was not what the reader expects.

Storing a huge number of dictionaries is not a good use of memory. Instead, the collections module in the standard library provides namedtuple, an immutable tuple with named fields (not actually a dict), which is much cheaper and easy to use.

This can be solved by simply reading a fixed-size chunk of lines at a time (in this case, 8 lines + 1 empty line):
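
A brief illustration of how a namedtuple behaves. The Review type and its two fields are invented for this example only.

```python
from collections import namedtuple

# Hypothetical two-field record, just to show the behaviour
Review = namedtuple('Review', ['userId', 'score'])

r = Review(userId='ALCX2ELNHLQA7', score='5.0')

print(r.userId)   # access by field name
print(r[1])       # access by position, like any tuple

try:
    r.score = '4.0'   # fields cannot be reassigned
except AttributeError:
    print('namedtuple fields are read-only')

as_dict = r._asdict()  # convert to a mapping when one is needed
```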

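from collections import namedtuple

# Field names must be valid identifiers, so the "product/" and "review/" prefixes are dropped
fields = ('productId', 'userId', 'profileName', 'helpfulness',
          'score', 'time', 'summary', 'text')
data_point = namedtuple('data_point', fields)

data_lst = []
with open('some_path/somefile.txt') as f_in:
    while True:
        # read one record: 8 "key: value" lines
        lines = [f_in.readline() for _ in range(len(fields))]
        if not lines[0].strip():
            break  # nothing left to read
        # partition splits on the first ': ' only, so colons in the value survive
        data = [line.rstrip('\n').partition(': ')[2] for line in lines]
        data_lst.append(data_point(*data))
        f_in.readline()  # skip the blank separator line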

People are so used to the for loop in Python that they forget the while loop exists.

The number 8 may vary if the layout you showed in the question does not hold throughout the whole file. In that case, you should expand the loop that reads the lines and check the conditions yourself. Here I'm taking advantage of a clean dataset.

Also, change your field names so that they do not contain "/" or other special characters, because namedtuple field names must be valid Python identifiers. The names of the fields do not matter much as long as their order is preserved.
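
If records can have a different number of lines, one option is to stop counting lines and treat each blank line as a separator. Below is a minimal sketch under that assumption, using itertools.groupby. The function name read_blocks is made up, and it returns plain dicts because a namedtuple needs a fixed set of fields.

```python
from itertools import groupby


def read_blocks(lines):
    """Yield one dict per run of non-blank lines (hypothetical helper)."""
    cleaned = (line.rstrip('\n') for line in lines)
    # groupby splits the stream into alternating runs of blank / non-blank lines
    for has_text, block in groupby(cleaned, key=lambda s: bool(s.strip())):
        if not has_text:
            continue  # a run of separator lines
        record = {}
        for line in block:
            key, sep, value = line.partition(': ')
            if sep:
                record[key] = value
        yield record


# Two records of different length, built from values in the question
sample = [
    'product/productId: B000GKXY4S\n',
    'review/score: 5.0\n',
    '\n',
    'product/productId: B000140KIW\n',
    'review/userId: A2M2M4R1KG5WOL\n',
    'review/score: 5.0\n',
]
blocks = list(read_blocks(sample))
print(len(blocks))
```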
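
One possible way to derive usable names from the original tuple is to keep only the part after the "/". This is a sketch, not the only option, and it assumes the shortened names stay unique, which holds for the eight fields in the question.

```python
from collections import namedtuple

field = ("product/productId", "review/userId", "review/profileName",
         "review/helpfulness", "review/score", "review/time",
         "review/summary", "review/text")

# Keep only the part after the "/" so every name is a valid identifier
clean = [name.split('/')[-1] for name in field]
data_point = namedtuple('data_point', clean)

print(data_point._fields)
```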
