
Read text file into a list of dictionaries in Python

I have searched for a while but haven't seen a simple enough answer to this.

I have a well-structured text file with many records like this:

product/productId: B000GKXY4S
review/userId: A1QA985ULVCQOB
review/profileName: Carleen M. Amadio "Lady Dragonfly"
review/helpfulness: 2/2
review/score: 5.0
review/time: 1314057600
review/summary: Fun for adults too!
review/text: I really enjoy these scissors for my inspiration books that I am making (like collage, but in books) and using these different textures these give is just wonderful, makes a great statement with the pictures and sayings. Want more, perfect for any need you have even for gifts as well. Pretty cool!

product/productId: B000GKXY4S
review/userId: ALCX2ELNHLQA7
review/profileName: Barbara
review/helpfulness: 0/0
review/score: 5.0
review/time: 1328659200
review/summary: Making the cut!
review/text: Looked all over in art supply and other stores for "crazy cutting" scissors for my 4-year old grandson.  These are exactly what I was looking for - fun, very well made, metal rather than plastic blades (so they actually do a good job of cutting paper), safe ("blunt") ends, etc.  (These really are for age 4 and up, not younger.)  Very high quality.  Very pleased with the product.

product/productId: B000140KIW
review/userId: A2M2M4R1KG5WOL
review/profileName: L. Heminway
review/helpfulness: 1/1
review/score: 5.0
review/time: 1156636800
review/summary: Fiskars Softouch Multi-Purpose Scissors, 10"
review/text: These are the BEST scissors I have ever owned.  I am left-handed and take note that either a left or right-handed person can use these equally well.               If you have arthritis, as I do, these scissors are amazing as well.  Well worth the price.  I now own three pairs of these and have convinced many other people in my quilting group that they NEED a pair as well!             They cut through muli layers and difficult to cut items really well.            Do buy them, you won't regret it!

Each block would be one dictionary, and I want a list of such dictionaries. What is the simplest way to do it? I tried csv, but it does not seem to be the right tool:

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
              "review/score","review/time", "review/summary", "review/text")

reader = csv.DictReader(open('../Arts.txt'), fieldnames=field)

Could someone help me with this novice problem? Thanks!

In this case you just want to read each line, split it on the first ": " to get the key and value, and then add that pair to the current dictionary. Since your file is well-structured, you can detect when a new block starts by the field name:

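data = []
current = {}
with open('../Arts.txt') as f:
    for line in f:
        # drop the trailing newline, then split only on the first ': '
        # so that colons inside the value are kept
        pair = line.rstrip('\n').split(': ', 1)
        if len(pair) == 2:
            if pair[0] == 'product/productId' and current:
                # start of a new block
                data.append(current)
                current = {}
            current[pair[0]] = pair[1]
    # the last block is not followed by another productId line
    if current:
        data.append(current)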

You would use csv if you had a file with multiple columns. For example, a CSV file with the same data might look something like this:

product/productId,review/userId,review/profileName,...
B000GKXY4S,A1QA985ULVCQOB,Carleen M. Amadio "Lady Dragonfly",...
B000GKXY4S,ALCX2ELNHLQA7,Barbara,...
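For comparison, here is a minimal sketch of how csv.DictReader would handle data in that column layout. The CSV text is a hypothetical three-column cut of the first two reviews, held in a string instead of a file, and the profile name is quoted with doubled inner quotes as CSV writers usually emit it.

```python
import csv
import io

# Hypothetical CSV version of the first two reviews (only three columns kept).
csv_text = '''product/productId,review/userId,review/profileName
B000GKXY4S,A1QA985ULVCQOB,"Carleen M. Amadio ""Lady Dragonfly"""
B000GKXY4S,ALCX2ELNHLQA7,Barbara
'''

# With fieldnames omitted, DictReader takes the keys from the header row.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [dict(row) for row in reader]

print(rows[0]['review/profileName'])
print(rows[1]['review/userId'])
```

Each row becomes one dictionary keyed by the header, which is the structure the question asks for, but it only works when every record sits on a single delimited line.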

I'm surprised that the csv reader did not work; perhaps you did something the reader did not expect.

Storing a large number of dictionaries is wasteful. The collections module has a built-in alternative, namedtuple, which behaves like a lightweight immutable record: it uses less memory than a dict and is easy to use.

This can actually be solved by simply reading a fixed-size chunk of lines at a time (in this case, 8 lines + 1 empty line):

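from collections import namedtuple

# namedtuple field names must be valid identifiers, so no "/" here
field = ('productId', 'userId', 'profileName', 'helpfulness',
         'score', 'time', 'summary', 'text')
data_point = namedtuple('data_point', field)

data_lst = []
with open('some_path/somefile.txt') as f_in:
    while True:
        lines = [f_in.readline() for _ in range(8)]
        if not lines[0]:
            # readline() returns '' at end of file
            break
        # keep everything after the first ': ' so colons in the value survive
        data = [line.rstrip('\n').partition(': ')[2] for line in lines]
        data_lst.append(data_point(*data))
        f_in.readline()  # skip the blank separator line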

People are so used to the for loop in Python that they forget the while loop exists.

The number 8 may vary if the format shown in the question does not hold throughout the whole file. In that case, you should expand the comprehension that reads the lines into an explicit loop and check the conditions yourself. Here I'm taking advantage of a clean dataset.
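If the number of lines per record is not constant, one possible approach is to treat each blank line as the end of a record instead of counting lines. The sketch below assumes that records are separated by blank lines and that every field line contains ": "; read_blocks is a hypothetical helper name, and the sample input is a shortened excerpt held in a string.

```python
import io

def read_blocks(lines):
    """Yield one list of (key, value) pairs per blank-line-separated block."""
    block = []
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            # a blank line closes the current block, if there is one
            if block:
                yield block
                block = []
            continue
        key, sep, value = line.partition(': ')
        if sep:
            # lines without ': ' are silently ignored in this sketch
            block.append((key, value))
    if block:
        # the last block may not be followed by a blank line
        yield block

# Shortened sample: the two records have different numbers of fields.
sample = io.StringIO(
    "product/productId: B000GKXY4S\n"
    "review/score: 5.0\n"
    "\n"
    "product/productId: B000140KIW\n"
    "review/score: 5.0\n"
    "review/summary: Fiskars Softouch Multi-Purpose Scissors, 10\"\n"
)
blocks = list(read_blocks(sample))
print(len(blocks))
```

Because an open file is also an iterable of lines, the same function can be called with the file object directly.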

Also, change your field names so that they do not contain "/" or other special characters, because namedtuple only accepts valid Python identifiers. The names of the fields do not matter much as long as their order is preserved.
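One way to get valid names without retyping them is to keep only the part after the "/". This is a small sketch: clean_field is a name introduced here for illustration, and the record values are taken from the second review, with the long review text replaced by a placeholder.

```python
from collections import namedtuple

field = ("product/productId", "review/userId", "review/profileName", "review/helpfulness",
         "review/score", "review/time", "review/summary", "review/text")

# Keep only the part after "/" so that each name is a valid identifier.
clean_field = tuple(name.split('/')[-1] for name in field)
data_point = namedtuple('data_point', clean_field)

record = data_point('B000GKXY4S', 'ALCX2ELNHLQA7', 'Barbara', '0/0',
                    '5.0', '1328659200', 'Making the cut!', '(review text omitted)')

print(record.profileName)
# _asdict() converts the record back to a dictionary when one is needed.
print(record._asdict()['summary'])
```

Fields are then available as attributes, and _asdict() still gives a dictionary per record if the rest of the code expects one.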
