
Efficient way to parse different lines of a text file

I have a text file containing the data like this:

1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400

I want to aggregate the third column by the value in the second column: for example, add up the three numbers in the third column that correspond to 1 in the second column, and so on. I can loop through the text line by line, find the third column in each line and add the values up, but that is not what I want. How can I do this efficiently in Python?

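Use itertools.groupby(). Note that it only groups consecutive items that share the same key, so the input must already be ordered by the second column, as it is in your sample.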

As an example, I'm using your exact "data structure" (a bunch of text in a stackoverflow question):

import itertools

data_structure = '''
1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400
'''.splitlines()

# create a key function able to extract the data you want to group:
def _key(line):
    return line.strip().split(' --- ')[1] # the 1 here means second column

# clean up the data: strip whitespace and skip blank lines
clean_data = (line.strip() for line in data_structure if line.strip())

# then pass it to itertools.groupby:
for key, lines in itertools.groupby(clean_data, key=_key):
    print("Lines that contain number", key, 'in second column:')
    print(', '.join(lines))

The results:

Lines that contain number 1 in second column:
1 --- 1 --- 100, 2 --- 1 --- 200, 3 --- 1 --- 100
Lines that contain number 2 in second column:
1 --- 2 --- 300, 2 --- 2 --- 100, 3 --- 2 --- 400
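
The question asks for the sum of the third column for each value of the second column, and the groups above can be reduced to those sums. The following is a minimal sketch that reuses the same sample data; the names `third_column` and `totals` are illustrative and not part of the original answer. It relies on the sample already being ordered by the second column, since `itertools.groupby()` only groups consecutive lines.

```python
import itertools

data_structure = '''
1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400
'''.splitlines()

def _key(line):
    # second column: the value to group by
    return line.split(' --- ')[1]

def third_column(line):
    # hypothetical helper: third column converted to an integer
    return int(line.split(' --- ')[2])

# strip whitespace and skip blank lines
clean_data = (line.strip() for line in data_structure if line.strip())

# sum the third column within each consecutive group
totals = {}
for key, lines in itertools.groupby(clean_data, key=_key):
    totals[key] = sum(third_column(line) for line in lines)

print(totals)  # {'1': 400, '2': 800}
```

The keys are strings because they come straight from `split()`; wrap the key in `int()` if numeric keys are needed.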

EDIT: Now that you have edited the question to say the data is in a text file, you can use the open file in place of data_structure and the code will still work:

data_structure = open('myfile.txt')

The rest of the code remains the same, because iterating over a file object yields its lines one at a time.
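
If the file is not ordered by the second column, `itertools.groupby()` would split one key into several groups. In that case a running total per key may be a simpler fit. This is a sketch under that assumption, not the approach from the answer above: the function name `sum_by_second_column` and its `sep` parameter are hypothetical, and it accepts any iterable of lines, including an open file.

```python
from collections import defaultdict

def sum_by_second_column(lines, sep=' --- '):
    """Sum the third column for each distinct value of the second column."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _, key, value = line.split(sep)
        totals[key] += int(value)
    return dict(totals)

# In-memory sample whose keys are deliberately not in order
sample = ['1 --- 1 --- 100', '', '1 --- 2 --- 300', '2 --- 1 --- 200']
print(sum_by_second_column(sample))  # {'1': 300, '2': 300}

# With a file (assumes 'myfile.txt' exists in the format shown in the question):
# with open('myfile.txt') as f:
#     print(sum_by_second_column(f))
```

This still reads the file one line at a time in a single pass, so memory use depends on the number of distinct keys rather than on the file size.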
