Efficient way to parse different lines of a text file
I have a text file containing data like this:
1 --- 1 --- 100
2 --- 1 --- 200
3 --- 1 --- 100
1 --- 2 --- 300
2 --- 2 --- 100
3 --- 2 --- 400
I want to extract the data in the third column corresponding to each distinct value in the second column; for example, add up the three numbers in the third column that correspond to the number 1 in the second column, and so on. I can loop through the text line by line, find the third column in each line, and add the values. But that is not what I want. How should I do this efficiently in Python?
Use itertools.groupby().
As an example, I'm using your exact "data structure" (a bunch of text in a Stack Overflow question):
import itertools

data_structure = '''
1 --- 1 --- 100
2 --- 1 --- 200
3 --- 1 --- 100
1 --- 2 --- 300
2 --- 2 --- 100
3 --- 2 --- 400
'''.splitlines()

# create a key function able to extract the data you want to group:
def _key(line):
    return line.strip().split(' --- ')[1]  # the 1 here means second column

# clean up the data: strip whitespace and skip blank lines
clean_data = (line.strip() for line in data_structure if line.strip())

# then pass it to itertools.groupby
# (note: groupby only groups consecutive lines that share the same key):
for key, lines in itertools.groupby(clean_data, key=_key):
    print("Lines that contain number", key, 'in second column:')
    print(', '.join(lines))
The results:
Lines that contain number 1 in second column:
1 --- 1 --- 100, 2 --- 1 --- 200, 3 --- 1 --- 100
Lines that contain number 2 in second column:
1 --- 2 --- 300, 2 --- 2 --- 100, 3 --- 2 --- 400
EDIT: Now that you have edited the question to say you have a text file, you can just use it in place of data_structure and it will work:
data_structure = open('myfile.txt')
The rest of the code remains the same.
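One caveat: groupby() only groups adjacent lines, so it relies on the file already being ordered by the second column, as in the sample. If that ordering is not guaranteed, a possible alternative is to accumulate the sums in a collections.defaultdict while iterating over the file. The sketch below is an assumption-laden illustration, not the answer's method: the function name sum_by_second_column and its sep parameter are hypothetical, and io.StringIO stands in for a real file opened with something like open('myfile.txt').

```python
from collections import defaultdict
import io

def sum_by_second_column(lines, sep=' --- '):
    """Sum the third column for each distinct value of the second column.

    `lines` can be any iterable of strings, including an open file object.
    """
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _first, second, third = line.split(sep)
        totals[second] += int(third)
    return dict(totals)

# io.StringIO stands in for an open file; the rows are deliberately
# not ordered by the second column
fake_file = io.StringIO(
    "1 --- 1 --- 100\n"
    "1 --- 2 --- 300\n"
    "2 --- 1 --- 200\n"
)
result = sum_by_second_column(fake_file)
print(result)  # {'1': 300, '2': 300}
```

With a real file, a with open(...) as f: block around the call would also make sure the file is closed afterwards. Iterating over a file object reads it lazily, one line at a time, so this stays memory-friendly for large inputs.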