An efficient way to parse different lines of a text file

Question

I have a text file containing data like this:

1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400

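I want to extract the third-column data for each distinct value in the second column. For example, I want to add up the three numbers in the third column that correspond to the number 1 in the second column, and so on. I could loop over the text line by line, find the third column in each line, and add the values. But that is not what I want. How can I do this efficiently in Python?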

Answer 1

Use itertools.groupby().

For example, I am using your exact "data structure" (a bunch of text from the Stack Overflow question):

import itertools

data_structure = '''
1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400
'''.splitlines()

# create a key function able to extract the data you want to group:
def _key(line):
    return line.strip().split(' --- ')[1] # the 1 here means second column

#cleanup data:
clean_data = (line.strip() for line in data_structure if line.strip())

# then pass it to itertools.groupby:
for key, lines in itertools.groupby(clean_data, key=_key):
    print("Lines that contain number", key, 'in second column:')
    print(', '.join(lines))

Result:

Lines that contain number 1 in second column:
1 --- 1 --- 100, 2 --- 1 --- 200, 3 --- 1 --- 100
Lines that contain number 2 in second column:
1 --- 2 --- 300, 2 --- 2 --- 100, 3 --- 2 --- 400
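The question asks for the third-column values to be added up per second-column value, while the loop above only prints the grouped lines. A minimal sketch of the summing step, assuming every non-empty line has exactly three ' --- '-separated fields and an integer in the third one (the names second_column and sum_third_column are hypothetical, not part of the original answer). Note that itertools.groupby() only merges adjacent items with the same key, so this relies on the lines already being ordered by the second column, as they are in the sample data:

```python
import itertools

# Same sample lines as in the question, including the blank lines.
raw_lines = '''
1 --- 1 --- 100

2 --- 1 --- 200

3 --- 1 --- 100

1 --- 2 --- 300

2 --- 2 --- 100

3 --- 2 --- 400
'''.splitlines()

def second_column(line):
    # Hypothetical helper: returns the second field, used as the grouping key.
    return line.split(' --- ')[1]

def sum_third_column(lines):
    # Drop blank lines and surrounding whitespace before grouping.
    cleaned = (line.strip() for line in lines if line.strip())
    totals = {}
    for key, group in itertools.groupby(cleaned, key=second_column):
        # Add the third field of every line in this run of equal keys.
        subtotal = sum(int(line.split(' --- ')[2]) for line in group)
        totals[key] = totals.get(key, 0) + subtotal
    return totals

totals = sum_third_column(raw_lines)
print(totals)  # {'1': 400, '2': 800}
```

The keys stay strings here because they come straight from split(); convert them with int() if numeric keys are needed.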

Edit: since you edited the question to say that you have a text file, you can use it in place of data_structure and it will work:

data_structure = open('myfile.txt')

The rest of the code stays the same.
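If the file is not guaranteed to be ordered by the second column, itertools.groupby() would produce several separate groups for the same key. A possible alternative, sketched here under the assumption that each non-empty line has exactly three ' --- '-separated fields with an integer last, accumulates the sums in a collections.defaultdict and closes the file with a with block. The function name sum_by_second_column and the temporary demo file are hypothetical and only there to keep the example self-contained:

```python
import os
import tempfile
from collections import defaultdict

def sum_by_second_column(path, sep=' --- '):
    # Hypothetical helper: sums the third column per second-column value.
    totals = defaultdict(int)
    with open(path) as fh:          # the file is closed automatically
        for line in fh:
            line = line.strip()
            if not line:            # skip blank lines
                continue
            _, key, value = line.split(sep)
            totals[key] += int(value)
    return dict(totals)

# Demo with a throwaway file whose lines are deliberately not ordered
# by the second column.
sample = "1 --- 1 --- 100\n\n1 --- 2 --- 300\n\n2 --- 1 --- 200\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

result = sum_by_second_column(tmp_path)
os.remove(tmp_path)
print(result)  # {'1': 300, '2': 300}
```

This still reads the file one line at a time, but it makes a single pass and keeps only one running total per key in memory.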

Answer 1, accepted 2018-09-26 18:05:10