Python - Reading multiple lines into a list

Question

Hi guys/girls, stuck on something simple again.
I have a text file with multiple lines per entry; the data is in the following format:

firstword word word word
wordx word word word interesting1 word word word word
wordy word word word
wordz word word word interesting2 word word word lastword

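This sequence repeats a hundred times or so. All the other words are the same except for interesting1 and interesting2, and there are no blank lines. interesting2 is related to interesting1 but to nothing else, and I want to link the two interesting items together and discard the rest, giving something like

interesting1 = interesting2
interesting1 = interesting2
interesting1 = interesting2
etc., one line per sequence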

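Each line begins with a different word.
My attempt was to read the file and use an "if wordx in line" statement to identify the first interesting line, slice out the value, then find the second line ("if wordz in line"), slice out its value and concatenate the second to the first.
It is clumsy though: I had to use global variables, temporary variables and so on, and I'm sure there must be a way of identifying the range between firstword and lastword, placing it into a single list, and then slicing the two values out together.

Any suggestions gratefully acknowledged, and thanks for your time.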
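
For reference, the idea in the last paragraph (collect everything from firstword to lastword into one list, then slice both values out of it) could be sketched roughly as below. This is Python 3, and it assumes the line layout shown in the sample above: 4 + 9 + 4 + 9 words per record, which puts the two interesting words at positions 8 and 21 of the combined list. The function name `records` and the sample data are made up for illustration.

```python
def records(lines, first="firstword", last="lastword"):
    """Yield one flat list of words per firstword..lastword range."""
    current = None
    for line in lines:
        words = line.split()
        if not words:
            continue                  # ignore blank lines
        if words[0] == first:
            current = []              # a new record starts here
        if current is not None:
            current.extend(words)
            if words[-1] == last:     # record complete
                yield current
                current = None


# Hypothetical sample standing in for the real file
sample = """\
firstword word word word
wordx word word word interesting1-1 word word word word
wordy word word word
wordz word word word interesting2-1 word word word lastword
firstword word word word
wordx word word word interesting1-2 word word word word
wordy word word word
wordz word word word interesting2-2 word word word lastword
""".splitlines()

# Positions 8 and 21 follow from the assumed 4 + 9 + 4 + 9 layout
pairs = [(rec[8], rec[21]) for rec in records(sample)]
for one, two in pairs:
    print(one, "=", two)
```

No globals are needed because the generator keeps the partial record in a local variable between lines.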

Answer 1

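from itertools import izip, tee, islice

# Python 2 code: tee() gives two independent iterators over the same file
i1, i2 = tee(open("foo.txt"))

# Pair the 2nd line of every 4-line record with its 4th line
for line2, line4 in izip(islice(i1, 1, None, 4), islice(i2, 3, None, 4)):
    print line2.split(" ")[4], "=", line4.split(" ")[4]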
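
The snippet above is Python 2 (izip and the print statement are gone in Python 3). A minimal Python 3 sketch of the same idea follows. It assumes the file contains nothing but strict 4-line records and that the interesting word is the fifth whitespace-separated field on lines 2 and 4. The function name `pair_interesting` and the in-memory sample are illustrative only, not part of the original answer.

```python
from itertools import islice, tee


def pair_interesting(lines):
    """Yield (interesting1, interesting2) from strictly 4-line records."""
    i1, i2 = tee(lines)
    second = islice(i1, 1, None, 4)   # lines 2, 6, 10, ... (the wordx lines)
    fourth = islice(i2, 3, None, 4)   # lines 4, 8, 12, ... (the wordz lines)
    for line2, line4 in zip(second, fourth):
        yield line2.split()[4], line4.split()[4]


# Hypothetical sample standing in for foo.txt
sample = [
    "firstword word word word\n",
    "wordx word word word interesting1-1 word word word word\n",
    "wordy word word word\n",
    "wordz word word word interesting2-1 word word word lastword\n",
    "firstword word word word\n",
    "wordx word word word interesting1-2 word word word word\n",
    "wordy word word word\n",
    "wordz word word word interesting2-2 word word word lastword\n",
]

pairs = list(pair_interesting(sample))
for first, second in pairs:
    print(first, "=", second)
```

With a real file, an open file object can be passed in place of the list, for example inside `with open("foo.txt") as f:`. Any stray or blank line would shift the 4-line alignment, so this only suits perfectly regular input.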

Answer 2

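In this case, create a regular expression that matches the repeating text, with groups around the bits you are interested in. You should then be able to use findall to find every occurrence of interesting1 and interesting2.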

Something like this:

import re

text = open("foo.txt").read()
RE = re.compile('firstword.*?wordx word word word (.*?) word.*?wordz word word word (.*?) word', re.DOTALL)
print RE.findall(text)

Although, as noted in the comments, islice is definitely the neater solution.
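
For anyone trying the regex route on Python 3, here is a small self-contained sketch using the same pattern. The literal words in the pattern are the placeholders from the question, and the sample text is invented to stand in for the contents of foo.txt; a real file would need the pattern adjusted to its actual fixed words.

```python
import re

# Same pattern as in the answer; re.DOTALL lets ".*?" run across line breaks
RECORD = re.compile(
    r"firstword.*?wordx word word word (.*?) word"
    r".*?wordz word word word (.*?) word",
    re.DOTALL,
)

# Hypothetical sample standing in for open("foo.txt").read()
text = (
    "firstword word word word\n"
    "wordx word word word interesting1-1 word word word word\n"
    "wordy word word word\n"
    "wordz word word word interesting2-1 word word word lastword\n"
    "firstword word word word\n"
    "wordx word word word interesting1-2 word word word word\n"
    "wordy word word word\n"
    "wordz word word word interesting2-2 word word word lastword\n"
)

# With two groups in the pattern, findall returns a list of 2-tuples
pairs = RECORD.findall(text)
for first, second in pairs:
    print(first, "=", second)
```

Because the whole file is read into memory first, this is probably best kept for files of modest size, which a hundred or so records certainly is.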

Answer 3

I've thrown in a bunch of assertions to check the regularity of the data layout.

C:\SO>type words.py

# sample pseudo-file contents
guff = """\
firstword word word word
wordx word word word interesting1-1 word word word word
wordy word word word
wordz word word word interesting2-1 word word word lastword

miscellaneous rubbish

firstword word word word
wordx word word word interesting1-2 word word word word
wordy word word word
wordz word word word interesting2-2 word word word lastword
firstword word word word
wordx word word word interesting1-3 word word word word
wordy word word word
wordz word word word interesting2-3 word word word lastword

"""

# change the RHS of each of these to reflect reality
FIRSTWORD = 'firstword'
WORDX = 'wordx'
WORDY = 'wordy'
WORDZ = 'wordz'
LASTWORD = 'lastword'

from StringIO import StringIO
f = StringIO(guff)

while True:
    a = f.readline()
    if not a: break # end of file
    a = a.split()
    if not a: continue # empty line
    if a[0] != FIRSTWORD: continue # skip extraneous matter
    assert len(a) == 4
    b = f.readline().split(); assert len(b) == 9
    c = f.readline().split(); assert len(c) == 4
    d = f.readline().split(); assert len(d) == 9
    assert a[0] == FIRSTWORD
    assert b[0] == WORDX
    assert c[0] == WORDY
    assert d[0] == WORDZ
    assert d[-1] == LASTWORD
    print b[4], d[4]

C:\SO>\python26\python words.py
interesting1-1 interesting2-1
interesting1-2 interesting2-2
interesting1-3 interesting2-3

C:\SO>
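
The script above targets Python 2.6 (StringIO module, print statement). A rough Python 3 sketch of the same validate-as-you-read approach is given below. Two things here are my own substitutions rather than the answerer's: it is wrapped in a generator called `read_pairs` (a made-up name), and it raises ValueError instead of using assert, since assert statements are skipped when Python runs with -O. The field counts 4, 9, 4, 9 are taken from the answer.

```python
import io

FIRSTWORD, WORDX, WORDY, WORDZ, LASTWORD = (
    "firstword", "wordx", "wordy", "wordz", "lastword")


def read_pairs(f):
    """Yield (interesting1, interesting2), validating each 4-line record."""
    it = iter(f)
    for line in it:
        a = line.split()
        if not a or a[0] != FIRSTWORD:
            continue  # blank line or extraneous matter
        # The next three lines must complete the record
        b, c, d = (next(it, "").split() for _ in range(3))
        if (len(a), len(b), len(c), len(d)) != (4, 9, 4, 9):
            raise ValueError("unexpected field count near %r" % line)
        if (b[0], c[0], d[0], d[-1]) != (WORDX, WORDY, WORDZ, LASTWORD):
            raise ValueError("unexpected line order near %r" % line)
        yield b[4], d[4]


# Hypothetical sample, including junk between records
sample = """\
firstword word word word
wordx word word word interesting1-1 word word word word
wordy word word word
wordz word word word interesting2-1 word word word lastword

miscellaneous rubbish

firstword word word word
wordx word word word interesting1-2 word word word word
wordy word word word
wordz word word word interesting2-2 word word word lastword
"""

pairs = list(read_pairs(io.StringIO(sample)))
for first, second in pairs:
    print(first, second)
```

Unlike the islice version, this tolerates blank lines and unrelated text between records, at the cost of a few more lines of code.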

Answer 1
6 2009-06-30 07:46:01

Answer 2
0 2009-06-30 07:20:54

Answer 3
0 2009-06-30 08:08:57
