
python - extract several lines following a matched string

I have two data files, each made up of sets of 4 lines. I need to extract every set of 4 lines from the second file whose first line partly matches the first line of a set in the first file.

Here is an example of input data:

input1.txt
@abcde:134/1
JDOIJDEJAKJ
content1
content2

input2.txt
@abcde:134/2
JKDJFLJSIEF
content3
content4
@abcde:135/2
KFJKDJFKLDJ
content5
content6

Here is what the output should look like:

output.txt
@abcde:134/2
JKDJFLJSIEF
content3
content4

Here is my attempt at writing the code:

import sys

filename1 = sys.argv[1] #input1.txt
filename2 = sys.argv[2] #input2.txt

F = open(filename1, 'r')
R = open(filename2, 'r')

def output(input1, input2):
    for line in input1:
        if "@" in line:
            for line2 in input2:
                if line[:-1] in line2:
                    for i in range(4):
                        print next(input2)

output = output(F, R)
write(output)

I get an invalid syntax error for next(), which I can't figure out. I would be happy if someone could correct my code or give me tips on how to make this work.

===EDIT=== OK, I think I have managed to implement the solutions proposed in the comments below (thank you). I am now running the code in a terminal session connected by ssh to a remote Ubuntu server. Here is what the code looks like now. (This time I am running python2.7.)

filename1 = sys.argv[1] #input file 1
filename2 = sys.argv[2] #input file 2 (some lines of which will be in the output)

F = open(filename1, 'r')
R = open(filename2, 'r')

def output(input1, input2):
    for line in input1:
        input2.seek(0)
        if "@" in line:
            for line2 in input2:
                if line[:-2] in line2:
                    for i in range(4):
                        out = next(input2)
                        print out
                        return

output (F, R)

Then I run this command:

python fetch_reverse.py test1.fq test.fq > test2.fq

I don't get any warnings, but the output file is empty. What am I doing wrong?

Split the reading of the first file from the reading of the second file. Gather all the lines you want to match first (unless you are reading hundreds of thousands of lines to match). Store all lines you want to match, minus the digit at the end, in a set for fast access.
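As a small illustration of that first step (a sketch only, written with Python 3 in mind; `header_key` is a hypothetical helper name that does not appear in the answer), the key for each header is the line without its final character, and membership in the set is then an exact lookup:

```python
def header_key(line):
    """Return a header line without surrounding whitespace and its last character."""
    return line.strip()[:-1]


# Lines as they would be read from input1.txt in the question.
first_file_lines = ["@abcde:134/1\n", "JDOIJDEJAKJ\n", "content1\n", "content2\n"]

# Keep only the header lines, each reduced to its key.
to_match = {header_key(line) for line in first_file_lines if line.startswith("@")}

print(header_key("@abcde:134/2\n") in to_match)  # True
print(header_key("@abcde:135/2\n") in to_match)  # False
```

Because the lookup is an exact comparison of keys, a header such as `@abcde:1345/2` would not be mistaken for `@abcde:134/2`, which a substring test could do.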

Then scan the other file for matching lines:

def output(input1, input2):
    with input1:  # automatically close when done
        # set comprehension of all lines starting with @, minus last character
        to_match = {line.strip()[:-1] for line in input1 if line[0] == '@'}

    with input2:
        for line in input2:
            if line[0] == '@' and line.strip()[:-1] in to_match:
                print line.strip()
                for i in range(3):
                    print next(input2, '').strip()

You need to print the matched line too, then read the next three lines (line number 1 was already read).
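For readers on Python 3, where `print` is a function, the same idea could be sketched as follows. This is not the answer's code: it is a variant that reads fixed groups of four lines instead of testing every line for "@", on the assumption (taken from the question) that every set is exactly four lines long. The names `header_key`, `records` and `extract_matches` are hypothetical.

```python
import sys
from itertools import islice


def header_key(line):
    """Return a header line without surrounding whitespace and its last character."""
    return line.strip()[:-1]


def records(handle, size=4):
    """Yield consecutive groups of `size` lines from an open file (assumed layout)."""
    while True:
        group = list(islice(handle, size))
        if not group:
            return
        yield group


def extract_matches(path1, path2, out=sys.stdout):
    """Write each 4-line set of path2 whose header key also occurs in path1."""
    with open(path1) as first:
        to_match = {
            header_key(rec[0]) for rec in records(first) if rec[0].startswith("@")
        }
    with open(path2) as second:
        for rec in records(second):
            if rec[0].startswith("@") and header_key(rec[0]) in to_match:
                out.writelines(rec)  # lines keep their own newlines
```

Called as `extract_matches(sys.argv[1], sys.argv[2])`, it writes the matching sets to standard output, so the result can still be redirected to a file from the shell.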
