Grep lines in one file using patterns from another file with Python

Question

This is similar to the question "alternative of grep in python", but the complication here is that the patterns being grepped for are variable: they are the lines of another file. I cannot figure out how to do this with functions like re.findall().

file1:

1  20  200
1  30  300

file2:

1  20  200  0.1  0.5
1  20  200  0.3  0.1
1  30  300  0.2  0.6
1  40  400  0.9  0.6
2  50  300  0.5  0.7

Each line of file1 is a pattern, and I need to search file2 for those patterns. The result should be:

    1  20  200  0.1  0.5
    1  20  200  0.3  0.1
    1  30  300  0.2  0.6

I have been trying to solve this with either bash or Python, but cannot figure it out. Thanks.

Answer 1

Here's a non-regex solution:

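with open('/tmp/file1') as f:
    # Strip whitespace and skip blank lines, which would otherwise match every line.
    prefixes = [x.strip() for x in f if x.strip()]

with open('/tmp/file2') as f:
    for line in f:
        if any(line.startswith(p) for p in prefixes):
            print(line, end='')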
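The same prefix test can be packaged as a small function. This is only a sketch: the name `grep_prefix` and the in-memory sample lists are made up for illustration, and it relies on `str.startswith` accepting a tuple of prefixes, which removes the need for an explicit `any()` loop.

```python
def grep_prefix(pattern_lines, target_lines):
    """Return the target lines that start with any non-blank pattern line."""
    # str.startswith accepts a tuple of prefixes and returns True if any matches.
    prefixes = tuple(p.strip() for p in pattern_lines if p.strip())
    return [line for line in target_lines if line.startswith(prefixes)]


# Sample data copied from the question, held in lists instead of files.
file1 = ["1  20  200\n", "1  30  300\n"]
file2 = [
    "1  20  200  0.1  0.5\n",
    "1  20  200  0.3  0.1\n",
    "1  30  300  0.2  0.6\n",
    "1  40  400  0.9  0.6\n",
    "2  50  300  0.5  0.7\n",
]
matched = grep_prefix(file1, file2)
print("".join(matched), end="")
```

Because open file objects are iterables of lines, the two arguments could just as well be the files themselves. Note that a plain prefix test would also accept a line beginning with "1  20  2000", since "1  20  200" is a prefix of it.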

Answer 2

You can take advantage of the fact that the | character in a regular expression means to match either the pattern on its left or the pattern on its right:

import re

with open('file1') as file1:
    # Escape each non-blank line so it is matched literally, then join with |.
    patterns = "|".join(re.escape(line.rstrip()) for line in file1 if line.strip())

regexp = re.compile(patterns)
with open('file2') as file2:
    for line in file2:
        if regexp.search(line):
            print(line.rstrip())

When I tried this on your sample files, it output:

1   20  200 0.1 0.5
1   20  200 0.3 0.1
1   30  300 0.2 0.6
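If the two files might separate fields differently (tabs in one, spaces in the other), or if a pattern such as "200" must not match "2000", the alternation can be built field by field and anchored. The following is a sketch under those assumptions, not part of the original answer; `build_regex` and the sample lines are hypothetical.

```python
import re


def build_regex(pattern_lines):
    """Compile one regex that matches lines starting with any pattern's fields."""
    alternatives = []
    for raw in pattern_lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines; an empty alternative would match everything
        # Escape each field and allow any run of whitespace between fields.
        alternatives.append(r"\s+".join(re.escape(field) for field in fields))
    if not alternatives:
        return None
    # ^\s* anchors at the start of the line; (?=\s|$) requires the last
    # field to end there, so "200" does not match "2000".
    return re.compile(r"^\s*(?:" + "|".join(alternatives) + r")(?=\s|$)")


file1 = ["1  20  200\n", "1  30  300\n"]
file2 = [
    "1\t20\t200\t0.1\t0.5\n",    # tab-separated, should match
    "1  30  300  0.2  0.6\n",    # space-separated, should match
    "1  20  2000  0.4  0.4\n",   # third field is longer, should not match
    "11  20  200  0.9  0.6\n",   # first field differs, should not match
]
regexp = build_regex(file1)
hits = [line for line in file2 if regexp.search(line)]
print("".join(hits), end="")
```

Only the first two sample lines are kept. The function returns None when there are no usable patterns, so a caller reading from real files should check for that before searching.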

Incidentally, if you want to solve this problem in bash, the following should do it:

grep -f file1 file2
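Two things are worth knowing about that command: grep treats each line of file1 as a regular expression unless -F is given, and a blank line in file1 is an empty pattern, which matches every line of file2. A rough Python analogue of the fixed-string variant (grep -F -f) shows the same behaviour; `grep_fixed` and the sample lists below are illustrative names, not anything from grep itself.

```python
def grep_fixed(pattern_lines, target_lines):
    """Return target lines containing any pattern line as a literal substring."""
    # Only the trailing newline is removed, so a blank pattern line becomes "".
    patterns = [p.rstrip("\n") for p in pattern_lines]
    return [line for line in target_lines if any(p in line for p in patterns)]


file2 = [
    "1  20  200  0.1  0.5\n",
    "1  40  400  0.9  0.6\n",
    "2  50  300  0.5  0.7\n",
]
# One real pattern: only the first line is selected.
print(grep_fixed(["1  20  200\n"], file2))
# Adding a blank pattern line selects everything, because "" is in every string.
print(len(grep_fixed(["1  20  200\n", "\n"], file2)))
```

So if file1 may contain blank lines, it is safer to filter them out before using it as a pattern list.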

Answer 3

I think you'll need your own loop:

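import re

# Compile one literal pattern per non-blank line of file1.
with open('file1') as f1:
    file1patterns = [re.compile(re.escape(l.strip())) for l in f1 if l.strip()]

lineToMatch = 0
matchedLines = []
with open('file2') as f2:
    for line in f2:
        if file1patterns[lineToMatch].match(line):
            matchedLines.append(line)
            lineToMatch += 1
        else:
            # Sequence broken: start over from the first pattern.
            lineToMatch = 0
            matchedLines = []
        if len(matchedLines) == len(file1patterns):
            # Every pattern matched on consecutive lines.
            print(''.join(matchedLines), end='')
            lineToMatch = 0
            matchedLines = []

(Treat this as a sketch: it only reports a match when the lines of file1 appear as consecutive lines of file2, in order. Hopefully it is enough for you to move forward.)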

Answer 4

Step 1: Read all lines from file1, split them, and add them as tuples to a set. This makes the lookups in the next step faster.

with open('file1', 'r') as f:
    # split() with no arguments already ignores leading and trailing whitespace.
    file1_lines = {tuple(line.split()) for line in f}

Step 2: Filter the lines from file2 that meet your criteria, i.e. those whose first three fields match one of the lines in file1:

with open('file2', 'r') as f2:
    # Keep lines whose first three fields form a tuple found in file1_lines.
    for line in f2:
        if tuple(line.split()[:3]) in file1_lines:
            print(line, end='')
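The two steps can be combined into one function that does not hard-code the number of key fields. This is a minimal sketch, assuming the key is always the leading fields of a line; the function name `filter_by_key` and the in-memory sample data are made up for illustration.

```python
def filter_by_key(pattern_lines, target_lines):
    """Return target lines whose leading fields equal some pattern line's fields."""
    # Each non-blank pattern line becomes a tuple of fields, stored in a set.
    keys = {tuple(line.split()) for line in pattern_lines if line.strip()}
    # Pattern lines may have different field counts, so remember every width seen.
    widths = {len(key) for key in keys}
    return [
        line
        for line in target_lines
        if any(tuple(line.split()[:w]) in keys for w in widths)
    ]


# Sample data copied from the question.
file1 = ["1  20  200\n", "1  30  300\n"]
file2 = [
    "1  20  200  0.1  0.5\n",
    "1  20  200  0.3  0.1\n",
    "1  30  300  0.2  0.6\n",
    "1  40  400  0.9  0.6\n",
    "2  50  300  0.5  0.7\n",
]
matched = filter_by_key(file1, file2)
print("".join(matched), end="")
```

Because whole fields are compared, a line starting with "1  20  2000" is not selected, and the amount of whitespace between fields does not matter.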

Answer 1: score 4, accepted, 2012-05-08 02:10:05

Answer 2: score 1, 2012-05-08 01:52:42

Answer 3: score 0, 2012-05-08 01:42:08

Answer 4: score 0, 2012-05-08 03:05:51
