
Return the exact lines of a huge file after pattern matching without using FOR in Python3

I am new to Python. My problem is this: I want to match a pattern against a large file and return the matching lines (not just the matched strings) from it. I DO NOT want a FOR loop for this, as my file is huge. I am using mmap to read the file.

Sample file

In the above file, if I search for bhuvi, I should get 2 rows: bhuvi and bhuvi Kumar.

I used re.findall() for this, but it returns only the matched substrings, not the whole lines.

Can someone please suggest what I can do here?

If your input file is huge, you cannot use readlines, but nothing prevents you from reading one line at a time in a loop.

As the file object is iterable, you can write the loop as:

for line in fh:

and process the content of each input line inside the loop.

The file size does not matter, because you never attempt to read all lines at once.

To check for the presence of your string (bhuvi) in the line, use re.search, not re.findall. You don't need a list of all matches; finding a single match is enough, and it is faster.

Below is an example program (Python 3.7) that prints the lines containing your string, along with their line numbers:

import re

with open('input.txt') as fh:
    # enumerate() numbers the lines from 1 as they are read one at a time
    for cnt, line in enumerate(fh, start=1):
        line = line.rstrip()
        if re.search('bhuvi', line):
            print(f'{cnt}: {line}')

Note that I used rstrip() to remove the trailing newline, if any (it also removes any other trailing whitespace).

Edit after your comment:

You wrote that the file to check is huge. So there is a risk that if you try to read the whole file into memory, the program runs out of memory.

In such a case you would have to read the file chunk by chunk and search each chunk separately.

There is also a risk that a row containing the text you are looking for is split between two chunks, with one part read in one chunk and the rest in the next, so your program has to take measures to handle this.
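A minimal sketch of such a chunked search, assuming the file is read in binary mode and lines end with \n. The function name matching_lines_chunked and the chunk_size parameter are illustrative only. The incomplete last line of each chunk is held back and prepended to the next chunk, so a row split across two chunks is still seen whole:

```python
import re


def matching_lines_chunked(path, pattern, chunk_size=1 << 20):
    """Yield every line of `path` that contains the bytes regex `pattern`."""
    regex = re.compile(pattern)
    leftover = b''  # incomplete last line carried over from the previous chunk
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            lines = (leftover + chunk).split(b'\n')
            # The last piece may be a partial line; finish it on the next round.
            leftover = lines.pop()
            for line in lines:
                if regex.search(line):
                    yield line
    # The file may not end with a newline, so check the final fragment too.
    if leftover and regex.search(leftover):
        yield leftover
```

For example, list(matching_lines_chunked('input.txt', b'bhuvi')) collects the matching lines as bytes objects. Memory use depends on the chunk size and the longest line, not on the file size.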

On the other hand, if there is no other way but to use mmap, try something like re.finditer(rb'[^\n]*bhuvi[^\n]*', map), where map is your mmap object. The pattern has to be a bytes pattern (note the b prefix), because an mmap object holds bytes, not str. This creates an iterator looking for:

  1. A sequence of chars other than \n.
  2. Your string.
  3. Another sequence of chars other than \n.

This way the match object returned by the iterator will match the whole line, not your string alone.
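The mmap approach could look roughly like the sketch below. The function name matching_lines_mmap is hypothetical, and the search string is passed as bytes and escaped so that it is matched literally:

```python
import mmap
import re


def matching_lines_mmap(path, needle):
    """Return the full lines of `path` that contain the bytes `needle`."""
    # re.escape makes the search literal; [^\n]* on both sides extends
    # the match to the boundaries of the line.
    regex = re.compile(rb'[^\n]*' + re.escape(needle) + rb'[^\n]*')
    with open(path, 'rb') as fh:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [m.group(0) for m in regex.finditer(mm)]
```

Calling matching_lines_mmap('input.txt', b'bhuvi') returns the matching lines as bytes; call .decode() on each one if you need str. Because each match consumes the rest of its line, a line that contains the string several times is returned only once.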

