
Faster than Grep on Python?

The command I would normally use in a Bash script is something like:

$ cat huge2GBfile.txt | grep -w "pattern1\|pattern2\|pattern3" > out.txt

It writes to out.txt every line of huge2GBfile.txt that contains pattern1, pattern2, or pattern3 as a whole word. I was wondering whether this is achievable in Python. I know that I can use

os.system(cmd) 

But I'd like to know whether there is a native Python equivalent (I am a complete beginner) and whether it would be faster than cat + grep. Thanks!

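My initial thought: would something like

pattern = "pattern1"  # one literal substring; unlike grep -w, this is not a whole-word match
with open("huge2GBfile.txt") as f, open("out.txt", "w") as out:
    for line in f:
        if pattern in line:
            out.write(line)

be faster?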

Even if you had an algorithm better than the one grep uses (as someone already commented, cat and grep are highly optimised, and grep has been tuned for decades), there is still the fact that both are utilities written in C and compiled natively for the system.

Python is an interpreted language and can be a couple of orders of magnitude slower than native C for a per-line loop like this, so I would argue that the answer is no: nothing in pure Python is likely to be faster.
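For comparison, here is a rough sketch of what a Python equivalent of the grep -w command might look like. It assumes the three patterns are literal strings and that the file is UTF-8 text. The names grep_words and grep_file are made up for this example, and \b only approximates grep's -w word matching. It shows the shape of the code; it is not expected to beat grep.

```python
import re


def grep_words(lines, patterns):
    """Yield the lines that contain any of the literal patterns as a whole word."""
    # re.escape treats each pattern as literal text (an assumption of this sketch);
    # \b...\b is an approximation of grep -w, not an exact reimplementation.
    regex = re.compile(r"\b(?:" + "|".join(map(re.escape, patterns)) + r")\b")
    for line in lines:
        if regex.search(line):
            yield line


def grep_file(src_path, dst_path, patterns):
    """Stream src_path line by line and write the matching lines to dst_path."""
    # Iterating over the file object reads one line at a time, so the whole
    # 2 GB file is never held in memory.
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(grep_words(src, patterns))
```

It could be called as grep_file("huge2GBfile.txt", "out.txt", ["pattern1", "pattern2", "pattern3"]). Compiling one combined regex up front keeps the per-line work inside the C-implemented re module, which is about as much as pure Python can do here.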

If you want to process the output of a grep command line by line, one option is to build your Python script like a Unix command-line tool that reads from stdin and writes to stdout, so you could use something like:

grep pattern file | python myscript.py

How do you read from stdin in Python?
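A minimal sketch of such a myscript.py, assuming the per-line work is a placeholder: process is a hypothetical function that just upper-cases each line, to be replaced with whatever the script actually needs to do.

```python
import sys


def process(line):
    """Hypothetical per-line step; here it only upper-cases the line."""
    return line.upper()


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Iterating over stdin yields one line at a time as grep produces it,
    # so the script never needs the whole input in memory.
    for line in stdin:
        stdout.write(process(line))


if __name__ == "__main__":
    main()
```

Run this way, grep does the heavy filtering in C and Python only sees the lines that matched.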

