
Python equivalent to perl -pe?

I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but didn't know how to extract the numbers from the lines. A colleague showed me how to do this from bash with perl:

cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'

However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?

Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:

python -c "import sys; print(sys.path)"

There isn't a direct equivalent to Perl's -p option (the automatic line-by-line input/output processing), mostly because Python doesn't have Perl's concept of $_ and whatnot. In Python, all input and output is done explicitly: via raw_input() and the print statement in Python 2, or input() and print() in Python 3.


For your particular example:

cat results.txt | python -c "import re, sys; print(''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin))"

(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)
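
As a rough sketch of what such a script might look like (the file name extract_numbers.py and the function name substitute_lines are made up for illustration, not part of the original answer), the loop below mimics perl -p by applying the substitution to each line of stdin and writing the result without adding a newline:

```python
import re
import sys

# Same pattern as the Perl one-liner: capture the number just before a trailing period.
PATTERN = re.compile(r'.+(\d\.\d+)\.\n')


def substitute_lines(lines):
    """Yield each input line with the pattern replaced by the captured number and a space."""
    for line in lines:
        yield PATTERN.sub(r'\1 ', line)


if __name__ == '__main__':
    # writelines() adds no newline of its own, so unmatched lines pass through unchanged.
    sys.stdout.writelines(substitute_lines(sys.stdin))
```

Assuming it is saved as extract_numbers.py, it could be used as: cat results.txt | python extract_numbers.py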

You can use:

$ python -c '<your code here>'

You can in theory, but Python doesn't have anywhere near as much regex magic as Perl does, so the resulting command will be much more unwieldy, especially as you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).

The Python equivalent of your colleague's Perl one-liner is approximately:

import sys, re
for line in sys.stdin:
    # write() adds no extra newline, so lines the regex leaves alone keep their own
    sys.stdout.write(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line))

You have a problem which can be solved several ways.

I think you should consider using regular expressions (which is what Perl is doing in your example) directly from Python. Regular expressions are in the re module. An example would be:

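import re
with open('somefile.txt') as f:
    filecontent = f.read()
# re.MULTILINE makes $ match at the end of every line, not only at the end of the file
print(re.findall(r'.+(\d\.\d+)\.$', filecontent, re.MULTILINE))

(I would prefer using $ instead of '\n' for line endings, because line endings differ between operating systems and file encodings. Note that $ only matches at the end of each line when re.MULTILINE is passed.)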
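
To illustrate how the $ anchor behaves, here is a small sketch on an in-memory string (the sample text and variable names are invented for the example). Without re.MULTILINE, $ matches only at the very end of the string, so only the last line is found:

```python
import re

# Invented sample text standing in for the contents of a results file.
sample = "first result was 1.25.\nsome other line\nsecond result was 0.5.\n"

# Without re.MULTILINE, $ matches only at the end of the whole string.
last_only = re.findall(r'.+(\d\.\d+)\.$', sample)
print(last_only)  # ['0.5']

# With re.MULTILINE, $ matches at the end of every line.
numbers = re.findall(r'.+(\d\.\d+)\.$', sample, re.MULTILINE)
print(numbers)  # ['1.25', '0.5']
```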

If you want to call bash commands from inside Python, you could use:

import os
os.system(mycommand)

Here mycommand is a string holding the shell command. I use it all the time, because some operations are better performed in bash than in Python.
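
Note that os.system only returns the command's exit status; the command's output goes straight to the terminal. If the output is needed inside Python, one possible alternative (not what the answer above uses) is the standard subprocess module. A minimal sketch, assuming Python 3.7 or later and a hypothetical helper name run_shell:

```python
import subprocess


def run_shell(command):
    """Run a shell command and return its standard output as text."""
    # shell=True hands the string to the system shell, as os.system does.
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(command, shell=True, check=True,
                            capture_output=True, text=True)
    return result.stdout


output = run_shell("echo hello")
print(output)
```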

Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.

Quoting from https://stackoverflow.com/a/12259852/411282 :

for ln in __import__("fileinput").input(): print(ln.rstrip())

See the explanation linked above, but this does much more of what perl -p does, including support for multiple file names and stdin when no filename is given.

https://docs.python.org/3/library/fileinput.html#fileinput.input
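
As a sketch of how fileinput could be combined with the substitution from the question (the function name rewrite_files is invented for this example), the script below processes every file named on the command line, or stdin when none is given:

```python
import fileinput
import re
import sys


def rewrite_files(paths):
    """Return the substituted text of all given files; an empty list means stdin."""
    with fileinput.input(files=paths) as lines:
        return ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in lines)


if __name__ == '__main__':
    # sys.argv[1:] holds the file names passed on the command line, if any.
    sys.stdout.write(rewrite_files(sys.argv[1:]))
```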

You can run Python code directly from the bash command line with python -c, or you can use sys.stdin to process input piped to stdin.

Perl (or sed) is more convenient. However it is possible, if ugly:

python -c 'import sys, re; print("".join(re.sub(r".+(\d\.\d+)\.\n", r"\1 ", l) for l in sys.stdin))'

