Regex translation from Perl to Python

I would like to rewrite a small Perl program in Python. I use it to process text files as follows:

Input:

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-yyy.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-yyy.7z.001;file;

Here is the working Perl code:

#!/usr/bin/perl -w
# filename: perl_regex.pl
while (<>) {
    s/^(.*?;.*?)(\w)/$1;$2/;
    print $_;
}

I call it from the command line: perl_regex.pl input.txt

Explanation of the Perl-style regex:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next word character (letter, digit or underscore).
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.
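The annotated explanation above maps almost one-to-one onto Python's re.VERBOSE flag, which ignores whitespace inside the pattern and treats # as the start of a comment. A minimal sketch; the name PATTERN and the sample line are only for illustration:

```python
import re

# The same pattern as the Perl s/// above, written with re.VERBOSE so it
# can carry the same line-by-line comments as the explanation.
PATTERN = re.compile(r"""
    ^           # start at the beginning of the line
    (           # group 1 (Perl's $1)
      .*?;      #   go forward until the first semicolon
      .*?       #   then go forward until ...
    )
    (           # group 2 (Perl's $2)
      \w        #   ... the next word character (letter, digit or underscore)
    )
""", re.VERBOSE)

line = "00000002;  Documents;;\n"
# \1 and \2 in the replacement correspond to $1 and $2 in Perl.
print(PATTERN.sub(r"\1;\2", line), end="")  # prints: 00000002;  ;Documents;;
```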

Could anyone tell me how to get the same result in Python? In particular, I couldn't find an equivalent of the $1 and $2 variables.

The Python equivalent of Perl's s/pattern/replace/ is the re.sub(pattern, replace, string) function, or re.compile(pattern).sub(replace, string). In your case, you would do it like this:

import re
_re_pattern = re.compile(r"^(.*?;.*?)(\w)")  # compile once, reuse for every line
result = _re_pattern.sub(r"\1;\2", line)      # `line` holds one line of the input

Note that $1 becomes \1 (written inside a raw string, as in r"\1;\2"). As in Perl, you need to iterate over your lines in whatever way suits you (open(), fileinput, splitlines(), ...).
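One way to reproduce Perl's while(<>) loop is the standard-library fileinput module, which reads the files named on the command line or, if there are none, standard input. A minimal sketch, assuming the script should accept a file name the way perl_regex.pl does; the names add_separator and main are illustrative, not from the original answer:

```python
import fileinput
import re
import sys

# Compiled once and reused for every line.
_PATTERN = re.compile(r"^(.*?;.*?)(\w)")


def add_separator(line: str) -> str:
    """Insert ';' before the first word character that follows the first ';'."""
    # \1 and \2 in the replacement play the role of Perl's $1 and $2.
    return _PATTERN.sub(r"\1;\2", line)


def main() -> None:
    # fileinput.input() iterates over the lines of every file named in
    # sys.argv[1:], or over standard input if none are given, like Perl's <>.
    # Each line keeps its trailing newline, so write it back unchanged.
    for line in fileinput.input():
        sys.stdout.write(add_separator(line))
```

Adding `if __name__ == "__main__": main()` at the end of the script lets you run it as python perl_regex.py input.txt, or with the input piped in on standard input.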

Python regular expressions are very similar to Perl's, except that:

  • Python has no regular expression literal; patterns are written as ordinary strings. The code below uses raw string literals (r'...') so that backslashes reach the regex engine unchanged.
  • Backreferences are written as \1, \2, ... or \g<1>, \g<2>, ...
  • ...
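To illustrate the backreference bullet, here is a small sketch on one of the sample lines. It shows that \1 and \g<1> are interchangeable, why \g<...> exists (a following digit would otherwise be read as part of the group number), and that the same syntax also accepts (?P<name>...) group names; the group names prefix and first are only for illustration:

```python
import re

pattern = re.compile(r"^(.*?;.*?)(\w)")
line = "00000004;  Public;;"

# \1 and \g<1> are two spellings of the same numbered backreference.
numbered = pattern.sub(r"\1;\2", line)
explicit = pattern.sub(r"\g<1>;\g<2>", line)
print(numbered == explicit)  # True
print(explicit)              # 00000004;  ;Public;;

# \g<...> is needed when a digit follows the reference: r"\10" would be
# read as a reference to group 10, while r"\g<1>0" is group 1 followed
# by a literal "0".
print(pattern.sub(r"\g<1>0\2", line))  # 00000004;  0Public;;

# The same syntax also accepts the names of (?P<name>...) groups.
named = re.compile(r"^(?P<prefix>.*?;.*?)(?P<first>\w)")
print(named.sub(r"\g<prefix>;\g<first>", line))  # 00000004;  ;Public;;
```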

Use re.sub to do the replacement:

import re
import sys

for line in sys.stdin:  # Explicitly iterate over standard input line by line
    # `line` still contains its trailing newline!
    line = re.sub(r'^(.*?;.*?)(\w)', r'\1;\2', line)
    # print(line) would add a second newline, so write the line back as-is
    sys.stdout.write(line)
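If the file is small enough to read into memory at once, an alternative (not from the original answers) is a single re.sub over the whole text with the re.MULTILINE flag: ^ then matches at the start of every line, and because . does not match a newline, each match stays within one line. A sketch using the first three sample lines; SAMPLE, EXPECTED and transform are illustrative names:

```python
import re

SAMPLE = """\
00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
"""

EXPECTED = """\
00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
"""

# re.MULTILINE makes ^ match after every newline; '.' never crosses a
# newline, so each substitution is confined to a single line.
LINE_PATTERN = re.compile(r"^(.*?;.*?)(\w)", re.MULTILINE)


def transform(text: str) -> str:
    """Apply the per-line substitution to a whole multi-line string."""
    return LINE_PATTERN.sub(r"\1;\2", text)


print(transform(SAMPLE) == EXPECTED)  # True
```

For large inputs the line-by-line loop is the safer choice, since it never holds the whole file in memory.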

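Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.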

 
© 2020-2024 STACKOOM.COM