Regular expression conversion from Perl to Python

Question

I would like to rewrite a small Perl program in Python. I am processing text files with it as follows:

Input:

00000001;Root;;
00000002;  Documents;;
00000003;    oracle-advanced_plsql.zip;file;
00000004;  Public;;
00000005;  backup;;
00000006;    20110323-JM-F.7z.001;file;
00000007;    20110426-JM-F.7z.001;file;
00000008;    20110603-JM-F.7z.001;file;
00000009;    20110701-JM-F-via-summer_school;;
00000010;      20110701-JM-F-yyy.7z.001;file;

Desired output:

00000001;;Root;;
00000002;  ;Documents;;
00000003;    ;oracle-advanced_plsql.zip;file;
00000004;  ;Public;;
00000005;  ;backup;;
00000006;    ;20110323-JM-F.7z.001;file;
00000007;    ;20110426-JM-F.7z.001;file;
00000008;    ;20110603-JM-F.7z.001;file;
00000009;    ;20110701-JM-F-via-summer_school;;
00000010;      ;20110701-JM-F-yyy.7z.001;file;

Here is the working Perl code:

#!/usr/bin/perl -w
# filename: perl_regex.pl
while (<>) {
  s/^(.*?;.*?)(\w)/$1;$2/;
  print $_;
}

I call it from the command line: perl_regex.pl input.txt

Explanation of the Perl-style regex:

s/        # start search-and-replace regexp
  ^       # start at the beginning of this line
  (       # save the matched characters until ')' in $1
    .*?;  # go forward until finding the first semicolon
    .*?   # go forward until finding... (to be continued below)
  )
  (       # save the matched characters until ')' in $2
    \w    # ... the next alphanumeric character.
  )
/         # continue with the replace part
  $1;$2   # write all characters found above, but insert a ; before $2
/         # finish the search-and-replace regexp.

Could anyone tell me how to get the same result in Python? In particular, I couldn't find an equivalent for the $1 and $2 variables.

Answer 1

The Python equivalent of Perl's s/pattern/replace/ is the re.sub(pattern, replace, string) function, or re.compile(pattern).sub(replace, string). In your case, you would write:

import re

_re_pattern = re.compile(r"^(.*?;.*?)(\w)")
result = _re_pattern.sub(r"\1;\2", line)

Note that $1 becomes \1 (inside a raw string such as r"\1;\2"; in an ordinary string literal the backslash has to be doubled, "\\1"). As in Perl, you need to iterate over your lines yourself, in whatever way suits you (open, inputfile, splitlines, ...).
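For the iteration part, one option that stays close to Perl's while(<>) is the standard fileinput module, which reads the files named on the command line, or standard input if none are given. The following is only a sketch: the helper names insert_separator, convert and main are mine, not from the question, and the demonstration at the bottom runs on two lines copied from the question's input instead of a real file.

```python
import fileinput
import re

# Same pattern as the Perl original: lazily match up to the first ';',
# then lazily up to the first word character.
PATTERN = re.compile(r"^(.*?;.*?)(\w)")


def insert_separator(line):
    """Insert ';' before the first word character after the first ';'."""
    # count=1 mirrors Perl's s/// without the /g flag.
    return PATTERN.sub(r"\1;\2", line, count=1)


def convert(lines):
    """Yield each converted line; `lines` is any iterable of strings."""
    for line in lines:
        yield insert_separator(line)


def main(files=None):
    # With files=None, fileinput.input() iterates over the files named in
    # sys.argv[1:], or over standard input when there are none.
    for line in convert(fileinput.input(files)):
        # Each line keeps its trailing newline, so suppress print's own.
        print(line, end="")


# Demonstration on two lines from the question, without touching any file.
sample = ["00000001;Root;;\n", "00000002;  Documents;;\n"]
print("".join(convert(sample)), end="")
```

Calling main() at the end of a script saved as, say, regex_convert.py (a hypothetical name) should then behave like the Perl version when run as python regex_convert.py input.txt.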

Answer 2

Python regular expressions are very similar to Perl's, except:

In Python there is no regular expression literal. A pattern is expressed as a string. I used an r'raw string literal' in the following code.
Backreferences are expressed as \1, \2, ... or \g<1>, \g<2>, ...
...
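To illustrate the two backreference spellings, and the closest thing to reading $1 and $2 after a match, here is a small sketch. The sample line is taken from the question; the variable names are mine.

```python
import re

line = "00000002;  Documents;;"
pattern = re.compile(r"^(.*?;.*?)(\w)")

# Both replacement templates refer to the same two groups.
numbered = pattern.sub(r"\1;\2", line)
explicit = pattern.sub(r"\g<1>;\g<2>", line)
print(numbered == explicit)  # prints True

# \g<1> is unambiguous when a digit follows the reference:
# r"\10" would be read as group 10, while r"\g<1>0" is group 1 then "0".
print(re.sub(r"(a)", r"\g<1>0", "a"))  # prints a0

# Outside a replacement, the match object plays the role of $1 and $2.
match = pattern.match(line)
if match:
    first, second = match.group(1), match.group(2)
    print(repr(first), repr(second))
```

The match object route is mainly useful when the captured text is needed for something other than a substitution; for a plain search-and-replace, the template string is enough.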

Use re.sub to replace.

import re
import sys

for line in sys.stdin:  # Explicitly iterate over standard input line by line
    # `line` still contains its trailing newline!
    line = re.sub(r'^(.*?;.*?)(\w)', r'\1;\2', line)
    # print(line)  # This would print an extra newline after each line
    sys.stdout.write(line)  # Write the replaced string back.
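The commented breakdown of the pattern from the question can be kept in Python as well, using the re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. A sketch, checked here against three lines from the question's input; the names VERBOSE_PATTERN and convert are illustrative only.

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    ^        # start at the beginning of the line
    (        # group 1 ($1 in Perl)
      .*?;   # lazily match up to and including the first semicolon
      .*?    # lazily match up to ...
    )
    (        # group 2 ($2 in Perl)
      \w     # ... the next word character (letter, digit or underscore)
    )
    """,
    re.VERBOSE,
)


def convert(line):
    # Re-emit group 1, insert a semicolon, then re-emit group 2.
    return VERBOSE_PATTERN.sub(r"\1;\2", line)


samples = [
    "00000001;Root;;",
    "00000003;    oracle-advanced_plsql.zip;file;",
    "00000010;      20110701-JM-F-yyy.7z.001;file;",
]
for sample in samples:
    print(convert(sample))
```

The verbose form matches exactly the same text as the one-line pattern; it only changes how the pattern is written down.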

Answer 1
2 2014-01-30 14:52:15

Answer 2
1 Accepted 2014-01-30 14:52:23
