Why does this Python regular expression return the wrong string?

Question

Below I have a piece of code that should replace one string with another, but it doesn't seem to do it. I am not a Python or regular expression expert; can anyone tell me why this might be going wrong?

def ReplaceCRC( file_path ):
    file = open(file_path,'r+');
    file_str = file.read()
    if( file_str <> '' ):
         crc_list        = re.findall(r'_CalcCRC[(]\s*"\w+"\s*[)]', file_str);
         strs_to_crc     = []
         new_crc_list    = []
         if( crc_list ):
              for crc in crc_list:
                   quote_to_crc    = re.search(r'"\w+"', crc);
                   str_to_crc      = re.search(r'\w+', quote_to_crc.group() ).group();
                   final           = hex(CalcCRC( str_to_crc ))[:2]
                   value           = '%08X' % CalcCRC( str_to_crc )
                   final           = final + value.upper()
                   final_crc       = Insert( crc, ', ' + final + ' ', -1)
                   new_crc_list.append( final_crc )
              if( new_crc_list <> [] ):
                   for i in range(len(crc_list)):
                       print crc_list[i]
                       print new_crc_list[i]
                       term = re.compile( crc_list[i] );
                       print term.sub( new_crc_list[i], file_str );

This is the file it is operating on:

printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC") );
printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO") );

This is the output:

_CalcCRC("THIS_IS_A_CRC")
_CalcCRC("THIS_IS_A_CRC", 0x97DFEAC9 )
printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC") );
printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO") );

_CalcCRC("PATIENT_ZERO")
_CalcCRC("PATIENT_ZERO", 0x0D691C21 )
printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC") );
printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO") );

What it should do is find the CRC string, calculate the value, and then put a new string in its place in the original string. I have been trying a bunch of stuff, but nothing seems to work.

Answer 1

Not your problem, but these 3 lines are amazing:

final           = hex(CalcCRC( str_to_crc ))[:2]
value           = '%08X' % CalcCRC( str_to_crc )
final           = final + value.upper()

Assuming CalcCRC returns a non-negative integer (e.g. 1234567890):

Line 1 sets final to "0x" irrespective of the input!

>>> hex(1234567890)
'0x499602d2'
>>> hex(1234567890)[:2]
'0x'

Line 2 repeats the call to CalcCRC!

>>> value           = '%08X' % 1234567890
>>> value
'499602D2'

Note that value is already uppercase!

And after line 3, final becomes '0x499602D2'.

As value is not used again, the whole thing can be replaced by:

final = '0x%08X' % CalcCRC(str_to_crc)
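A quick way to convince yourself that the one-liner is equivalent, as a sketch in Python 3 syntax. Here calc_crc is a hypothetical stand-in that returns a fixed value, since the real CalcCRC is not shown in the question.

```python
def calc_crc(s):
    # Hypothetical stand-in for the asker's CalcCRC: always returns the same value.
    return 0x97DFEAC9

def three_line_version(s):
    final = hex(calc_crc(s))[:2]   # always '0x', whatever the input
    value = '%08X' % calc_crc(s)   # second call; result is already uppercase
    return final + value.upper()

def one_line_version(s):
    return '0x%08X' % calc_crc(s)

print(three_line_version("THIS_IS_A_CRC"))
print(one_line_version("THIS_IS_A_CRC"))
```

Both calls print the same string, so the shorter form loses nothing.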

More from Circumlocution City

These lines:

quote_to_crc    = re.search(r'"\w+"', crc);
str_to_crc      = re.search(r'\w+', quote_to_crc.group() ).group();

can be replaced by one of:

str_to_crc = re.search(r'"\w+"', crc).group()[1:-1]
str_to_crc = re.search(r'"(\w+)"', crc).group(1)
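To check that the two variants agree, here is a small sketch in Python 3 syntax, using one of the strings that re.findall returns in the question as sample input.

```python
import re

crc = '_CalcCRC("THIS_IS_A_CRC")'

# Variant 1: match the quoted word, then slice off the surrounding quotes.
by_slicing = re.search(r'"\w+"', crc).group()[1:-1]

# Variant 2: capture only the word between the quotes.
by_group = re.search(r'"(\w+)"', crc).group(1)

print(by_slicing)
print(by_group)
```

The capture-group form is probably the easier one to read, since the pattern itself says which part is wanted.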

Answer 2

A quick peek at the real answer:

You need (inter alia) to use re.escape():

term = re.compile(re.escape(crc_list[i]))

and the indentation on your last if looks stuffed.
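Why the escape matters: the text found by re.findall contains ( and ), which are grouping metacharacters when that text is compiled as a pattern. A minimal sketch in Python 3 syntax, using the strings from the question's output:

```python
import re

file_str = 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") );'
found = '_CalcCRC("THIS_IS_A_CRC")'
replacement = '_CalcCRC("THIS_IS_A_CRC", 0x97DFEAC9 )'

# Unescaped: the parentheses form a group, so the pattern looks for
# _CalcCRC"THIS_IS_A_CRC" (no parentheses), which is not in the text.
unescaped = re.compile(found).sub(replacement, file_str)

# Escaped: every metacharacter is matched literally.
escaped = re.compile(re.escape(found)).sub(replacement, file_str)

# For a purely literal replacement, no regex is needed at all.
plain = file_str.replace(found, replacement)

print(unescaped == file_str)
print(escaped)
print(plain == escaped)
```

Note also that sub() returns a new string and leaves file_str untouched, so the result has to be assigned and written back to the file for the change to stick.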

... more after dinner :-)

Post-prandial update

You make 3 passes over the whole file, when one would do the trick. Apart from cutting out an enormous amount of clutter, the main innovation is to use the re.sub functionality that allows the replacement to be a function instead of a string.

import re
import zlib

def CalcCRC(s):
    # This is an example. It doesn't produce the same CRC as your examples do.
    return zlib.crc32(s) & 0xffffffff

def repl_func(mobj):
    str_to_crc = mobj.group(2)
    print "str_to_crc:", repr(str_to_crc)
    crc = CalcCRC(str_to_crc)
    # If my guess about Insert(s1, s2, n) was wrong,
    # adjust the following statement.
    return '%s"%s", 0x%08X%s' % (mobj.group(1), mobj.group(2), crc, mobj.group(3))

def ReplaceCRC(file_handle):
    regex = re.compile(r'(_CalcCRC[(]\s*)"(\w+)"(\s*[)])')
    for line in file_handle:
        print "line:", repr(line)
        line2 = regex.sub(repl_func, line)
        print "line2:", repr(line2)
    return

if __name__ == "__main__":
    import sys, cStringIO
    args = sys.argv[1:]
    if args:
        f = open(args[0], 'r')
    else:
        f = cStringIO.StringIO(r"""
printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC") )
other_stuff()
printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO") )
""")
    ReplaceCRC(f)

Result of running script with no args:

line: '\n'
line2: '\n'
line: 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") )\n'
str_to_crc: 'THIS_IS_A_CRC'
line2: 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC", 0x98ABAC4B) )\n'
line: 'other_stuff()\n'
line2: 'other_stuff()\n'
line: 'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO") )\n'
str_to_crc: 'PATIENT_ZERO'
line2: 'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO", 0x76BCDA4E) )\n'
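The script above is Python 2. For anyone on Python 3, here is a rough port of the same idea, offered as a sketch only: calc_crc is still just an example built on zlib.crc32 (not the asker's CalcCRC), the names are mine, and the function returns the rewritten lines instead of printing debug output.

```python
import io
import re
import zlib

def calc_crc(s):
    # Example CRC only; zlib.crc32 needs bytes in Python 3.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF

PATTERN = re.compile(r'(_CalcCRC[(]\s*)"(\w+)"(\s*[)])')

def repl_func(mobj):
    # Rebuild the call with the CRC inserted after the quoted string.
    crc = calc_crc(mobj.group(2))
    return '%s"%s", 0x%08X%s' % (mobj.group(1), mobj.group(2), crc, mobj.group(3))

def replace_crc(file_handle):
    # One pass over the file; lines without a match come back unchanged.
    return [PATTERN.sub(repl_func, line) for line in file_handle]

sample = io.StringIO(
    'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") )\n'
    'other_stuff()\n'
    'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO") )\n'
)
for line in replace_crc(sample):
    print(line, end="")
```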

Answer 3

Is this what you want?

import re

def ripl(mat):
    return '%s, 0x%08X' % (mat.group(1),CalcCRC(mat.group(2)))

regx = re.compile(r'(_CalcCRC[(]\s*"(\w+)"\s*[)])')


def ReplaceCRC( file_path, regx = regx, ripl = ripl ):
    with open(file_path,'r+') as f:
        file_str = f.read()
        print file_str,'\n'
        if file_str:
             file_str = regx.sub(ripl,file_str)
             print file_str
             f.seek(0,0)
             f.write(file_str) 
             f.truncate()

EDIT

I had forgotten the instruction f.truncate(). It is very important: without it, a tail of the old content remains if the rewritten content is shorter than the initial content.
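To see the tail problem in isolation, here is a small sketch in Python 3 syntax that overwrites a temporary file with shorter content, with and without truncate(). The helper names rewrite and demo are made up for the illustration.

```python
import os
import tempfile

def rewrite(path, new_text, truncate):
    # Read, rewind, overwrite from the start, optionally cut off the rest.
    with open(path, "r+") as f:
        f.read()
        f.seek(0)
        f.write(new_text)
        if truncate:
            f.truncate()

def demo(truncate):
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.write("0123456789")
        rewrite(path, "abc", truncate)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(demo(truncate=False)))  # old tail survives
print(repr(demo(truncate=True)))   # only the new content remains
```

In the CRC case the rewritten text is longer than the original, so the tail would not show up, but keeping truncate() makes the function safe for either direction.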

EDIT 2

John Machin,

There is no mistake; my solution above is right. It gives:

printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC"), 0x97DFEAC9 ); 
printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO"), 0x0D691C21 );

I haven't changed it since your comment. I think that I first posted a solution that was incorrect (because I ran various tests to verify some behaviors and, you know, I sometimes mix up my files and code), then you copied this incorrect code to try it, then I realized that there was a mistake and corrected the code, and then you posted your comment without noticing I had corrected it. I can imagine no other cause for such confusion.

By the way, to obtain the same result, there is no need for two groups in the pattern defining regx; one alone is sufficient. The following regx and ripl() work as well:

regx = re.compile(r'_CalcCRC\(\s*"(\w+)"\s*\)')
# I prefer '\(' to '[(]', and same for '\)' instead of '[)]'

def ripl(mat):
    return '%s, 0x%08X' % (mat.group(),CalcCRC(mat.group(1)))

But an uncertainty remains. Each of our results is sensible, given Joe's imprecise wording. So what exactly does he want as a result: must the value 0x97DFEAC9 be inserted inside _CalcCRC("THIS_IS_A_CRC"), as in your result, or after _CalcCRC("THIS_IS_A_CRC"), as in mine?

To be complete: like you, I wanted code that could be run, so I defined a CalcCRC() function of my own consisting simply of if x=="THIS_IS_A_CRC": return 0x97DFEAC9 and if x=="PATIENT_ZERO": return 0x0D691C21; I picked these associations from the desired results Joe showed in his question.

Now, concerning your nasty assertion that my "point about redefinition of functions is utter nonsense", I think I didn't explain well enough what I meant. Putting the regex regx and the function ripl() as default arguments of the function ReplaceCRC() has a consequence: the objects regx and ripl() are created only once, at the moment the definition of ReplaceCRC() is executed. So if ReplaceCRC() is applied several times during an execution, these objects are not re-created. I don't know whether ReplaceCRC() is really called several times in Joe's program, but I think it is good practice to put this feature in the code in case it is useful. Maybe I should have stressed this point in my answer, rather than in a comment, to justify my code relative to yours. But I try to limit my tendency to sometimes write answers that are too long.
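The evaluate-once behavior of default arguments can be observed directly. A minimal sketch in Python 3 syntax, where make_regex, find_names and compile_calls are hypothetical names introduced only to count how often the default value is built:

```python
import re

compile_calls = []

def make_regex():
    # Record each call so the number of compilations can be counted.
    compile_calls.append(1)
    return re.compile(r'_CalcCRC\(\s*"(\w+)"\s*\)')

def find_names(text, regx=make_regex()):
    # The default value was computed once, when this def statement executed.
    return regx.findall(text)

find_names('_CalcCRC("THIS_IS_A_CRC")')
find_names('_CalcCRC("PATIENT_ZERO")')
print(len(compile_calls))
```

For comparison, a regex compiled at module level is also built only once, when the module is loaded; what the default-argument form adds is that the name becomes a local variable inside the function.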

Are the points clarified and your annoyance soothed by these explanations?


Answer 1
1 2011-04-22 08:46:49

Answer 2
0 2011-04-22 09:18:23

Answer 3
0 2011-04-22 11:27:10
