
Replace all matches using re.findall()

Using re.findall() I've managed to return multiple matches of a regex in a string. However, the object returned is a list of the matches within the string. This is not what I want.

What I want is to replace all matches with something else. I've tried to use syntax similar to what you would use with re.sub to do this, like so:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
print myfile

However, this creates the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 9, in <module>
    myfile = re.findall(regex, '([a-zA-Z]\%[a-zA-Z])', f.read())
  File "C:\Python27\lib\re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python27\lib\re.py", line 229, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

Can anyone help me with the last bit of syntax I need to replace all matches with something else within the original Python object?

EDIT:

In line with the comments and answers received, here is my attempt to substitute one regex with another:

import json
import re

regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
regex2 = re.compile('([a-zA-Z]%[a-zA-Z])', re.S)

filepath = "C:\\Python27\\Customer Stuff\\Austin Tweets.txt"

f = open(filepath, 'r')
myfile = f.read()
myfile2 = re.sub(regex, regex2, myfile)
print myfile

This now produces the following error:

Traceback (most recent call last):
  File "C:/Python27/Customer Stuff/Austin's Script.py", line 11, in <module>
    myfile2 = re.sub(regex, regex2, myfile)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 273, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 258, in _compile_repl
    p = sre_parse.parse_template(repl, pattern)
  File "C:\Python27\lib\sre_parse.py", line 706, in parse_template
    s = Tokenizer(source)
  File "C:\Python27\lib\sre_parse.py", line 181, in __init__
    self.__next()
  File "C:\Python27\lib\sre_parse.py", line 183, in __next
    if self.index >= len(self.string):
TypeError: object of type '_sre.SRE_Pattern' has no len()
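import re

# A letter, a double quote, then another letter
regex = re.compile('([a-zA-Z]\"[a-zA-Z])', re.S)
myfile = 'foo"s bar'
# For each match, swap its single quote for a percent sign, keeping the letters
myfile2 = regex.sub(lambda m: m.group().replace('"', "%", 1), myfile)
print(myfile2)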

If I understand your question correctly, you're trying to replace a quotation mark between two letters with a percent sign between those letters.

There are several ways to do this with re.sub (re.findall doesn't do replacements at all, so your initial attempts were always doomed to fail).
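To make the difference concrete, here is a minimal sketch using the question's pattern on an assumed sample string. re.findall only returns the matched substrings as a list, while re.sub returns a new string. (In the question's first attempt, the signature is re.findall(pattern, string, flags=0), so the second string was treated as the text to search and f.read() was passed as flags, which is why the traceback ends in `flags & DEBUG`.)

```python
import re

# The question's pattern: a letter, a double quote, and another letter.
regex = re.compile(r'([a-zA-Z]"[a-zA-Z])')
text = 'foo"s bar and bar"s foo'  # assumed sample input

# findall only collects the matched substrings; text is left untouched.
matches = regex.findall(text)
print(matches)  # ['o"s', 'r"s']

# sub returns a new string. Because the letters are part of each match,
# a bare '%' replacement removes them along with the quote.
replaced = regex.sub('%', text)
print(replaced)  # fo% bar and ba% foo
```

That lost-letters result is the reason the approaches in this answer either capture the letters or avoid consuming them.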

An easy approach is to change your pattern to group the letters separately, and then use a replacement string that includes backreferences:

pattern = re.compile('([a-zA-Z])\"([a-zA-Z])', re.S)
re.sub(pattern, r'\1%\2', text)
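A runnable version of that snippet, with an assumed sample string standing in for the file contents, might look like this. The re.S flag is dropped in this sketch because it only changes how '.' behaves, and this pattern contains no '.'.

```python
import re

# Each letter is its own group, so \1 and \2 can put them back.
pattern = re.compile(r'([a-zA-Z])"([a-zA-Z])')
text = 'foo"s bar and bar"s foo'  # assumed sample input

# The first match is 'o"s', split into its two captured letters.
print(pattern.search(text).groups())  # ('o', 's')

result = re.sub(pattern, r'\1%\2', text)
print(result)  # foo%s bar and bar%s foo
```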

Another option would be to use a replacement function instead of a replacement string. The function will be called with a match object for each match found in the text, and its return value is the replacement:

pattern = re.compile('[a-zA-Z]\"[a-zA-Z]', re.S)
re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)

(There are probably lots of other ways of implementing the lambda function. I like string formatting.)
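To spell out what the lambda does: match.group() returns the whole three-character match, such as 'o"s', and the * unpacks it into three positional arguments for format, so {0} is the first letter and {2} the last. The sketch below checks that against a hypothetical named helper, keep_letters, which uses explicit indexing instead; the sample text is assumed.

```python
import re

pattern = re.compile(r'[a-zA-Z]"[a-zA-Z]')
text = 'foo"s bar and bar"s foo'  # assumed sample input

# match.group() is e.g. 'o"s'; *-unpacking passes 'o', '"', 's' to format.
via_format = re.sub(pattern, lambda match: "{0}%{2}".format(*match.group()), text)


def keep_letters(match):
    # Hypothetical helper: same replacement written with explicit indexing.
    s = match.group()
    return s[0] + '%' + s[2]


via_index = re.sub(pattern, keep_letters, text)
print(via_format)
print(via_index)
```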

However, probably the best approach is to use a lookahead and a lookbehind in your pattern to make sure your quotation mark is between letters without actually matching those letters. This lets you use the trivial string '%' as the replacement:

pattern = re.compile('(?<=[a-zA-Z])\"(?=[a-zA-Z])', re.S)
re.sub(pattern, '%', text)

This does have very slightly different semantics from the other versions. A text like 'a"b"c' will have both quotation marks replaced, while the previous versions would only replace the first one. Hopefully this is an improvement!
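The difference is easy to see on that 'a"b"c' example. In this sketch, the backreference version consumes both letters of its first match, so 'b' is no longer available as the leading letter for the second quote; the lookaround version consumes only the quote, so 'b' can serve as context for both.

```python
import re

text = 'a"b"c'

# Groups/backreferences: the match 'a"b' consumes the 'b', so the second
# quote has no unconsumed letter before it and is left alone.
groups = re.sub(r'([a-zA-Z])"([a-zA-Z])', r'\1%\2', text)

# Lookbehind/lookahead: only the quote is consumed, so both quotes qualify.
lookaround = re.sub(r'(?<=[a-zA-Z])"(?=[a-zA-Z])', '%', text)

print(groups)      # a%b"c
print(lookaround)  # a%b%c
```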

As suggested in the comments, use re.sub():

myfile = re.sub(regex, replacement, f.read())

where replacement is the string that each match will be replaced with.
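Applied to the question's script, a minimal sketch could look like the following. The function names replace_quotes and replace_quotes_in_file are hypothetical, the UTF-8 encoding is an assumption about the tweet file, and the pattern is the lookbehind/lookahead one from the earlier answer. The point relevant to the second traceback is that the replacement must be a string or a function; passing a compiled pattern such as regex2 fails because re.sub tries to parse it as a template string.

```python
import io
import re

# A double quote only when it sits between two ASCII letters; the letters
# are checked by the lookarounds but not consumed.
QUOTE_BETWEEN_LETTERS = re.compile(r'(?<=[a-zA-Z])"(?=[a-zA-Z])')


def replace_quotes(text, replacement='%'):
    """Return a copy of text with every matching quote replaced.

    replacement must be a str (or a function taking a match object),
    not a compiled pattern.
    """
    return QUOTE_BETWEEN_LETTERS.sub(replacement, text)


def replace_quotes_in_file(path, encoding='utf-8'):
    """Read the whole file at path and return the substituted text.

    The utf-8 default is an assumption about the input file. io.open
    accepts encoding on both Python 2.7 and Python 3.
    """
    with io.open(path, 'r', encoding=encoding) as f:
        return replace_quotes(f.read())
```

For the question's file, that would be something like myfile = replace_quotes_in_file(filepath) followed by print(myfile).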

I find it clearer to use a function to do this type of substitution rather than a lambda. It makes it easy to perform any number of transformations on the matched text prior to replacing it:

import re

def replace_double_quote(match):
    text = match.group()
    return text.replace('"', '%')

regex = re.compile('([a-zA-Z]\"[a-zA-Z])')
myfile = 'foo"s bar and bar"s foo'
regex.sub(replace_double_quote, myfile)

This returns 'foo%s bar and bar%s foo'. Note that it replaces all matches.
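One way to confirm that every match was replaced, sketched here with the same function and sample string, is Pattern.subn, which performs the same substitution as sub but returns a (new_string, number_of_substitutions) tuple.

```python
import re


def replace_double_quote(match):
    text = match.group()
    return text.replace('"', '%')


regex = re.compile('([a-zA-Z]"[a-zA-Z])')
myfile = 'foo"s bar and bar"s foo'

# subn does the same replacement as sub and also reports how many were made.
new_text, count = regex.subn(replace_double_quote, myfile)
print(new_text, count)  # foo%s bar and bar%s foo 2
```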

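Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.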

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM