
How to handle the multiple-line pattern in Python regular expression matching

I have a list of company names that should be replaced by the word 'company'. The list spans multiple lines.

cmp=re.compile(""" A | B |
                   C | D
               """)
text='A is a great company, so is B'
cmp.sub('company',text)

But it doesn't work. How should I fix this?

Edit:

The example above didn't account for whitespace in company names.

company1=re.compile(r"""Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas""")
company2=re.compile(r"""Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas""",re.VERBOSE)
text='AIG is a great company, so is Berkshire Hathaway'
company1.sub('cmp',text)
# 'AIG is a great company, so is cmp'
company2.sub('cmp',text)
# 'cmp is a great company, so is Berkshire Hathaway'

You could treat this as a verbose pattern, which allows (and ignores) whitespace such as line breaks:

import re

cmp = re.compile(r""" A | B |
                   C | D
               """, re.VERBOSE)
text = 'A is a great company, so is B'
print(cmp.sub('company', text))

OUTPUT

company is a great company, so is company
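
To see why the pattern from the question fails, it can help to look at what each alternative contains with and without re.VERBOSE. The following is only a minimal sketch; the names `raw`, `plain`, and `verbose` are introduced here for illustration and do not appear in the question.

```python
import re

# Same pattern text as in the question (the indentation is part of the string).
raw = """ A | B |
                   C | D
               """
text = 'A is a great company, so is B'

# Without re.VERBOSE every space and newline is a literal character,
# so the first alternative is ' A ' (space, A, space), not 'A'.
plain = re.compile(raw)
print(raw.split('|')[0] == ' A ')
print(plain.sub('company', text))    # text comes back unchanged

# With re.VERBOSE, whitespace outside character classes is ignored,
# so the pattern behaves like 'A|B|C|D'.
verbose = re.compile(raw, re.VERBOSE)
print(verbose.sub('company', text))
```

The same rule explains the side effect on multi-word names: in verbose mode the spaces inside a name are discarded along with the line breaks.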

Space is contained in the company names. ... Any idea on how to fix this?

We need to protect the space characters that appear inside names, much like a CGI escape, so that re.VERBOSE does not discard them. Here's a regex-based approach that wraps each such space in a character class ([ ]), which verbose mode preserves, so the encoded spaces never have to be decoded:

import re

companies = re.compile(re.sub(r"(?<=\S) (?=\S)", r"[ ]", """Berkshire Hathaway|Australia & New Zealand Bank
                  |Wells Fargo|AIG
                  |Ind & Comm Bank of China|BNP Paribas"""), re.VERBOSE)

text = 'AIG is a great company, so is Berkshire Hathaway'

print(companies.sub('cmp', text))

OUTPUT

cmp is a great company, so is cmp
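
An alternative that avoids re.VERBOSE altogether is to keep the names in a Python list and join them into a single pattern. This is a different technique from the one shown above, offered only as a sketch; the names `names` and `pattern` are hypothetical. re.escape guards against characters with special meaning in a regex, and sorting longest-first is a precaution in case one name is a prefix of another.

```python
import re

# Hypothetical list; the company names themselves come from the question.
names = [
    "Berkshire Hathaway", "Australia & New Zealand Bank",
    "Wells Fargo", "AIG",
    "Ind & Comm Bank of China", "BNP Paribas",
]

# Longest names first so a shorter name cannot shadow a longer one
# that starts with it; re.escape makes every character literal.
pattern = re.compile(
    "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))
)

text = 'AIG is a great company, so is Berkshire Hathaway'
print(pattern.sub('cmp', text))
```

Because the list can be split across as many source lines as needed, the layout of the code no longer affects the pattern itself.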
