Match any word in a string except words that begin with a curly brace, in Python

Question

I have a string:

line = u'I need to match the whole line except for {thisword for example'

I'm having trouble with this. Here is what I tried, which doesn't work:

# in general case there will be Unicode characters in the pattern
matchobj = re.search(ur'[^\{].+', line) 

matchobj = re.search(ur'(?!\{).+', line)

Can you help me figure out what's wrong and how to do it right?

P.S. I don't think I need to replace "{thisword" with an empty string.

Answer 1

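I'm not entirely sure what you need. From the question title it looks like you want to find "all words in a string (e.g. 'line') that do not start with {", but your use of the re.search() function confuses me.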

`re.search()` and `re.findall()`

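The function re.search() returns a corresponding MatchObject instance. re.search is typically used to find the first place where a pattern matches in a long string. It does not return all possible matches. See the simple example below: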

>>> re.search('a', 'aaa').group(0) # only first match
'a'
>>> re.search('a', 'aaa').group(1) # group(1) is capture group 1, not a second match
Traceback (most recent call last):
  File "<console>", line 1, in <module>
IndexError: no such group

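Searching with the regex 'a' returns only one match of 'a' in the string 'aaa'; it does not return all possible matches.

If your goal is to find "all words in a string that do not start with {", you should use the re.findall() function, which matches all occurrences of a pattern rather than just the first one as re.search() does. See the example: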
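Applied to the two patterns from the question, this means each call yields a single match, and because .+ is greedy that match ends up being the whole line. The following is a minimal sketch in Python 3 syntax (plain r'' instead of the Python 2 ur'' prefix); the names m1 and m2 are illustrative only.

```python
import re

line = 'I need to match the whole line except for {thisword for example'

# [^\{] consumes one non-brace character ('I'), then .+ greedily takes
# everything that follows, braces included.
m1 = re.search(r'[^\{].+', line)

# (?!\{) only checks the position where the match starts; .+ is then
# free to run across '{thisword'.
m2 = re.search(r'(?!\{).+', line)

print(m1.group() == line)
print(m2.group() == line)
```

Neither pattern works word by word, which is why the brace word is never excluded.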

>>> re.findall('a', 'aaa')
['a', 'a', 'a']

Edit: based on a comment, here is an example that demonstrates the use of re.search and re.findall:

>>> re.search('a+', 'not itnot baaal laaaaaaall ').group()
'aaa'                 # returns ^^^   ^^^^^ doesn't 
>>> re.findall('a+', 'not itnot baaal laaaaaaall ')
['aaa', 'aaaaaaa']    #          ^^^   ^^^^^^^ match both

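Here is a good tutorial on the Python re module: re - Regular Expressions

Also, Python regex has the concept of a group: "a pattern matched within parentheses". If one or more groups are present in the pattern, re.findall() returns a list of groups instead of whole matches; if the pattern has more than one group, this is a list of tuples. See below: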

>>> re.findall('(a(b))', 'abab') # 2 groups according to 2 pair of ( )
[('ab', 'b'), ('ab', 'b')] # list of tuples of groups captured

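In Python, the regex (a(b)) contains two groups, one for each pair of parentheses (this is not quite the same thing as a regular expression in formal language theory, but that is a separate matter).

Answer: the words in the sentence line are separated by whitespace (or sit at the start of the string), so the regex should be:

ur"(^|\s)(\w+)"

Regex description:

(^|\s) means: the word starts either at the beginning of the string or after a whitespace character.
\w+ : matches one or more alphanumeric characters, including "_".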
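How the number of groups changes the return value of re.findall can be sketched as follows. This is a small illustration in Python 3; the variable names are made up for the example.

```python
import re

text = 'abab'

# No groups: a list of whole matches.
no_groups = re.findall(r'ab', text)

# Exactly one group: a list of that group's contents.
one_group = re.findall(r'a(b)', text)

# Two groups: a list of tuples, one tuple per match.
two_groups = re.findall(r'(a(b))', text)

# Non-capturing groups (?:...) do not count, so whole matches come back.
non_capturing = re.findall(r'(?:a(?:b))', text)

print(no_groups, one_group, two_groups, non_capturing)
```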

Applying the regex r to your line:

>>> import pprint    # for pretty-printing; you can ignore these two lines
>>> pp = pprint.PrettyPrinter(indent=4)

>>> r = ur"(^|\s)(\w+)"
>>> L = re.findall(r, line)
>>> pp.pprint(L)
[   (u'', u'I'),
    (u' ', u'need'),
    (u' ', u'to'),
    (u' ', u'match'),
    (u' ', u'the'),
    (u' ', u'whole'),
    (u' ', u'line'),
    (u' ', u'except'),
    (u' ', u'for'),   # notice 'for' after 'for'
    (u' ', u'for'),   # '{thisword' is not included
    (u' ', u'example')]
>>>

To get all the words in a line, use:

>>> [t[1] for t in re.findall(r, line)]

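Note: this skips { and any other special character in the line, because \w matches only alphanumeric characters and '_'.

If you specifically want to exclude { only when it appears at the start of a word (while allowing it in the middle), use the regex: r = ur"(^|\s+)(?P<word>[^{]\S*)".

To understand the difference between this regex and the previous one, check the following example: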
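For readers on Python 3, the same idea can be sketched with a plain r'' string. The second variant below is an assumption of this sketch rather than part of the answer: it swaps the capturing (^|\s) group for a (?<!\S) lookbehind so that re.findall returns the words directly.

```python
import re

line = 'I need to match the whole line except for {thisword for example'

# The answer's pattern: findall returns (separator, word) tuples.
pairs = re.findall(r'(^|\s)(\w+)', line)
words = [word for _, word in pairs]

# Variant: (?<!\S) asserts "start of string or whitespace before"
# without capturing anything, so findall returns plain strings.
words_direct = re.findall(r'(?<!\S)\w+', line)

print(words)
print(words == words_direct)
```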

>>> r = ur"(^|\s+)(?P<word>[^{]\S*)"
>>> [t[1] for t in re.findall(r, "I am {not yes{ what")]
['I', 'am', 'yes{', 'what']
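One caveat with this pattern: [^{] also accepts a space, so with two or more spaces before a brace word the regex can backtrack and capture the brace word together with a leading space. The sketch below (Python 3) shows that case alongside a stricter variant; the stricter pattern is a suggestion of this sketch, not something taken from the answer.

```python
import re

original = r'(^|\s+)(?P<word>[^{]\S*)'
# Stricter variant: the first character may be neither whitespace nor '{',
# and (?<!\S) requires start of string or whitespace before the word.
stricter = r'(?<!\S)[^\s{]\S*'

sample = 'I am {not yes{ what'
double_space = 'a  {b'

same_on_sample = re.findall(stricter, sample)
leaky = [t[1] for t in re.findall(original, double_space)]
fixed = re.findall(stricter, double_space)

print(same_on_sample)  # ['I', 'am', 'yes{', 'what']
print(leaky)           # ['a', ' {b']
print(fixed)           # ['a']
```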

Without regex:

You can do the same thing without any regex, as follows:

>>> [w for w in line.split() if w[0] != '{']

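re.sub() to replace a pattern

If you just want to remove one (or more) words that start with {, you should use re.sub() to replace every match of the pattern starting with { by the empty string "". Check the following code: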

>>> r = ur"{\w+"
>>> re.findall(r, line)
[u'{thisword']
>>> re.sub(r, "", line)
u'I need to match the whole line except for  for example'
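The result above keeps both spaces that surrounded the removed word. If that matters, one possible variant also consumes the whitespace that follows the word. This is a sketch in Python 3; the added \s* and the name cleaned are assumptions of the example.

```python
import re

line = 'I need to match the whole line except for {thisword for example'

# \{\w+ matches the brace word; \s* also removes the whitespace after it,
# so no double space is left behind.
cleaned = re.sub(r'\{\w+\s*', '', line)

print(cleaned)
```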

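Edit, adding a reply to a comment:

(?P<name>...) is a Python regex extension (it has a meaning in Python). (?P<name>...) is similar to regular parentheses: it creates a group (a named group). The group is accessible through its symbolic group name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. Example 1: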

>>> r = "(?P<capture_all_A>A+)"
>>> mo = re.search(r, "aaaAAAAAAbbbaaaaa")
>>> mo.group('capture_all_A')
'AAAAAA'

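Example 2: suppose you want to extract the name from a name line that may carry a title such as mr. Use the regex: name_re = r"(?P<title>(mr|ms)\.?)? ?(?P<name>[a-z ]*)"

We can then read the name from the input string using group('name'):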

>>> re.search(name_re, "mr grijesh chauhan").group('name')
'grijesh chauhan'
>>> re.search(name_re, "grijesh chauhan").group('name')
'grijesh chauhan'
>>> re.search(name_re, "ms. xyz").group('name')
'xyz'
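When a pattern has several named groups, the match object's groupdict() method returns all of them at once. A short sketch in Python 3, reusing the name_re pattern (written with [a-z ] and a raw string):

```python
import re

name_re = r"(?P<title>(mr|ms)\.?)? ?(?P<name>[a-z ]*)"

# groupdict() maps each named group to its text; a named group that
# did not take part in the match maps to None.
with_title = re.search(name_re, "ms. xyz").groupdict()
without_title = re.search(name_re, "grijesh chauhan").groupdict()

print(with_title)
print(without_title)
```

The unnamed inner group (mr|ms) does not appear in the dictionary, since only named groups are included.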

Answer 2

You can simply do:

(?<!{)(\b\w+\b) with the g flag enabled (all matches)

Demo: http://regex101.com/r/zA0sL6
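Python's re module has no g flag; re.findall returns every match instead. Below is a minimal sketch of this pattern in Python 3. The brace is escaped for clarity and the capturing parentheses are dropped so that findall returns plain strings; both adjustments are assumptions of this sketch, not part of the answer.

```python
import re

line = 'I need to match the whole line except for {thisword for example'

# (?<!\{) rejects a word that is immediately preceded by '{'.
words = re.findall(r'(?<!\{)\b\w+\b', line)

print(words)
```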

Answer 3

Try this pattern:

(.*)(?:\{\w+)\s(.*)

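Code:

import re
p = re.compile(r'(.*)(?:\{\w+)\s(.*)')
# named `line` rather than `str` to avoid shadowing the built-in type
line = "I need to match the whole line except for {thisword for example"

# groups 1 and 2 of the match hold the text before and after "{thisword"
m = p.match(line)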

Example:

http://regex101.com/r/wR8eP6
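To actually get the line without the brace word from this pattern, the two captured groups can be joined. This is a sketch in Python 3; the names before, after and rebuilt are illustrative, and falling back to the original line when nothing matches is an assumption of the example.

```python
import re

p = re.compile(r'(.*)(?:\{\w+)\s(.*)')
line = "I need to match the whole line except for {thisword for example"

m = p.match(line)
if m:
    # Group 1 is the text before the brace word, group 2 the text after it.
    before, after = m.groups()
    rebuilt = before + after
else:
    # No brace word followed by whitespace: keep the line unchanged.
    rebuilt = line

print(rebuilt)
```

Because the first (.*) is greedy, a line with several brace words would lose only the last one with this approach.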
