Python regex match text between quotes

In the following script I would like to pull out the text between the double quotes ("). However, the Python interpreter is not happy and I can't figure out why...

import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.match(pattern, text)

print(m.group())

The output should be find.me-_/\ .

re.match only tries to match at the beginning of the text; it does not scan the rest of the string.

Use re.search instead:

#!/usr/bin/env python3

import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.search(pattern, text)

# group(1) is the captured text between the quotes; group() would also include the quotes
print(m.group(1))

re.match and re.search return None when they fail to match.

I guess Python is giving you AttributeError: 'NoneType' object has no attribute 'group'. This is because you assume the call will succeed and use its result without checking whether re.match returned None.
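A minimal sketch of that check, reusing the pattern from the question. The helper name first_quoted is hypothetical. Returning None for "no match" is just one possible choice; raising an error would be another.

```python
import re

PATTERN = r'"([A-Za-z0-9_\./\\-]*)"'

def first_quoted(text):
    """Return the text inside the first pair of quotes, or None if nothing matched (hypothetical helper)."""
    m = re.search(PATTERN, text)
    if m is None:
        # No match: calling m.group() here would raise AttributeError
        return None
    return m.group(1)  # group 1 is the part captured between the quotes

print(first_quoted('Hello, "find.me-_/\\" please help with python regex'))  # find.me-_/\ (without quotes)
print(first_quoted('no quotes here'))  # None
```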

Split the text on the quote character and take every other element, starting with the second one:

def text_between_quotes(text):
    # The odd-indexed pieces of the split are the ones that sat between a pair of quotes
    return text.split('"')[1::2]

my_string = 'Hello, "find.me-_/\\" please help and "this quote" here'
text_between_quotes(my_string)           # ['find.me-_/\\', 'this quote']
text_between_quotes('"just one quote"')  # ['just one quote']

This assumes you don't have quotes within quotes, and that your text doesn't mix quote styles or use other quoting characters such as `.
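If the text does mix ' and " quoting, one possible regex-based workaround is a backreference. With a backreference, each opening quote must be closed by the same character. This is a sketch with a hypothetical helper name. It still does not handle nesting or escaped quotes, and an apostrophe outside any quotes (as in don't) can pair with a later ' and produce a wrong match.

```python
import re

# Group 1 captures the opening quote character; \1 requires the same character to close it.
QUOTED = re.compile(r'(["\'])(.*?)\1')

def text_between_any_quotes(text):
    """Return the text inside '...' or "..." pairs (hypothetical helper; no nesting or escapes)."""
    # findall returns (quote_char, inner_text) tuples because the pattern has two groups
    return [inner for _quote, inner in QUOTED.findall(text)]

print(text_between_any_quotes('say "hi" and \'bye\''))  # ['hi', 'bye']
print(text_between_any_quotes('He said "it\'s fine"'))  # ["it's fine"]
```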

You should validate your input. For example, what do you want to do if there is an odd number of quotes, meaning not all of them are balanced? One option is to discard the last item when the split produces an even number of pieces:

def text_between_quotes(text):
    split_text = text.split('"')
    between_quotes = split_text[1::2]
    # An odd number of quotes gives an even number of pieces; the last
    # "quoted" piece then follows an unclosed quote, so discard it
    if len(split_text) % 2 == 0:
        between_quotes.pop()
    return between_quotes

# ['first quote', 'second quote']
text_between_quotes('"first quote" and "second quote" and "unclosed quote')
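For comparison, a regex with re.findall also pairs quotes from left to right. A trailing unclosed quote simply never matches, so the unbalanced case needs no special handling. A sketch (the helper name is hypothetical):

```python
import re

def text_between_quotes_re(text):
    # [^"]* cannot cross a quote, so each match is exactly one "..." pair;
    # a trailing unclosed quote never finds its partner and is skipped.
    return re.findall(r'"([^"]*)"', text)

print(text_between_quotes_re('"first quote" and "second quote" and "unclosed quote'))
# ['first quote', 'second quote']
```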

Or raise an error instead.
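A minimal sketch of the raising variant. The function name and the choice of ValueError are assumptions, not part of the answer above:

```python
def text_between_quotes_strict(text):
    """Like text_between_quotes, but reject input with an odd number of quotes (hypothetical helper)."""
    quote_count = text.count('"')
    if quote_count % 2 != 0:
        raise ValueError(f'unbalanced quotes: found {quote_count} double-quote characters')
    return text.split('"')[1::2]

try:
    text_between_quotes_strict('"first quote" and "unclosed quote')
except ValueError as err:
    print(err)  # unbalanced quotes: found 3 double-quote characters
```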

If you write:

m = re.search(pattern, text)

match: matches only at the beginning of the text

search: scans the whole string
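A short demonstration of the difference, using the question's own text and pattern (Python 3):

```python
import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'

# re.match only tries position 0, where the text starts with 'H', not '"'
print(re.match(pattern, text))            # None
# re.search scans forward and finds the quoted part
print(re.search(pattern, text).group(1))  # find.me-_/\ (without quotes)
```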

Maybe this helps you understand: http://docs.python.org/library/re.html#matching-vs-searching

Use re.search() instead of re.match(). The latter matches only at the beginning of the string (like an implicit ^).
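One detail worth knowing: re.match stays anchored to the start of the string even with re.MULTILINE, where ^ inside a pattern can also match right after a newline. A small sketch (the sample text is made up for illustration):

```python
import re

text = 'first line\n"second" line'

# Even with MULTILINE, re.match only tries the very start of the string...
print(re.match(r'"(\w+)"', text, re.MULTILINE))              # None
# ...whereas re.search with ^ and MULTILINE can anchor at the start of any line.
print(re.search(r'^"(\w+)"', text, re.MULTILINE).group(1))   # second
```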

You need re.search(), not re.match(), which is anchored to the start of your input string.

Docs here

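Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.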

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM