
Python regex: extracting text between single or double quotation marks

I want to extract all values between single or double quotation marks.

Let's say I have the following values.

"Alice's Adventures in Wonderland 1" 
"Alice's 'Adventures' in Wonderland 1" 
"Alice's "Adventures" in Wonderland 1" 
"Alice's Adventures \nin Wonderland 1" 
'Alice's Adventures in Wonderland 1'
'Alice's "Adventures" in Wonderland 1'
'Alice's 'Adventures' in Wonderland 1'
'Alice's Adventures \tin Wonderland 1'

And the desired outputs are:

Alice's Adventures in Wonderland 1
Alice's 'Adventures' in Wonderland 1
Alice's "Adventures" in Wonderland 1
Alice's Adventures \nin Wonderland 1
Alice's Adventures in Wonderland 1
Alice's "Adventures" in Wonderland 1
Alice's 'Adventures' in Wonderland 1
Alice's Adventures \tin Wonderland 1

How should I write the regex (a single expression that extracts all the desired values at once) to get the whole text enclosed between the first and last quotation marks?

P.S. I want to use the re.search(r"...", text) method.

The (?:(?<![\'\"])\n)? part allows a newline that falls inside the actual text, that is, a newline not immediately preceded by a quotation mark. The \1 towards the end matches the same quotation mark (' or ") that the value started with.

import re
# str1 is assumed to hold the quoted values listed above, one per line
for match in re.finditer(r'^([\'\"])(.*?(?:(?<![\'\"])\n)?.*?)\1 *$', str1, re.M):
    print(match.group(2))

Alice's Adventures in Wonderland 1
Alice's 'Adventures' in Wonderland 1
Alice's "Adventures" in Wonderland 1
Alice's Adventures 
in Wonderland 1
Alice's Adventures in Wonderland 1
Alice's "Adventures" in Wonderland 1
Alice's 'Adventures' in Wonderland 1
Alice's Adventures  in Wonderland 1
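
Since the question asks for re.search on a single value, here is a minimal sketch of that variant. It assumes each `text` holds exactly one quoted value, possibly with surrounding whitespace. The helper name `extract_quoted` is hypothetical and is not part of the answer above. A greedy `.*` with `re.S` runs to the last quotation mark that matches the opening one, so inner quotes and embedded newlines are kept.

```python
import re

def extract_quoted(text):
    """Return the text between the first quote and the last matching quote.

    Hypothetical helper for illustration; returns None when the value is not
    wrapped in a matching pair of ' or " marks.
    """
    # (['"]) captures the opening quote, \1 requires the same one at the end.
    # re.S lets '.' match newlines, so multi-line values are captured whole.
    match = re.search(r"""^\s*(['"])(.*)\1\s*$""", text, re.S)
    return match.group(2) if match else None

samples = [
    '"Alice\'s "Adventures" in Wonderland 1" ',
    '"Alice\'s Adventures \nin Wonderland 1" ',
    "'Alice's 'Adventures' in Wonderland 1'",
]
for sample in samples:
    print(extract_quoted(sample))
```

Unlike the re.finditer version, this handles one value per call, so it does not need re.M or the lazy quantifiers that keep matches from running across lines.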
