
Regex to find sentences inside quotation marks

I have a very long text in which some lines have no quotation marks, others are fully surrounded by double quotation marks, and others are only partially surrounded by quotation marks.

Here's an excerpt (each line is an example of one of the cases above):

#Example 1 (line with no quotation marks)
I thought all this over for two or three days, and then I reckoned I would see if there was anything in it. 

#Example 2 (full line inside quotation marks)
"Why, my boy, you are all out of breath.  Did you come for your interest?" 

#Example 3 (only part of the line inside quotation marks)
"No, sir," I says, "I don't want to spend it. 

I'm trying to find a regular expression that matches all lines that:

  • Start on a new line
  • Have a double quotation mark at the beginning
  • Have a double quotation mark at the end

In other words, lines that follow the second example above. I've tried the following:

import re

def my_pattern():
  pattern = r'^\"(.+)\"$'
  return re.compile(pattern, re.M | re.IGNORECASE)

But I don't get the output I want. Any ideas on how I could improve my regex?

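Instead of matching everything in between, try matching only characters that are neither " nor \n. Because .+ also matches quotation marks, the original pattern accepts any line that merely starts and ends with one, however many appear in between.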

^"([^"\n]+)"$
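
As a rough illustration, here is a minimal sketch of how that pattern could be used with Python's re module. The sample text reuses the three example lines from the question with trailing whitespace stripped, plus one hypothetical fourth line. The names TEXT, FULLY_QUOTED, find_fully_quoted and find_greedy are made up for this example.

```python
import re

# Sample text: the three example lines from the question (trailing whitespace
# stripped, which is an assumption about the real input) plus one hypothetical
# line that starts and ends with a quote but has more quotes in between.
TEXT = "\n".join([
    "I thought all this over for two or three days, and then I reckoned "
    "I would see if there was anything in it.",
    '"Why, my boy, you are all out of breath.  Did you come for your interest?"',
    '"No, sir," I says, "I don\'t want to spend it.',
    '"No, sir," I says, "I don\'t want to spend it."',  # hypothetical line
])

# The character class excludes both the quote and the newline, so a match
# cannot run past an inner quotation mark or onto the next line.
FULLY_QUOTED = re.compile(r'^"([^"\n]+)"$', re.MULTILINE)


def find_fully_quoted(text):
    """Return the contents of lines that consist of one quoted span only."""
    return FULLY_QUOTED.findall(text)


def find_greedy(text):
    """Same search with the question's original .+ pattern, for comparison."""
    return re.findall(r'^"(.+)"$', text, re.MULTILINE)


if __name__ == "__main__":
    for sentence in find_fully_quoted(TEXT):
        print(sentence)
```

On this sample, find_fully_quoted returns only the "Why, my boy..." line, while find_greedy also returns the hypothetical fourth line. If the real lines carry trailing spaces, "$ will not match them. Allowing optional spaces or tabs before $ (for example [ \t]*$) is one possible adjustment.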
