
How can I get lines of dialogue in quotes using Python or regex?

I've tried a few of the answers on this site with no luck. Here's an example of the kind of text I'm working with:

"But if you have got them to-day," said Elizabeth, "my mother's purpose will be answered."

She did at last extort from her father an acknowledgment that the horses were engaged. Jane was therefore obliged to go on horseback, and her mother attended her to the door with many cheerful prognostics of a bad day. Her hopes were answered; Jane had not been gone long before it rained hard. Her sisters were uneasy for her, but her mother was delighted. The rain continued the whole evening without intermission; Jane certainly could not come back.

"This was a lucky idea of mine, indeed!" said Mrs. Bennet more than once, as if the credit of making it rain were all her own. Till the next morning, however, she was not aware of all the felicity of her contrivance. Breakfast was scarcely over when a servant from Netherfield brought the following note for Elizabeth:

"MY DEAREST LIZZY,--

"I find myself very unwell this morning, which, I suppose, is to be imputed to my getting wet through yesterday. My kind friends will not hear of my returning till I am better. They insist also on my seeing Mr. Jones--therefore do not be alarmed if you should hear of his having been to me--and, excepting a sore throat and headache, there is not much the matter with me.--Yours, etc."

"Well, my dear," said Mr. Bennet, when Elizabeth had read the note aloud, "if your daughter should have a dangerous fit of illness--if she should die, it would be a comfort to know that it was all in pursuit of Mr. Bingley, and under your orders."

"Oh! I am not afraid of her dying. People do not die of little trifling colds. She will be taken good care of. As long as she stays there, it is all very well. I would go and see her if I could have the carriage."

From this example I'd like to extract:

"But if you have got them to-day, my mother's purpose will be answered"
"This was a lucky idea of mine, indeed!" 
"MY DEAREST LIZZY,-- I find myself very unwell this morning, which, I suppose, is to be imputed to my getting wet through yesterday. My kind friends will not hear of my returning till I am better. They insist also on my seeing Mr. Jones--therefore do not be alarmed if you should hear of his having been to me--and, excepting a sore throat and headache, there is not much the matter with me.--Yours, etc." 
"Well, my dear,"

... and so forth. The rule I'm trying to express as a regex is:

1. get all strings within a " " (there can be multiple on the same line)
2. if the line ends with a \n before finding a second ", continue grabbing the next line so long as it also begins with a "

It might not be what you are looking for, but you can try this one: RegexDemo

import re

text = '''
"But if you have got them to-day," said Elizabeth, "my mother's purpose will be answered."

She did at last extort from her father an acknowledgment that the horses were engaged. Jane was therefore obliged to go on horseback, and her mother attended her to the door with many cheerful prognostics of a bad day. Her hopes were answered; Jane had not been gone long before it rained hard. Her sisters were uneasy for her, but her mother was delighted. The rain continued the whole evening without intermission; Jane certainly could not come back.

"This was a lucky idea of mine, indeed!" said Mrs. Bennet more than once, as if the credit of making it rain were all her own. Till the next morning, however, she was not aware of all the felicity of her contrivance. Breakfast was scarcely over when a servant from Netherfield brought the following note for Elizabeth:

"MY DEAREST LIZZY,--

"I find myself very unwell this morning, which, I suppose, is to be imputed to my getting wet through yesterday. My kind friends will not hear of my returning till I am better. They insist also on my seeing Mr. Jones--therefore do not be alarmed if you should hear of his having been to me--and, excepting a sore throat and headache, there is not much the matter with me.--Yours, etc."

"Well, my dear," said Mr. Bennet, when Elizabeth had read the note aloud, "if your daughter should have a dangerous fit of illness--if she should die, it would be a comfort to know that it was all in pursuit of Mr. Bingley, and under your orders."

"Oh! I am not afraid of her dying. People do not die of little trifling colds. She will be taken good care of. As long as she stays there, it is all very well. I would go and see her if I could have the carriage."
'''

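# Capture each quoted span; a span may also end at "--" followed by a line break,
# which covers the salutation line whose quote is never closed.
talk = re.findall(r'"([^"]+?)("|--\n)', text)
for t in talk:
    print(t[0])  # t[0] is the quoted text, t[1] is the terminator that matched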
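
The snippet above prints the salutation and the body of the note as two separate items. If they should come out as one string, as in the expected output in the question, a small post-processing step could join them. The following is only a sketch under assumptions: the helper name `extract_dialogue` and the shortened `sample` text are made up for illustration, and the joining rule (glue a fragment that stopped at `--` plus a line break onto the next quote) is one possible reading of the question's second rule.

```python
import re

# Same pattern as in the answer, written without the redundant escapes.
PATTERN = re.compile(r'"([^"]+?)("|--\n)')

def extract_dialogue(text):
    """Return quoted passages, joining a fragment that stopped at '--' + newline
    with the quote that follows it (hypothetical helper, not from the answer)."""
    results = []
    pending = None  # fragment whose quote was left open at a line break
    for body, closer in PATTERN.findall(text):
        if pending is not None:
            body = pending + " " + body
            pending = None
        if closer == '"':
            results.append(body)
        else:
            # The match ended at "--\n": restore the dashes and wait for the next quote.
            pending = body + "--"
    if pending is not None:
        results.append(pending)
    return results

# Shortened sample, assumed here for illustration only.
sample = (
    '"Well, my dear," said Mr. Bennet.\n'
    '\n'
    '"MY DEAREST LIZZY,--\n'
    '\n'
    '"I find myself very unwell.--Yours, etc."\n'
)

for passage in extract_dialogue(sample):
    print(passage)
```

Note that the pattern treats any `--` directly before a line break as the end of a fragment, so a closed quote that happens to contain such a break would be split and re-joined with a space.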

This RegEx might help you to achieve that. It would divide your text into three groups:

(\")(.*)(\")


If you wish to match \n as well, you might simply add it to the second group using an OR (|), and update it as:

 (\")(.*|\n)(\")

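
One thing worth checking when trying the three-group pattern from Python is that `.*` is greedy, so on a line with several quoted spans the middle group runs to the last quote on that line. The sketch below compares it with a lazy `.*?` variant; the lazy variant is an assumption added for illustration and is not part of the answer.

```python
import re

line = '"But if you have got them to-day," said Elizabeth, "my mother\'s purpose will be answered."'

# Pattern from the answer: greedy .* makes group 2 span from the first quote
# to the last quote on the line, including the narration in between.
greedy = re.search(r'(")(.*)(")', line)
print(greedy.group(2))

# Hypothetical lazy variant: .*? stops at the nearest closing quote,
# so each quoted span becomes its own match.
bodies = [body for _, body, _ in re.findall(r'(")(.*?)(")', line)]
print(bodies)
```

Neither form handles the note whose first line never closes its quote; that case still needs one of the other approaches on this page.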

For your example data, you might use an alternation:

"[^\n"]*"|"[^\n"]*\n+"[^"]*"
  • "[^\n"]*" Match from the opening double quote to the closing double quote without crossing a newline
  • | Or
  • "[^\n"]*\n+"[^"]*" Match from the opening quote to the closing quote across a line break, but only when the line after the newline(s) starts with a double quote

Regex demo
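
To try the alternation from Python, something like the following might work. It is a minimal sketch: the `tidy` helper and the shortened `sample` are assumptions for illustration, and `tidy` simply replaces the line break and the inner opening quote with a space so that a continued quote reads as one line.

```python
import re

# Alternation from the answer: a quote closed on its own line, or a quote left
# open at a line break and continued by a line that starts with ".
PATTERN = re.compile(r'"[^\n"]*"|"[^\n"]*\n+"[^"]*"')

def tidy(match_text):
    """Join a continued quote into one line (hypothetical helper)."""
    return re.sub(r'\n+"', ' ', match_text)

# Shortened sample, assumed here for illustration only.
sample = (
    '"Well, my dear," said Mr. Bennet, "if she should die."\n'
    '\n'
    '"MY DEAREST LIZZY,--\n'
    '\n'
    '"I find myself very unwell this morning.--Yours, etc."\n'
)

# The pattern has no capture groups, so findall returns the full matches.
quotes = [tidy(m) for m in PATTERN.findall(sample)]
for q in quotes:
    print(q)
```

The matches keep their surrounding double quotes; strip them with `q.strip('"')` if only the spoken text is wanted.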

