
Python: Use regex to create a list of lists based on text sentences separated by ".", "?", or "!"

I have already cleaned the sample text below. This is just an excerpt of it:

and can you by no drift of circumstance get from him why he puts on this confusion grating so harshly all his days of 
quiet with turbulent and dangerous lunacy? he does confess he feels himself distracted. but from what cause he will by 
no means speak. nor do we find him forward to be sounded but with a crafty madness keeps aloof when we would bring 
him on to some confession of his true state. did he receive you well? most like a gentleman. but with much forcing of 
his disposition. niggard of question. but of our demands most free in his reply. 

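I want to do the following:

  • Create a list of lists named hamsplits, such that hamsplits[i] is a list of all the words in the i-th sentence of the text.
  • Sentences should be stored in the order in which they appear, and so should the words within each sentence.
  • Sentences end with '.', '?' or '!'.

Example of the desired output: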

hamsplits[0] == ['and', 'can', 'you', 'by', ..., 'dangerous', 'lunacy']

As a test, I tried the code below using only '.' as the delimiter, but it does not return a list of lists:

hamsplits3 = hamsplits2.split('.')

Instead, it returns:

['\n\nand can you by no drift of circumstance get from him why he puts on this confusion grating so harshly all his days of \nquiet with turbulent and dangerous lunacy? he does confess he feels himself distracted', ' but from what cause he will by \nno means speak', ' nor do we find him forward to be sounded but with a crafty madness keeps aloof when we would bring \nhim on to some confession of his true state', ' did he receive you well? most like a gentleman', ' but with much forcing of \nhis disposition', ' niggard of question', ' but of our demands most free in his reply', " did you assay him? ... ]

What exactly am I doing wrong? I don't want to use any imported packages other than import re.

You can try re.findall:

import re

s = """and can you by no drift of circumstance get from him why he puts on this confusion grating so harshly all his days of 
quiet with turbulent and dangerous lunacy? he does confess he feels himself distracted. but from what cause he will by 
no means speak. nor do we find him forward to be sounded but with a crafty madness keeps aloof when we would bring 
him on to some confession of his true state. did he receive you well? most like a gentleman. but with much forcing of 
his disposition. niggard of question. but of our demands most free in his reply."""

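# Each match is one sentence: a run of characters other than '.', '?' or '!'.
# split() with no argument breaks it into words on any whitespace, newlines included.
hamsplits = [sentence.split() for sentence in re.findall(r'[^.?!]+', s)]

print(hamsplits[0])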

Output:

['and', 'can', 'you', 'by', 'no', 'drift', 'of', 'circumstance', 'get', 'from', 'him', 'why', 'he', 'puts', 'on', 'this', 'confusion', 'grating', 'so', 'harshly', 'all', 'his', 'days', 'of', 'quiet', 'with', 'turbulent', 'and', 'dangerous', 'lunacy']
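For context on why the original attempt came back flat: str.split('.') cuts on a single literal delimiter and returns a list of strings, so the words are never separated. A minimal sketch of the difference, using a shortened sample string made up for illustration (the names text, flat and nested are not from the question):

```python
import re

# Shortened sample, assumed for illustration only.
text = "did he receive you well? most like a gentleman. niggard of question!"

# str.split('.') splits on one literal delimiter and returns a flat list of strings.
flat = text.split('.')

# Matching runs of non-terminator characters gives one string per sentence;
# calling split() on each of those strings produces the inner word lists.
nested = [sentence.split() for sentence in re.findall(r'[^.?!]+', text)]

print(flat)
print(nested)
```

The first print shows two strings, with the '?' and '!' still inside them, while the second shows three word lists, one per sentence.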

You can try this.

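import re

# Read the whole file, split it at every '.', '?' or '!',
# then split each sentence into words on whitespace.
with open('input_text.txt') as file:
    hamsplits = [ele.split() for ele in re.split(r'[.?!]', file.read())]
print(hamsplits[0])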

Output:

['and', 'can', 'you', 'by', 'no', 'drift', 'of', 'circumstance', 'get', 'from', 'him', 'why', 'he', 'puts', 'on', 'this', 'confusion', 'grating', 'so', 'harshly', 'all', 'his', 'days', 'of', 'quiet', 'with', 'turbulent', 'and', 'dangerous', 'lunacy']
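One thing to watch with re.split: when the text ends with a terminator, the last piece is an empty string, so the final inner list comes out empty. A possible way around that, sketched here with a hypothetical helper named split_sentences and a made-up sample string (neither comes from the answers above), is to drop pieces that contain no words:

```python
import re

def split_sentences(text):
    """Return a list of word lists, one per sentence (hypothetical helper)."""
    # '[.?!]+' treats a run such as '...' or '?!' as a single sentence boundary.
    fragments = re.split(r'[.?!]+', text)
    # Keep only fragments that contain at least one word; this drops the
    # empty string that re.split leaves after the final terminator.
    return [words for words in (f.split() for f in fragments) if words]

# Sample string assumed for illustration only.
sample = "did you assay him? most free in his reply... niggard of question!"
print(split_sentences(sample))
```

Whether empty sentences should be dropped or kept depends on how you index into hamsplits afterwards, so treat the filter as optional.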

