Extract elements respectively from text
I have the following text:
'- `Popen.``terminate`()\n\n Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...
I tried to extract the list items from the text:
In [46]: pattern = re.compile(r'-\s(.+)\n\n')
In [49]: matches = pattern.findall(content)
In [50]: matches
Out[50]:
['`Popen.``terminate`()',
'`Popen.``kill`()',
'`Popen.``args`',
'`Popen.``stdin`',
'`Popen.``stdout`']
The result I want is:
['Popen.terminate()',
'Popen.kill()',
'Popen.args',
'Popen.stdin',
'Popen.stdout']
I changed the pattern to use two groups to capture the relevant parts:
In [55]: pattern2 = re.compile(r'- `(\w+).``(\w+.*)`')
In [64]: matches = pattern2.findall(content)
In [65]: matches
Out[65]:
[('Popen', 'terminate'),
('Popen', 'kill'),
('Popen', 'args'),
('Popen', 'stdin'),
('Popen', 'stdout')]
This is still not the result I want.
How can I solve this?
-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n
import re
r = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")
s = ("'- `Popen.``terminate`()\n\n"
" Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n"
"- `Popen.``kill`()\n\n"
" Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...\n")
# Join the three captured groups of each match into one name.
for m in r.finditer(s):
    print(m.group(1) + m.group(2) + m.group(3))
Note: the output below does not match the OP's expected output because the OP posted only part of the string, not the full string.
Popen.terminate()
Popen.kill()
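To get a list like the one requested in the question, the same pattern can also be used with `findall`, which returns one tuple of groups per match. The following is only a sketch: the `args` entry and the shortened descriptions in `content` are assumed for illustration, since the full text was not posted.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

# Hypothetical sample: the first two entries follow the question's text;
# the `args` entry and the shortened descriptions are assumptions.
content = (
    "- `Popen.``terminate`()\n\n Stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n Kills the child.\n\n\n"
    "- `Popen.``args`\n\n The args argument as it was passed to Popen.\n"
)

# findall returns one (group1, group2, group3) tuple per match;
# joining each tuple rebuilds the name without the backticks.
names = ["".join(groups) for groups in pattern.findall(content)]
print(names)
```

Because group 3 is optional, entries without `()` such as attributes still match, with an empty string in the third position.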
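The pattern breaks down as follows:
-  : matches the hyphen character - literally
\s  : matches a whitespace character
`  : matches a backtick (grave accent) literally
([^`]*)  : captures any number of characters that are not in the set (anything except a backtick) into capture group 1
``  : matches two backticks literally
([^`]*)  : captures any number of characters that are not in the set (anything except a backtick) into capture group 2
`  : matches a backtick literally
((?:\(\))?)  : captures the following into capture group 3
    (?:\(\))?  : matches the following zero or one time
        \(\)  : matches an opening and a closing parenthesis () literally
\n\n  : matches two newline characters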
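The breakdown above can also be written directly into the pattern with `re.VERBOSE`, which ignores whitespace in the pattern and allows comments. This is a sketch of an equivalent pattern, not part of the original answer; the group names `owner`, `name` and `call` are made up for illustration.

```python
import re

# Equivalent to the answer's pattern, annotated piece by piece.
# The group names are hypothetical and chosen only for readability.
pattern = re.compile(
    r"""
    -\s                   # a hyphen followed by one whitespace character
    `(?P<owner>[^`]*)`    # first backtick-quoted part, such as Popen.
    `(?P<name>[^`]*)`     # second backtick-quoted part, such as terminate
    (?P<call>(?:\(\))?)   # an optional pair of literal parentheses
    \n\n                  # two newline characters end the entry
    """,
    re.VERBOSE,
)

s = (
    "- `Popen.``terminate`()\n\n"
    " Stop the child. On Posix OSs the method sends SIGTERM to the child.\n\n\n"
    "- `Popen.``kill`()\n\n"
    " Kills the child. On Posix OSs the function sends SIGKILL to the child.\n"
)

names = [
    m.group("owner") + m.group("name") + m.group("call")
    for m in pattern.finditer(s)
]
print(names)
```

Named groups make the joining step easier to read than numeric indices, at the cost of a longer pattern.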