
Extract elements respectively from text

I have the following text:

'- `Popen.``terminate`()\n\n  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...

I tried to extract the list items from the text:

In [46]: pattern = re.compile(r'-\s(.+)\n\n')
In [49]: matches = pattern.findall(content)
In [50]: matches
Out[50]:
['`Popen.``terminate`()',
 '`Popen.``kill`()',
 '`Popen.``args`',
 '`Popen.``stdin`',
 '`Popen.``stdout`']

The result I want is:

['Popen.terminate()',
 'Popen.kill()',
 'Popen.args',
 'Popen.stdin',
 'Popen.stdout']

I changed the pattern to use two groups so that it captures only the parts I need:

In [55]: pattern2 = re.compile(r'- `(\w+).``(\w+.*)`')
In [64]: matches = pattern2.findall(content)
In [65]: matches
Out[65]:
[('Popen', 'terminate'),
 ('Popen', 'kill'),
 ('Popen', 'args'),
 ('Popen', 'stdin'),
 ('Popen', 'stdout')]

That is still not the result I want.

How can I solve this?

The following regex captures the pieces needed for the desired result:

-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n

Usage

The code below applies the regex:

import re

r = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

s = ("'- `Popen.``terminate`()\n\n"
    "  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n"
    "  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...\n")

# Print each match with its three captured groups joined together.
for m in r.finditer(s):
    print(m.group(1) + m.group(2) + m.group(3))

Result

Input

'- `Popen.``terminate`()\n\n  Stop the child. On Posix OSs the method sends SIGTERM to the child. On Windows the Win32 API function `TerminateProcess()` is called to stop the child.\n\n\n- `Popen.``kill`()\n\n  Kills the child. On Posix OSs the function sends SIGKILL to the child. On Windows;...

Output

Note: the output below does not match the OP's expected output because the OP posted only part of the string, not the complete string.

Popen.terminate()
Popen.kill()
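To get a list like the one the question asks for, rather than printed lines, the groups of each match can be joined in a list comprehension. This is a minimal sketch: the `content` string is an abbreviated stand-in for the OP's full text (the descriptions are shortened placeholders), and the variable names are illustrative.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"-\s`([^`]*)``([^`]*)`((?:\(\))?)\n\n")

# Abbreviated sample; the descriptions are shortened placeholders,
# not the OP's real text.
content = (
    "- `Popen.``terminate`()\n\n  Stop the child.\n\n\n"
    "- `Popen.``kill`()\n\n  Kills the child.\n\n\n"
    "- `Popen.``args`\n\n  (description omitted)\n"
)

# findall returns one tuple of groups per match;
# joining each tuple yields a single string per list item.
names = ["".join(groups) for groups in pattern.findall(content)]
print(names)
# ['Popen.terminate()', 'Popen.kill()', 'Popen.args']
```

Because group 3 is written as `((?:\(\))?)`, it always participates in the match and is an empty string when there are no parentheses, so the join works for attributes such as `Popen.args` as well as for methods.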

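Explanation

  • - matches the hyphen character - literally
  • \s matches any whitespace character
  • ` matches the backtick (grave accent) character literally
  • ([^`]*) captures any number of characters not present in the set (any character except the backtick `) into capture group 1
  • `` matches two backtick characters literally
  • ([^`]*) captures any number of characters not present in the set (any character except the backtick `) into capture group 2
  • ` matches the backtick character literally
  • ((?:\(\))?) captures the following into capture group 3
    • (?:\(\))? matches the following zero or one time
      • \(\) matches the left and right parentheses () literally
  • \n\n matches two newline characters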
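The breakdown above can also be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. The following is a sketch of the same regex in that form; the names `verbose_pattern` and `sample` are illustrative, and the sample text is an abbreviated placeholder rather than the OP's full input.

```python
import re

# The same regex, split across lines with comments (re.VERBOSE).
verbose_pattern = re.compile(r"""
    -\s            # a literal hyphen followed by one whitespace character
    `([^`]*)`      # group 1: everything between the first pair of backticks
    `([^`]*)`      # group 2: everything between the second pair of backticks
    ((?:\(\))?)    # group 3: an optional literal pair of parentheses
    \n\n           # the two newlines that end the list-item header
""", re.VERBOSE)

# Abbreviated placeholder input, not the OP's complete text.
sample = (
    "- `Popen.``terminate`()\n\n  Stop the child.\n\n\n"
    "- `Popen.``stdin`\n\n  (description omitted)\n"
)

items = ["".join(m.groups()) for m in verbose_pattern.finditer(sample)]
print(items)
# ['Popen.terminate()', 'Popen.stdin']
```

In verbose mode a literal space or `#` in the pattern would need escaping; this pattern matches the space after the hyphen with `\s`, so no extra escaping is required here.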

