How to extract column data from text in Python (regex)

Question

Let's say we have text in which the column headers are stored in the following form:

{|
|+ The table's caption
! scope="col" width="20"style="background-color:#cfcfcf;"align="center" | Column header 1
! scope="col" width="20"style="background-color:#ff55ff;"align="center" | Column header 2
! scope="col" | Column header 3
|-
! scope="row" | Row header 1
| Cell 2 || Cell 3
|-
! scope="row" | Row header A
| Cell B
| Cell C
|}

How can I extract all the column headers ([Column header 1, Column header 2, Column header 3]) from the text in Python? I tried:

re.findall('*! scope="col" |', text, re.IGNORECASE)

But it's not doing the job.
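The attempt above fails for two separate reasons, which a short sketch can show (the sample line in text is copied from the table; the variable names are only for illustration). First, the leading * has nothing to repeat, so Python's re module rejects the pattern with re.error before any matching happens. Second, even without the *, an unescaped | is the alternation operator, so the pattern means '! scope="col" ' OR the empty string rather than "up to a literal bar".

```python
import re

# One column-header line taken from the table in the question.
text = '! scope="col" | Column header 3'

# The pattern from the question: the leading "*" has nothing to repeat,
# so the re module raises re.error instead of returning matches.
try:
    re.findall('*! scope="col" |', text, re.IGNORECASE)
    failed = False
except re.error as exc:
    failed = True
    print("re.error:", exc)

# Without the "*", the unescaped "|" is alternation: the pattern means
# '! scope="col" ' OR the empty string, so findall also returns empty matches.
alternation = re.findall('! scope="col" |', text)
print(alternation[:2])

# Escaping the bar (r'\|') makes it match a literal "|" character.
literal = re.findall(r'! scope="col" \|', text)
print(literal)
```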

https://regex101.com/r/PLKREz/6

How can I do it in Python?

Answer 1

You can find all the substrings after the last | in each line that contains scope="col":

import re

data = """
{|
|+ The table's caption
! scope="col" width="20"style="background-color:#cfcfcf;"align="center" | Column header 1
! scope="col" width="20"style="background-color:#ff55ff;"align="center" | Column header 2
! scope="col" | Column header 3
|-
! scope="row" | Row header 1
| Cell 2 || Cell 3
|-
! scope="row" | Row header A
| Cell B
| Cell C
|}"""

# re.MULTILINE makes $ match at the end of every line, not only at the end of the string
print(re.findall(r'scope="col".*?\| ([^|]+)$', data, re.MULTILINE))

Prints:

['Column header 1', 'Column header 2', 'Column header 3']
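A minimal sketch of the same pattern written out with re.VERBOSE, so each part carries its own comment. One small change is an assumption about intent rather than part of the answer: [^|\n] is used instead of [^|]. In the answer's pattern, [^|] can also match a newline; it still returns the right headers on this input because the engine backtracks to the nearest line end, but excluding \n keeps each match on a single line explicitly. In VERBOSE mode plain spaces are ignored, so the literal space after the bar is written as [ ].

```python
import re

data = """{|
|+ The table's caption
! scope="col" width="20"style="background-color:#cfcfcf;"align="center" | Column header 1
! scope="col" width="20"style="background-color:#ff55ff;"align="center" | Column header 2
! scope="col" | Column header 3
|-
! scope="row" | Row header 1
| Cell 2 || Cell 3
|-
! scope="row" | Row header A
| Cell B
| Cell C
|}"""

# Same idea as the answer's regex, annotated part by part.
COL_HEADER = re.compile(r'''
    scope="col"   # only header cells scoped to a column
    .*?           # lazily skip the attributes on the same line
    \|[ ]         # the literal "| " separating attributes from the label
    ([^|\n]+)     # capture the label; excluding \n keeps it on one line
    $             # the label must run to the end of that line
''', re.VERBOSE | re.MULTILINE)

headers = COL_HEADER.findall(data)
print(headers)  # ['Column header 1', 'Column header 2', 'Column header 3']
```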

Answer 1: accepted, score 0, 2016-11-02 17:20:23
