Regex for any number of words before a newline

Question

I have parsed some text out of a paragraph and want to split it up into a table.

The string looks like this:

["Some text unsure how many numbers or if any special charectors etc. But I don't really care I just want all the text in this string \\n 123 some more text (50% and some more text) \\n"]

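What I want to do is split off the first string of text before the newline exactly as it is, whatever it contains. I first tried [A-Za-z]*\\s*[A-Za-z]*\\s* but quickly realized it would not cut it, because the text in this string is variable.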

Then I want to take the number in the second string, like this:

\d+

Finally, I want to get the percentage in the second string, and the following seems to work for that:

\d+(%)+

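I am planning to use these in a function, but I am struggling to write the regex for the first part. I would also like to know whether the regexes I am using for the last two parts are the most efficient.

Update: Hopefully this makes it clearer?

Input: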

[' The first chunk of text \\n 123 the stats I want (25% the percentage I want) \\n The Second chunk of text \\n 456 the second stats I want (50% the second percentage I want) \\n The third chunk of text \\n 789 the third stats I want (75% the third percentage) \\n The fourth chunk of text \\n 101 The fourth stats (100% the fourth percentage) \\n']

Desired output:

Answer 1

First 2 lines

You can use split to get the first two lines:

import re

data = ["Some text unsure how many numbers or if any special charectors etc. But I don't really care I just want all the text in this string \n 123 some more text (50% and some more text) \n"]

first_line, second_line = data[0].split("\n")[:2]
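print(first_line)
# Some text unsure how many numbers or if any special charectors etc. But I don't really care I just want all the text in this string

# Raw strings keep the backslashes in the pattern intact.
# First run of digits that is not followed by another digit or a percent sign.
digit_match = re.search(r'\d+(?![\d%])', second_line)
if digit_match:
    print(digit_match.group())
    # 123

# First run of digits immediately followed by a percent sign.
percent_match = re.search(r'\d+%', second_line)
if percent_match:
    print(percent_match.group())
    # 50%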

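Note that if the percentage were written before the other number, \d+ would match the percentage (without the %). I added a negative lookahead to make sure the matched number is not followed by a digit or a %.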
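To see what the lookahead changes, here is a small sketch. The sample line is made up for illustration (it is not from the question) and puts the percentage first; the variable names are also hypothetical.

```python
import re

# Hypothetical line (an assumption): the percentage appears before the plain number.
line = " (50% of the total) 123 some more text "

# Without the lookahead, the first run of digits is the "50" taken from "50%".
plain = re.search(r'\d+', line).group()

# With the lookahead, "50" is rejected because "%" follows it, and "5" is
# rejected because a digit follows it, so the search moves on to "123".
guarded = re.search(r'\d+(?![\d%])', line).group()

print(plain)    # 50
print(guarded)  # 123
```

The `[\d%]` inside the lookahead needs both parts: with only `%`, the regex could backtrack and return the "5" from "50%".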

Every pair

If you want to keep parsing pairs of lines:

data = [" The first chunk of text \n 123 the stats I want (25% the percentage I want) \n The Second chunk of text \n 456 the second stats I want (50% the second percentage I want) \n The third chunk of text \n 789 the third stats I want (75% the third percentage) \n The fourth chunk of text \n 101 The fourth stats (100% the fourth percentage) \n"]

import re

lines = data[0].strip().split("\n")

# TODO: Make sure there's an even number of lines
for i in range(0, len(lines), 2):
    first_line, second_line = lines[i:i + 2]

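    print(first_line)

    # First run of digits not followed by another digit or a percent sign.
    digit_match = re.search(r'\d+(?![\d%])', second_line)
    if digit_match:
        print(digit_match.group())

    # First run of digits immediately followed by a percent sign.
    percent_match = re.search(r'\d+%', second_line)
    if percent_match:
        print(percent_match.group())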

Output:

The first chunk of text 
123
25%
 The Second chunk of text 
456
50%
 The third chunk of text 
789
75%
 The fourth chunk of text 
101
100%
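Since the question mentions using this inside a function and building a table, the same idea could be wrapped up roughly as follows. This is only a sketch: the name `parse_pairs`, the tuple layout, and the shortened `sample` string are assumptions, not part of the original answer.

```python
import re

def parse_pairs(text):
    """Return a list of (description, number, percent) tuples, one per pair of lines.

    Hypothetical helper: missing matches are reported as None.
    """
    lines = [line.strip() for line in text.strip().split("\n")]
    rows = []
    # zip pairs up lines 0-1, 2-3, ...; a trailing unpaired line is ignored.
    for first_line, second_line in zip(lines[0::2], lines[1::2]):
        digit_match = re.search(r'\d+(?![\d%])', second_line)
        percent_match = re.search(r'\d+%', second_line)
        rows.append((
            first_line,
            digit_match.group() if digit_match else None,
            percent_match.group() if percent_match else None,
        ))
    return rows

# Shortened version of the input from the question.
sample = (" The first chunk of text \n 123 the stats I want (25% the percentage I want) \n"
          " The Second chunk of text \n 456 the second stats I want (50% the second percentage I want) \n")

for row in parse_pairs(sample):
    print(row)
```

Each tuple is one row of the table, so the result can be handed to whatever table structure is needed afterwards. Using zip on the even and odd slices also sidesteps the unpacking error that an odd number of lines would cause in the loop above.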
