Create list of lists based on a condition
I have a string like the following:
result = """The following table provides the details.
acquired, by major class:
(US$ in millions) Customer relationships 15year $265
There is another line without space here.
Another table starts here:
(USS in millions) 2018 2017
Income (loss) from continuing operations $298 $129"""
I need to put all lines that contain 3 or more consecutive spaces into a list of lists. Here is what I have tried so far:
import re

lines = result.splitlines()
table_list = []
for line in lines:
    # keep only lines that contain a run of 3 or more spaces
    if re.search(r' {3,}', line):
        table_list.append(line)
Output of the above code:
['(US$ in millions) Customer relationships 15year $265','(USS in millions) 2018 2017','Income (loss) from continuing operations $298 $129']
Expected output:
[['(US$ in millions) Customer relationships 15year $265'],['(USS in millions) 2018 2017','Income (loss) from continuing operations $298 $129']]
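Further explanation of the output condition: the expected output should be a list of lists. While iterating over the lines, if consecutive lines each contain 3 or more spaces between two words, all of those lines should go into the same inner list of the main list. If a line does not contain 3 or more spaces between two words, the chain breaks. If a later line again contains 3 or more spaces between two words, that line starts a new inner list in the main list.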
Use itertools.groupby together with re.findall:
import re
from itertools import groupby

def has_spaces(str_):
    # True if the line contains a run of 3 or more whitespace characters
    return bool(re.findall(r"\s{3,}", str_))

[list(g) for k, g in groupby(result.splitlines(), key=has_spaces) if k]
Output:
[['(US$ in millions) Customer relationships 15year $265'],
['(USS in millions) 2018 2017',
'Income (loss) from continuing operations $298 $129']]
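
For readers who prefer to see the chain-breaking rule spelled out, here is a minimal sketch of the same grouping written as a plain loop. The function name `group_table_lines`, the `min_spaces` parameter, and the `sample` string are hypothetical. In particular, the column gaps in `sample` are an assumption about how the original text was laid out, since the runs of spaces are not visible in the string as posted.

```python
import re

# Assumed sample input: the 3-space column gaps are a guess at the original layout.
sample = (
    "The following table provides the details.\n"
    "(US$ in millions)   Customer relationships   15year   $265\n"
    "There is another line without space here.\n"
    "(USS in millions)   2018   2017\n"
    "Income (loss) from continuing operations   $298   $129"
)

def group_table_lines(text, min_spaces=3):
    """Group consecutive lines that contain a run of at least min_spaces spaces."""
    gap = re.compile(r" {%d,}" % min_spaces)
    groups = []    # the main list
    current = []   # the inner list currently being built
    for line in text.splitlines():
        if gap.search(line):
            # the line looks like a table row: extend the current chain
            current.append(line)
        elif current:
            # a non-matching line breaks the chain: store it and start over
            groups.append(current)
            current = []
    if current:
        # flush a chain that runs to the end of the text
        groups.append(current)
    return groups

grouped = group_table_lines(sample)
```

With this assumed input, `grouped` holds two inner lists: one with the single `(US$ in millions)` row and one with the two rows of the second table. Unlike the `\s{3,}` pattern above, this sketch matches literal spaces only, so tab-separated columns would not count as a gap.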