How to extract all characters, including newlines (\n), from a given string

Question

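I have a string in the following format (note: \n denotes a newline):

\n\nThe following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:\n\n \n\n(USS in millions) USEFUL LIFE\nCustomer relationships 15 years $265\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets $464\n\nThe fair value in the opening balance sheet of the 30%\nredeemable noncontrolling interest in Loders was estimated to\nbe $450 million.

I need to extract all characters between \n\n \n\n and \n\n

Expected output:

(USS in millions) USEFUL LIFE\nCustomer relationships 15 years $265\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets $464

I wrote the following logic:

re.findall(r'(\n\n\s\n\n)(.|\n)*(\n\n)', result)

But the code above does not give me the desired result. Can anyone help?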

Answer 1

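You can first match the double newline (each newline optionally preceded by a carriage return), and then capture in group 1 all the lines that follow, continuing across every newline that is not directly followed by another newline.

With re.findall you get a list of the values of the capture group. The desired result is the second item.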

\r?\n\r?\n(.*(?:\r?\n(?!\r?\n).*)*)\r?\n\r?\n
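
To make the pattern easier to read, it can be split into commented parts with re.VERBOSE. This is only a sketch for illustration: the name PATTERN and the short sample string are assumptions, not part of the answer itself.

```python
import re

# The same pattern, written with re.VERBOSE so that each part can be commented.
PATTERN = re.compile(r"""
    \r?\n\r?\n          # a blank line: two line breaks, each with an optional CR
    (                   # group 1: the block to extract
        .*              #   the first line of the block
        (?:             #   then, repeatedly:
            \r?\n       #     a single line break
            (?!\r?\n)   #     that is not directly followed by another line break
            .*          #     and the rest of that line
        )*
    )
    \r?\n\r?\n          # the closing blank line
""", re.VERBOSE)

# Short hypothetical sample with the same layout as the string in the question.
sample = "\n\nintro line 1\nintro line 2\n\n \n\nrow 1\nrow 2\n\ntrailing text"

blocks = PATTERN.findall(sample)
print(blocks)
```

Each match consumes its closing blank line, so the lone space between the two blocks is skipped rather than returned as a block of its own.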


import re

s="\n\nThe following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:\n\n \n\n(USS in millions) USEFUL LIFE\nCustomer relationships 15 years $265\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets $464\n\nThe fair value in the opening balance sheet of the 30%\nredeemable noncontrolling interest in Loders was estimated to\nbe $450 million."

regex = r"\r?\n\r?\n(.*(?:\r?\n(?!\r?\n).*)*)\r?\n\r?\n"

print(re.findall(regex, s))

Output

[
'The following table provides the details of intangible assets\nacquired, by major class and weighted average useful life:', 
'(USS in millions) USEFUL LIFE\nCustomer relationships 15 years $265\nIntellectual property 10 years 120\nTrade names 15 years 51\nFavorable leases 38 years 26\nOther various 2\nTotal intangible assets $464'
]
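
Because the pattern has exactly one capturing group, re.findall returns a list of strings, and the table is the item at index 1. The pattern in the question has three capturing groups, so re.findall returns tuples, and the repeated group (.|\n)* keeps only the last character it matched. As an alternative (not the approach used in this answer), the sketch below anchors directly on the \n\n \n\n delimiter and uses a lazy .*? with re.DOTALL. The shortened sample string and the variable names are assumptions made for illustration.

```python
import re

# Shortened, hypothetical sample with the same layout as the question's string.
s = ("\n\nThe following table provides the details:\n\n \n\n"
     "(USS in millions) USEFUL LIFE\nCustomer relationships 15 years $265\n"
     "Total intangible assets $464\n\nThe fair value was estimated to\nbe $450 million.")

# The question's attempt: three capturing groups, so findall yields tuples,
# and the repeated group (.|\n)* keeps only the last character it matched.
attempt = re.findall(r'(\n\n\s\n\n)(.|\n)*(\n\n)', s)
print(attempt)

# Alternative: match the "\n\n \n\n" delimiter, then lazily capture everything
# (re.DOTALL lets "." match newlines) up to the next blank line.
m = re.search(r'\n\n\s\n\n(.*?)\n\n', s, re.DOTALL)
table = m.group(1) if m else None
print(table)
```

This alternative assumes plain \n line endings. For input that may contain \r\n, the \r?\n form used in the answer's pattern is the safer choice.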
