
Regex to split this string after '.' if there is a capital letter [A-Z] after it

The string is:

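"Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October. Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."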

The output should look like this:

Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Bitcoin prices today were down 0.9% at $61,693.
It is up 112% this year so far after hitting a record high of near $67,000 in October.
Ether prices climbed to record high during the weekend.
The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).

We can try a regex split here:

import re

inp = "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October. Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."
# Split on whitespace that follows a dot and precedes a capital letter
lines = re.split(r'(?<=\.)\s+(?=[A-Z])', inp)
print(lines)

This prints:

["Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.",
 "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.",
 'Bitcoin prices today were down 0.9% at $61,693.',
 'It is up 112% this year so far after hitting a record high of near $67,000 in October.',
 'Ether prices climbed to record high during the weekend.',
 'The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).']

Here is the regex logic:

(?<=\.)    assert that dot precedes (but do not consume)
\s+        match one or more whitespace characters
(?=[A-Z])  assert that a capital letter follows (but do not consume)
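
To see the lookarounds at work, the same pattern can be wrapped in a small helper. This is only a minimal sketch: `split_sentences`, `SENTENCE_BOUNDARY`, and the sample text are hypothetical names made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as above: whitespace that follows a dot and precedes a capital letter.
SENTENCE_BOUNDARY = re.compile(r'(?<=\.)\s+(?=[A-Z])')


def split_sentences(text):
    """Split text at whitespace that follows '.' and precedes A-Z (hypothetical helper)."""
    return SENTENCE_BOUNDARY.split(text)


# Made-up sample: one decimal number, one real boundary, one lowercase follow-up.
sample = "Bitcoin fell 0.9% today. It is up 112% this year. prices vary."
for sentence in split_sentences(sample):
    print(sentence)
```

Only the whitespace is consumed, so each piece keeps its trailing period. The dot in "0.9" is left alone because no whitespace follows it, and ". prices" is not a split point because the next letter is lowercase.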

Here is a simple approach:

string = "Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower. Bitcoin prices today were down 0.9% at $61,693. It is up 112% this year so far after hitting a record high of near $67,000 in October.Ether prices climbed to record high during the weekend. The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase)."
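a = string.split(". ")  # split wherever a period is followed by a space
for n, i in enumerate(a):
    # split() dropped the "." from every piece except the last, so add it back
    print(i + ("." if n != len(a) - 1 else ""))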

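a = string.split(". ") splits the text wherever a "." is followed by a space (" "). This keeps floating-point (real) numbers intact, because there is no space after the decimal point in them, for example "0.9".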

The for loop prints each item with a "." appended, except the last item, because the split removes the "." from every item except the last.

Output:

Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Cryptocurrency prices today were trading mixed with the world's largest cryptocurrency by market capitalization trading marginally lower.
Bitcoin prices today were down 0.9% at $61,693.
It is up 112% this year so far after hitting a record high of near $67,000 in October.Ether prices climbed to record high during the weekend.
The AUM included all-time highs for individual asset products such as $55.2 billion for bitcoin products (52.2% increase) and $15.9 billion for ethereum products (30.0% increase).
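
Note that "October.Ether" stays on one line in this output, because that period has no space after it in the input. If that case matters, one possible variation (an assumption for illustration, not something proposed in the answers above) is a regex with optional whitespace. It relies on `re.split` splitting on empty matches, which needs Python 3.7 or later. The sample `text` below is made up.

```python
import re

# Made-up sample with one missing space after a period ("October.Ether").
text = "It hit a record high in October.Ether prices climbed. Bitcoin fell 0.9% today."

# Plain split: needs ". " exactly, so "October.Ether" is not separated.
plain = text.split(". ")
print(plain)

# Hypothetical variation: \s* instead of \s+, so a missing space still splits.
# Requires Python 3.7+, where re.split also splits on zero-width matches.
loose = re.split(r'(?<=\.)\s*(?=[A-Z])', text)
print(loose)
```

The trade-off is that the looser pattern also splits inside abbreviations such as "U.S.A", since every dot directly followed by a capital letter becomes a boundary.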
