How to Split a String on Selected Whitespace in Python
I'm trying to split the following string in Python. Given the input below, is it possible to produce the following output?
Input
Platforms: Linux Applies to versions: 10.0 Upgrades to: 10.0 Severity: 10 - High Impact/High Probability of Occurrence \Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability Abstract: SqlGuard Patch 10.0p4052 Sniffer Update
Output
Platforms: Linux
Applies to versions: 10.0
Upgrades to: 10.0
Severity: 10 - High Impact/High Probability of Occurrence
Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability
Abstract: SqlGuard Patch 10.0p4052 Sniffer Update
Since the fields are fixed, split on the field names rather than on whitespace:
>>> fields = [
... "Platforms: ",
... "Applies to versions: ",
... "Upgrades to: ",
... "Severity: ",
... "Categories: ",
... "Abstract: ",
... ]
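>>> import re
>>> # s is assumed to hold the input string from the question;
>>> # fields must be listed in the order they appear in s
>>> for k,v in zip(fields, re.split("|".join(fields), s)[1:]):
...     print(k + v)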
...
Platforms: Linux
Applies to versions: 10.0
Upgrades to: 10.0
Severity: 10 - High Impact/High Probability of Occurrence
Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability
Abstract: SqlGuard Patch 10.0p4052 Sniffer Update
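The known-fields approach can also be written as a self-contained script. This is only a sketch: the function name `split_known_fields` and the constant `FIELDS` are made up for illustration, and the cleanup step assumes the stray backslash before "Categories:" in the question's input is noise that should be dropped along with surrounding spaces. It escapes each field name with re.escape() and captures the name so it does not depend on the order of the list.

```python
import re

# Field names taken from the question; FIELDS and split_known_fields are illustrative names.
FIELDS = ["Platforms", "Applies to versions", "Upgrades to",
          "Severity", "Categories", "Abstract"]

def split_known_fields(text, fields=FIELDS):
    # Escape each name so any regex metacharacters in it are matched literally,
    # and wrap the alternation in a group so re.split() keeps the matched name.
    pattern = "(" + "|".join(re.escape(f) + ":" for f in fields) + ")"
    parts = re.split(pattern, text)
    lines = []
    # parts[0] is the text before the first field name; names and values alternate after it.
    for name, value in zip(parts[1::2], parts[2::2]):
        # Assumption: surrounding spaces and a trailing backslash are not part of the value.
        cleaned = value.strip().rstrip("\\").rstrip()
        lines.append(name + " " + cleaned)
    return lines

sample = r"Platforms: Linux Applies to versions: 10.0 Upgrades to: 10.0 Severity: 10 - High Impact/High Probability of Occurrence \Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability Abstract: SqlGuard Patch 10.0p4052 Sniffer Update"

for line in split_known_fields(sample):
    print(line)
```

Without the cleanup step, the Severity value would keep the trailing backslash from the input.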
Since the other answer relies on a known list of fields, let's try a solution that doesn't know the fields a priori:
import re
string = r"Platforms: Linux Applies to versions: 10.0 Upgrades to: 10.0 Severity: 10 - High Impact/High Probability of Occurrence \Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability Abstract: SqlGuard Patch 10.0p4052 Sniffer Update"
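# Field names look like "Applies to versions:"; the capture group keeps them in the result
iterable = iter(re.split(r"([A-Z][a-z ]+:)", string)[1:])
for field in iterable:
    # each field name is followed by its body, so consume the next item as well
    print(field, next(iterable), sep='')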
OUTPUT
> python3 test.py
Platforms: Linux
Applies to versions: 10.0
Upgrades to: 10.0
Severity: 10 - High Impact/High Probability of Occurrence \
Categories: Availability, Compatibility, Data, Function, Performance, Security Vulnerability (Sec/Int), Serviceability, Usability
Abstract: SqlGuard Patch 10.0p4052 Sniffer Update
>
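Could you explain the logic behind the regular expression?
We're doing a re.split(), but with capturing parentheses so that whatever pattern we split on is retained in the result. All the field names follow the same pattern, e.g. "Applies to versions:"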
( # retain split pattern match
[A-Z] # starts with a capital letter
[a-z ]+ # continues with lower case letters and spaces
: # a colon marks the end of the field name
)
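The commented pattern above can be used directly by compiling it with re.VERBOSE, which ignores whitespace and comments in the pattern except inside a character class, so the space in `[a-z ]` still counts. This is a minimal sketch; `FIELD_NAME` and the shortened `sample` string are illustrative names, not part of the original answer.

```python
import re

# Same pattern as above, written with inline comments via re.VERBOSE.
FIELD_NAME = re.compile(
    r"""
    (           # capture group: re.split() keeps the matched field name
      [A-Z]     # starts with a capital letter
      [a-z ]+   # continues with lowercase letters and spaces
      :         # a colon marks the end of the field name
    )
    """,
    re.VERBOSE,
)

# Shortened input, assumed here for brevity.
sample = "Platforms: Linux Upgrades to: 10.0"
parts = FIELD_NAME.split(sample)
print(parts)  # ['', 'Platforms:', ' Linux ', 'Upgrades to:', ' 10.0']
```

Note that any value text that happens to look like a capitalized phrase followed by a colon would also be treated as a field name by this pattern.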
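When we execute re.split(), the string actually begins with a pattern match, which causes re.split() to return an empty string before the first item, so re.split(...)[1:] throws away that empty first item. We now have a list of alternating field names and field bodies, which we walk through in pairs using an iterator.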
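The pairwise walk can also be expressed with zip() over a single iterator, which pulls two consecutive items per step. The sketch below collects the pairs into a dict; the function name `parse_fields` and the choice to strip the colon and surrounding spaces are assumptions for illustration, not part of the original answer.

```python
import re

def parse_fields(text):
    # Drop the first item: it is the text before the first field name
    # (an empty string when the input starts with a field name).
    parts = re.split(r"([A-Z][a-z ]+:)", text)[1:]
    pairs = iter(parts)
    # zip(pairs, pairs) reads from the same iterator twice per step: (name, body).
    return {name.rstrip(":"): body.strip() for name, body in zip(pairs, pairs)}

record = parse_fields("Platforms: Linux Applies to versions: 10.0 Upgrades to: 10.0")
print(record)
# {'Platforms': 'Linux', 'Applies to versions': '10.0', 'Upgrades to': '10.0'}
```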