簡體   English   中英

拆分具有不同條件的字符串而不刪除 python 中的字符

[英]Split a string with different condition without removing the character in python

我有一個帶有參數的字符串:

text =  "Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[0, 0, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0]"

我想通過以下方式刪除空格以單獨獲取所有參數:

pred_res = ["Uncertain significance","PVS1=0","PS=[0, 0, 0, 0, 0]","PM=[0, 0, 0, 0, 0, 0, 0]","PP=[0, 0, 0, 0, 0, 0]","BA1=0","BS=[0, 0, 0, 0, 0]","BP=[0, 0, 0, 0, 0, 0, 0, 0]"]

到目前為止,我已經使用了這個正則表達式模式:

pat = re.compile('[a-z]\s[A-Z]|[0-9]\s[A-Z]|]\s[A-Z]')

但它通過以下方式為我提供了刪除字符的結果:

res = ["Uncertain significanc","VS1=","S=[0, 0, 0, 0, 0","M=[0, 0, 0, 0, 0, 0, 0","P=[0, 0, 0, 0, 0, 0","A1=","S=[0, 0, 0, 0, 0","P=[0, 0, 0, 0, 0, 0, 0, 0]"]

那么有沒有辦法防止這種情況並獲得pred_res中顯示的結果?

您可以使用前瞻來檢查文本中緊跟空格后是否有=

import re
text = 'Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[0, 0, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0]'
pred_res = re.split(r' (?=\w+=)', text)
print(pred_res)
# ['Uncertain significance', 'PVS1=0', 'PS=[0, 0, 0, 0, 0]', 'PM=[0, 0, 0, 0, 0, 0, 0]', 'PP=[0, 0, 0, 0, 0, 0]', 'BA1=0', 'BS=[0, 0, 0, 0, 0]', 'BP=[0, 0, 0, 0, 0, 0, 0, 0]']

另一種選擇可能是匹配所有單獨的部分。

\w+=(?:\[[^][]*]|[^][\s]+)|\w+(?: \w+)*(?= \w+=|$)
  • \w+=匹配 1+ 個單詞 char 后跟=
  • (?:非捕獲組
    • \[[^][]*][]匹配
    • | 或者
    • [^][\s]+匹配除空白字符或字符[]以外的任何字符
  • )關閉群組
  • | 或者
  • \w+(?: \w+)*(?= \w+=|$)匹配可選用空格重復的單詞字符和斷言單詞字符后跟=或右側字符串結尾的單詞字符

正則表達式演示

import re

s = "Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[0, 0, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0]"
pattern = r"\w+=(?:\[[^][]*]|[^][\s]+)|\w+(?: \w+)*(?= \w+=|$)"

pred_res = re.findall(pattern, s)
print(pred_res)

Output

['Uncertain significance', 'PVS1=0', 'PS=[0, 0, 0, 0, 0]', 'PM=[0, 0, 0, 0, 0, 0, 0]', 'PP=[0, 0, 0, 0, 0, 0]', 'BA1=0', 'BS=[0, 0, 0, 0, 0]', 'BP=[0, 0, 0, 0, 0, 0, 0, 0]']

利用

\s+(?=[A-Z])

請參閱正則表達式證明

解釋

--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
  )                        end of look-ahead

Python 代碼

import re
test_str = 'Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[0, 0, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0]'
matches = re.split(r'\s+(?=[A-Z])', test_str)
print(matches)

結果

['Uncertain significance', 'PVS1=0', 'PS=[0, 0, 0, 0, 0]', 'PM=[0, 0, 0, 0, 0, 0, 0]', 'PP=[0, 0, 0, 0, 0, 0]', 'BA1=0', 'BS=[0, 0, 0, 0, 0]', 'BP=[0, 0, 0, 0, 0, 0, 0, 0]']

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM