How to find next 9 characters after a string ignoring special characters?
Consider the following string:
str_test = 'This is a sample text NRC234456789 and this is another case AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 and final case SSR/789456123'
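Basically, I need to find the positions of 'NRC', 'AZN', 'BSA' and 'SSR' in the string. Then I need to extract the next 9 digits, ignoring any non-digit characters. So it should return
result = ['NRC234456789', 'AZN123456789', 'BSA123456789', 'SSR789456123']
In some cases the digit 5 is mistakenly written as S and the digit 2 as Z. I still need to recognize these cases and change the erroneous S and Z to 5 and 2 respectively.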
Here is the code I am using:
import re
list_comb = ['NRC', 'AZN', 'BSA', 'SSR']
def findWholeWord(w):
    # Returns the search method of a compiled, case-insensitive whole-word pattern
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search
It returns the position where the string is found, but I am not sure how to proceed from there. Thanks.
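For reference, here is a minimal sketch of what the helper above yields when it is called on the sample string. The names `find_whole_word` and `positions` are introduced here for illustration only. It also brings out one detail: because of the trailing `\b`, the pattern does not match 'NRC234456789', since there is no word boundary between a letter and a digit.

```python
import re

str_test = ('This is a sample text NRC234456789 and this is another case '
            'AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 '
            'and final case SSR/789456123')
list_comb = ['NRC', 'AZN', 'BSA', 'SSR']

def find_whole_word(w):
    # Same pattern as in the question: the word wrapped in \b ... \b
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

positions = {}
for code in list_comb:
    m = find_whole_word(code)(str_test)
    # m is a Match object or None; span() gives the (start, end) offsets
    positions[code] = m.span() if m else None

print(positions)  # 'NRC' maps to None: no \b between 'C' and '2'
```

Dropping the trailing `\b` (keeping only the leading one) lets the prefix match even when digits follow it immediately.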
Use this regex to identify the pattern. Maybe it helps:
import re
str_test = 'This is a sample text NRC234456789 and this is another case AZN.1.2.3.4.5.6.7.8.9 and this another case BSA 123 456 789 and final case SSR/789456123'
regex = re.findall(r"([A-Z0-9.\s/]{2,})", str_test)
result = []
One solution, if the only non-digit characters are dots, spaces, and slashes:
for r in regex:
    result.append(r.replace(".", "").replace(" ", "").replace("/", ""))
print(result)
Or, if the non-digit characters can be anything, use this loop instead:
for r in regex:
    result.append(re.sub(r"[^\d\w]", "", r))
print(result)
Output:
['NRC234456789', 'AZN123456789', 'BSA123456789', 'SSR789456123']
Update:
import re
str_test = 'This is a sample text NRC234456789 and this is another case AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 and final case SSR/789456123'
regex = re.findall(r"([A-Z]{3})([A-Z0-9.\s/]{2,})", str_test)
result = []
for r in regex:
    # r[0] is the three-letter prefix; strip separators from r[1], then fix Z -> 2 and S -> 5
    result.append(r[0] + re.sub(r"[^\d\w]", "", r[1]).replace("Z", "2").replace("S", "5"))
print(result)
Output:
['NRC234456789', 'AZN123456789', 'BSA123456789', 'SSR789456123']
Here is one approach.
Example:
import re
str_test = 'This is a sample text NRC234456789 and this is another case AZN.1.2.3.4.5.6.7.8.9 and this another case BSA 123 456 789 and final case SSR/789456123'
to_check = ['NRC', 'AZN', 'BSA', 'SSR']
pattern = re.compile(r"(" + "|".join(to_check) + r")([\d+\.\s\/]+)")
for k, v in pattern.findall(str_test):
    # k is the prefix, v the run of digits and separators that follows it
    print(k + re.sub(r"[^\d]", "", v))
Output:
NRC234456789
AZN123456789
BSA123456789
SSR789456123
Edit, based on the comments:
import re
str_test = 'This is a sample text NRC234456789 and this is another case AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 and final case SSR/789456123'
to_check = ['NRC', 'AZN', 'BSA', 'SSR']
pattern = re.compile(r"(" + "|".join(to_check) + r")([\d+\.\s\/ZS]+)")
for k, v in pattern.findall(str_test):
    # Fix Z -> 2 and S -> 5 first, then drop everything that is not a digit
    new_val = k + re.sub(r"[^\d]", "", v.replace("Z", "2").replace("S", "5"))
    print(new_val)
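The pattern above takes every digit-like character that follows the prefix, so a longer run would yield more than nine digits. As a hedged sketch, not part of the original answer, a counted group can cap the match at exactly nine. The sample string here has an extra " 123" appended after the BSA number to show the cap; that addition, the `re.escape` call, and the names `fix` and `capped` are assumptions made for illustration.

```python
import re

str_test = ('This is a sample text NRC234456789 and this is another case '
            'AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 123 '
            'and final case SSR/789456123')
to_check = ['NRC', 'AZN', 'BSA', 'SSR']

# Group 2 is exactly nine "digit-like" characters (0-9, S, Z), each optionally
# preceded by separators that are neither letters nor digits.
pattern = re.compile(
    r"({})((?:[^0-9A-Za-z]*[0-9SZ]){{9}})".format("|".join(map(re.escape, to_check)))
)
fix = str.maketrans("SZ", "52")  # misread letters -> intended digits

capped = []
for k, v in pattern.findall(str_test):
    # Drop the separators, then translate S -> 5 and Z -> 2
    capped.append(k + re.sub(r"[^0-9SZ]", "", v).translate(fix))
print(capped)
```

A prefix followed by fewer than nine digit-like characters simply does not match, so it is skipped rather than returned short.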
Here is a simple way: first find the expected text using this regex,
\b(?:NRC|AZN|BSA|SSR)(?:.?\d)+
which is generated dynamically from the supplied list, and then remove any non-alphanumeric characters from each match.
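A minimal sketch of that first step, assuming an input in which the digits are written correctly (no S or Z substitutions). The variable names `text`, `prefixes`, and `codes` and the `re.escape` call are illustrative additions, not taken from the answer.

```python
import re

text = ('This is a sample text NRC234456789 and this is another case '
        'AZN.1.2.3.4.5.6.7.8.9 and this another case BSA 123 456 789 '
        'and final case SSR/789456123')
prefixes = ['NRC', 'AZN', 'BSA', 'SSR']

# Build \b(?:NRC|AZN|BSA|SSR)(?:.?\d)+ from the list; re.escape guards
# against prefixes that contain regex metacharacters.
pattern = r'\b(?:{})(?:.?\d)+'.format('|'.join(map(re.escape, prefixes)))

# Strip everything that is not a letter or digit from each match.
codes = [re.sub(r'[^a-zA-Z0-9]+', '', m) for m in re.findall(pattern, text)]
print(codes)
```

The `.?` allows at most one separator character before each digit, so a match ends as soon as two non-digit characters appear in a row.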
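Edit: To handle erroneous strings in which 2 is mistakenly written as Z and 5 is mistakenly written as S, you can replace them in the second part of the string, ignoring the initial three characters. Also, the code has been updated so that it picks only the next nine digits and no more. Here is my updated Python code for the same: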
import re
s = 'This is a sample text NRC234456789 and this is another case AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 and BSA 123 456 789 123 456 final case SSR/789456123'
list_comb = ['NRC', 'AZN', 'BSA', 'SSR']
regex = r'\b(?:{})(?:.?[\dA-Z])+'.format('|'.join(list_comb))
print(regex)
for m in re.findall(regex, s):
    m = re.sub(r'[^a-zA-Z0-9]+', '', m)    # drop separators
    mat = re.search(r'^(.{3})(.{9})', m)   # prefix, then the next nine characters
    if mat:
        s1 = mat.group(1)
        s2 = mat.group(2).replace('S', '5').replace('Z', '2')
        print(s1 + s2)
After the generated regex itself, this prints the corrected values, in which S is replaced by 5 and Z is replaced by 2:
NRC234456789
AZN123456789
BSA123456789
BSA123456789
SSR789456123
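One caveat with the code above: because `[\dA-Z]` accepts any uppercase letter, a stray letter other than S or Z inside the first nine characters would end up in the result. As a hedged alternative, not the answer's own method, the sketch below replaces the second regex with a plain character scan that stops at any other letter. The function name `extract_codes` and its `n_digits` parameter are hypothetical.

```python
import re

def extract_codes(text, prefixes, n_digits=9):
    """Return each prefix plus the next n_digits digits that follow it."""
    fix = str.maketrans('SZ', '52')  # misread letters -> intended digits
    prefix_re = re.compile(r'\b(?:{})'.format('|'.join(map(re.escape, prefixes))))
    found = []
    for m in prefix_re.finditer(text):
        digits = []
        for ch in text[m.end():]:
            ch = ch.translate(fix)
            if ch in '0123456789':
                digits.append(ch)
                if len(digits) == n_digits:
                    break
            elif ch.isalpha():
                break  # any other letter ends this candidate
            # anything else (space, dot, slash, ...) is skipped as a separator
        if len(digits) == n_digits:
            found.append(m.group() + ''.join(digits))
    return found

s = ('This is a sample text NRC234456789 and this is another case '
     'AZN.1.Z.3.4.S.6.7.8.9 and this another case BSA 123 456 789 and '
     'BSA 123 456 789 123 456 final case SSR/789456123')
print(extract_codes(s, ['NRC', 'AZN', 'BSA', 'SSR']))
```

Candidates with fewer than nine digits before the next letter are dropped, so an input such as 'NRC12 AB 3456789' yields an empty list.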