RegEx for alphanumeric text strings up to special patterns
I have a list of strings in a specific format, and I only need part of each one.
my_list = ['The Price Is Right S47E141 720p WEB x264-W4F', 'Breakthrough-The Ideas That Changed the World S01E01 480p x264-mSD',
'The Kid Who Would Be King 2019 DVDR-JFKDVD', 'American Housewife S03E18 Phone Free Day 1080p AMZN WEB-DL DDP5 1 H 264-NTb',
'VICE News Tonight 2019 04 16 720p AMZN WEB-DL DDP2 0 H 264-monkee','The Flash 2014 S05E18 Godspeed 720p AMZN WEB-DL DDP5 1 H 264-NTb',
'The Rachel Maddow Show 2019 04 16 720p MNBC WEB-DL AAC2 0 x264-BTW','Lets Make A Deal 2009 S10E142 XviD-AFG']
import re

try:
    try:
        def get_rls(t):
            w = re.match(r".*\d{4} \d{2} \d{2} ", t)
            # w = re.match(r".*S\d+E\d+", t)
            if not w: raise Exception("Error For Regular Expression")
            return w.group(0)
        # Wraps the whole list as a single element, so get_rls receives a list
        regular_case = [my_list]
        for w in regular_case:
            Regular_part = get_rls(w)
            print(">>>> Movie Regular Part contains Year/Mon/Day : ", Regular_part)
    except Exception:
        try:
            def get_rls(t):
                # w = re.match(r".*\d ", t)
                w = re.match(r".*S\d+E\d+", t)
                if not w: raise Exception("Error For Regular Expression")
                return w.group(0)
            regular_case = [my_list]
            for w in regular_case:
                Regular_part = get_rls(w)
                print(">>>> Movie Regular Part contains S0E0 : ", Regular_part)
        except Exception:
            def get_rls(t):
                w = re.match(r".*\d{4} ", t)
                # w = re.match(r".*S\d+E\d+", t)
                if not w: raise Exception("Error For Regular Expression")
                return w.group(0)
            regular_case = [my_list]
            for w in regular_case:
                Regular_part = get_rls(w)
                print(">>>> Movie Regular Part contains Year : ", Regular_part)
except Exception:
    print(">>>> Weird Release Name! Pass the Regular part ")
    Regular_part = my_list
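The problem is that my regex code can only take a single element, decide which RegEx applies to it, and print the match. What I need is code that takes the whole list and processes each element individually: take the first element, decide which pattern fits it, then move on to the next.
The ideal result would look like the following list: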
my_list = ['The Price Is Right S47E141', 'Breakthrough-The Ideas That Changed the World S01E01',
'The Kid Who Would Be King 2019 DVDR-JFKDVD', 'American Housewife S03E18 ',
'VICE News Tonight 2019 04 16','The Flash 2014 S05E18',
'The Rachel Maddow Show 2019 04 16 ','Lets Make A Deal 2009 S10E142']
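This RegEx is not a fully correct answer, but it may help you work out a general approach to processing your text input. RegEx may also not be the best way to solve this problem: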
^.+?(?:[SE0-9]+)|(?:\s[A-Z]{4}\-[A-Z]{1,})|(?:.+[0-9]{4}\s[0-9]{2}\s[0-9]{2})|(?:\s[SE0-9]{6,10})
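The SE pattern and the date pattern are straightforward for this RegEx. The problem you may run into is the arbitrary years, such as 2014, 2009 and 2019, which you may need to handle separately.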
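One way to handle the whole list, including the stray years, is to try the patterns one at a time in priority order for each element instead of nesting try/except blocks. The following is only a sketch under assumptions: the function name `trim_release_name`, the `PATTERNS` table and the priority order (full date, then SxxExx, then a bare year) are illustrative choices, not part of the original question or of the RegEx above.

```python
import re

# Hypothetical pattern table, tried in priority order for every title.
# Each pattern lazily matches from the start of the string up to and
# including the first marker of its kind.
PATTERNS = [
    ("date", re.compile(r"^.*?\b\d{4} \d{2} \d{2}\b")),   # e.g. 2019 04 16
    ("episode", re.compile(r"^.*?\bS\d+E\d+\b")),         # e.g. S05E18
    ("year", re.compile(r"^.*?\b(?:19|20)\d{2}\b")),      # e.g. 2019
]


def trim_release_name(title):
    """Return the title cut after the first pattern that matches.

    Titles that match none of the patterns are returned unchanged.
    """
    for _name, pattern in PATTERNS:
        m = pattern.match(title)
        if m:
            return m.group(0)
    return title


my_list = [
    'The Price Is Right S47E141 720p WEB x264-W4F',
    'Breakthrough-The Ideas That Changed the World S01E01 480p x264-mSD',
    'The Kid Who Would Be King 2019 DVDR-JFKDVD',
    'American Housewife S03E18 Phone Free Day 1080p AMZN WEB-DL DDP5 1 H 264-NTb',
    'VICE News Tonight 2019 04 16 720p AMZN WEB-DL DDP2 0 H 264-monkee',
    'The Flash 2014 S05E18 Godspeed 720p AMZN WEB-DL DDP5 1 H 264-NTb',
    'The Rachel Maddow Show 2019 04 16 720p MNBC WEB-DL AAC2 0 x264-BTW',
    'Lets Make A Deal 2009 S10E142 XviD-AFG',
]

# Process every element of the list individually.
cleaned = [trim_release_name(t) for t in my_list]
for name in cleaned:
    print(name)
```

Because the episode pattern is tried before the bare-year pattern, 'The Flash 2014 S05E18 ...' is cut after S05E18 rather than after 2014. Note that under these assumptions 'The Kid Who Would Be King 2019 DVDR-JFKDVD' is cut after 2019, whereas the desired list in the question keeps that entry whole; dropping the "year" entry from `PATTERNS` would leave such titles untouched.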