Suppose I have a file name like the one below and I want to extract part of it as a string in Python:
import re
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
rgx = re.compile('\b_[A-Z]{2}\b')
print(re.findall(rgx, fn))
Expected output: ['DE'], but the actual output is [].
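A likely reason for the empty result, sketched below: in a normal (non-raw) Python string literal, '\b' is the backspace character rather than a regex word boundary. Even with a raw string the pattern still finds nothing here, because _ counts as a word character, so there is no boundary between 3 and _ or between E and _. The variable names are only illustrative.

```python
import re

fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"

# In a non-raw literal, '\b' is the backspace character (0x08), not a word boundary.
is_backspace = '\b' == '\x08'

# Non-raw pattern: searches for literal backspace characters, so nothing matches.
non_raw = re.findall('\b_[A-Z]{2}\b', fn)

# Raw pattern: \b is now a word boundary, but "_" is itself a word character,
# so there is no boundary between "3" and "_" (or between "E" and "_").
raw = re.findall(r'\b_[A-Z]{2}\b', fn)

print(is_backspace, non_raw, raw)  # True [] []
```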
You could use
(?<=_)[A-Z]+(?=_)
This uses lookarounds on both sides; see a demo on regex101.com. For tighter results, you'd need to provide more sample inputs, though.
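As a rough illustration of what the lookarounds buy you: they check for the underscores without consuming them, so two adjacent segments can share one underscore. The second file name below is a made-up example, not one from the question.

```python
import re

fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
other = "AB_CD_EF_gh.xlsx"  # hypothetical file name with two adjacent candidates

lookaround = re.compile(r'(?<=_)[A-Z]+(?=_)')  # underscores are checked, not consumed
consuming = re.compile(r'_([A-Z]+)_')          # underscores are part of the match

print(lookaround.findall(fn))     # ['DE']
print(lookaround.findall(other))  # ['CD', 'EF']
print(consuming.findall(other))   # ['CD'] (the "_" after CD was already consumed)
```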
Use _([A-Z]{2})
Example:
import re
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
rgx = re.compile('_([A-Z]{2})')
print(rgx.findall(fn)) #You can use the compiled pattern to do findall.
Output:
['DE']
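One caveat, shown as a small sketch: without anything after the group, this pattern also picks up the first two letters of a longer uppercase run. A negative lookahead is one possible way to guard against that. The file name "report_USA_DE.xlsx" is a hypothetical input used only for illustration.

```python
import re

fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
other = "report_USA_DE.xlsx"  # hypothetical file name with a 3-letter segment

loose = re.compile(r'_([A-Z]{2})')
strict = re.compile(r'_([A-Z]{2})(?![A-Z])')  # reject if a third capital follows

print(loose.findall(other))   # ['US', 'DE']
print(strict.findall(other))  # ['DE']
print(strict.findall(fn))     # ['DE']
```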
Your desired output seems to be DE, which is bounded by an _ on both the left and the right. This expression might also work:
# -*- coding: UTF-8 -*-
import re
string = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
expression = r'_([A-Z]+)_'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches!')
YAAAY! "DE" is a match 💚💚💚
Or you can add a {2} quantifier, if you want exactly two letters:
# -*- coding: UTF-8 -*-
import re
string = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
expression = r'_([A-Z]{2})_'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches!')
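If you need the value rather than a printed message, the same search can be wrapped in a small helper. This is only a sketch: the names SEGMENT and extract_code are made up for the example, and re.search returns just the first match.

```python
import re
from typing import Optional

# Two capital letters with an underscore on each side.
SEGMENT = re.compile(r'_([A-Z]{2})_')

def extract_code(filename: str) -> Optional[str]:
    """Return the first two-letter uppercase segment, or None if there is none."""
    match = SEGMENT.search(filename)
    return match.group(1) if match else None

print(extract_code("DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"))  # DE
print(extract_code("no_match_here.xlsx"))                      # None
```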
Try the pattern (written for a Python raw string, so single backslashes are enough):
_([^_]+)_[^_.]+\.xlsx
Explanation:
_ - matches _ literally (it needs no escaping)
[^_]+ - negated character class with the + operator: matches one or more characters other than _
[^_.]+ - same as above, but this time matches characters other than _ and .
\.xlsx - matches .xlsx literally
The idea is to match the last _something_ segment before the .xlsx extension.
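A minimal sketch of how this pattern could be used from Python, assuming a raw string, where the underscore needs no escaping and a single backslash before the dot suffices. The second file name is a hypothetical input showing that only the segment nearest the extension is captured.

```python
import re

fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
other = "AB_CD_EF_gh.xlsx"  # hypothetical file name with several uppercase segments

# _segment_ followed by one last underscore-free, dot-free part and ".xlsx"
last_segment = re.compile(r'_([^_]+)_[^_.]+\.xlsx')

first = last_segment.search(fn)
second = last_segment.search(other)

print(first.group(1))   # DE
print(second.group(1))  # EF
```

Note that the capture group accepts any characters other than _, so it is not limited to capital letters.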
You could use a regular expression (the re module) for that, as already shown. However, this can also be done without any imports, in the following way:
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
out = [i for i in fn.split('_')[1:] if len(i)==2 and i.isalpha() and i.isupper()]
print(out) # ['DE']
Explanation: I split fn at _, discard the first element, and filter the rest so that only strings of length 2 that consist entirely of uppercase letters remain.
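One limitation of the split approach: if the two-letter part is the last segment, the extension is still attached to it and the length check fails. A possible variant, sketched here with a hypothetical helper name and a hypothetical second file name, strips the extension with os.path.splitext first (at the cost of one import).

```python
import os

def two_letter_codes(filename):
    """Return the 2-letter, all-uppercase segments after the first one."""
    stem, _ext = os.path.splitext(filename)  # drop ".xlsx" before splitting
    return [part for part in stem.split('_')[1:]
            if len(part) == 2 and part.isalpha() and part.isupper()]

print(two_letter_codes("DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"))  # ['DE']
print(two_letter_codes("report_DE.xlsx"))                          # ['DE']
```

Unlike [A-Z], str.isalpha() and str.isupper() also accept non-ASCII letters, which may or may not be what you want.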
Another re solution:
import re
fn = "DC_QnA_bo_v.15.12.3_DE_duplicates.xlsx"
rgx = re.compile('_([A-Z]{1,})_')  # {1,} is equivalent to +
print(rgx.findall(fn))  # ['DE']