
Python regex to get certain pattern

['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html', '/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']

From this list, I want to get only /allstar/NBA_2017.html, /allstar/NBA_2018.html, and /allstar/NBA_2019.html using re.compile().

Does anyone have an idea?

I'm no expert in regex, but this works.

import re

li = [
    '/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html',
    '/allstar/NBA_2022.html', '/allstar/NBA_2021.html',
    '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html',
    '/allstar/NBA_2021.html', '/allstar/NBA_2020.html',
    '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
    '/allstar/NBA_2020.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2017.html',
    '/allstar/NBA_2017.html'
]
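# Compile once, up front. The dot before "html" is escaped so it
# matches a literal "." rather than any character.
prog = re.compile(r'.*201[789]\.html')

def match(x):
    # match() anchors at the start of the string; returns None on no match
    return prog.match(x)

res = list(filter(match, li))
print(res)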

And this yields the following:

[
    '/allstar/NBA_2019.html', '/allstar/NBA_2019.html',
    '/allstar/NBA_2019.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html', '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html', '/allstar/NBA_2017.html'
]

Hope this is what you want!
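
One caveat worth noting: match() only anchors at the start of the string, so the pattern relies on a leading ".*" and would also accept extra text after ".html". A slightly stricter variant, offered only as a sketch, spells out the whole path and uses fullmatch(). The shortened list `paths` below is a sample made up from the question's data for illustration:

```python
import re

# Sample input for illustration (a subset of the list from the question).
paths = [
    '/allstar/NBA-allstar-career-stats.html',
    '/allstar/NBA_2022.html',
    '/allstar/NBA_2019.html',
    '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html',
    '/allstar/NBA_2017.html',
]

# fullmatch() requires the entire string to match, so no leading ".*"
# is needed and anything after ".html" is rejected.
pattern = re.compile(r'/allstar/NBA_201[789]\.html')

strict = [p for p in paths if pattern.fullmatch(p)]
print(strict)
```

Duplicates are kept here, just as in the filter() version; only the matching rule is tighter.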

Compiling regular expressions explicitly is rarely necessary in Python: the re module caches recently used patterns, so it only makes a difference when a program uses a very large number of expressions. However, since it seems that you have to compile the expression, you could do this:

import re

li = ['/allstar/NBA-allstar-career-stats.html', '/allstar/NBA_2022.html', '/allstar/NBA_2022.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021.html', '/allstar/NBA_2021_voting.html', '/allstar/NBA_2021.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020.html', '/allstar/NBA_2020_voting.html',
      '/allstar/NBA_2020.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019.html', '/allstar/NBA_2019_voting.html', '/allstar/NBA_2019.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018.html', '/allstar/NBA_2018_voting.html', '/allstar/NBA_2018.html', '/allstar/NBA_2017.html', '/allstar/NBA_2017.html']
# Escape the dot so it matches a literal "." before "html"
m = re.compile(r'.*NBA_201[789]\.html')
# set() removes the duplicate entries (the resulting order is not guaranteed)
print(list(set(filter(m.match, li))))
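
Since set() does not preserve order, the three paths can print in any order. If a stable order matters, one option (a sketch, not the only way) is to deduplicate with dict.fromkeys(), which keeps first-seen order, or to sort the result afterwards. The shortened list `li_sample` is a hypothetical stand-in for the full list above:

```python
import re

# Shortened sample of the question's list, for illustration only.
li_sample = [
    '/allstar/NBA_2022.html',
    '/allstar/NBA_2019.html',
    '/allstar/NBA_2019.html',
    '/allstar/NBA_2019_voting.html',
    '/allstar/NBA_2018.html',
    '/allstar/NBA_2018.html',
    '/allstar/NBA_2017.html',
    '/allstar/NBA_2017.html',
]

m = re.compile(r'.*NBA_201[789]\.html')

# dict keys are unique and keep insertion order, so this drops
# duplicates while preserving the order of first appearance.
unique_in_order = list(dict.fromkeys(filter(m.match, li_sample)))
print(unique_in_order)

# Sorting gives ascending years instead.
unique_sorted = sorted(unique_in_order)
print(unique_sorted)
```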
