Regex to find string in list in Python 3

How do I get base.php?id=5314 from the list?

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.fansubs.ru/search.php'
values = {'Content-Type:': 'application/x-www-form-urlencoded',
          'query': 'Boku dake ga Inai Machi'}
d = {}
# Encode the form fields and send them as the POST body
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
# Map every <a> tag to its href attribute
for link in soup.findAll('a'):
    d[link] = link.get('href')
x = list(d.values())

You can use the built-in function filter in combination with a regex. Example:

import re

# ... your code here ...

x = (list(d.values()))
test = re.compile(r"base\.php\?id=", re.IGNORECASE)
results = filter(test.search, x)

Update based on comment: in Python 3, filter returns an iterator, so you can convert the results into a list:

print(list(results))

Example results with the following hard-coded list:

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

You get:

['asd/asd/base.php?id=5314', 'something/else/here/base.php?id=666']
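
For reference, here is a minimal self-contained sketch of the steps above, using the same hard-coded list. It also shows one thing to watch for: in Python 3, filter returns a one-shot iterator, so it is empty after the first pass. The names first_pass and second_pass are only illustrative and are not part of the original answer.

```python
import re

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

# Raw string so the backslashes reach the regex engine unchanged.
test = re.compile(r"base\.php\?id=", re.IGNORECASE)

# filter() is lazy in Python 3: nothing is matched until it is consumed.
results = filter(test.search, x)

first_pass = list(results)   # consumes the iterator
second_pass = list(results)  # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

If the matches are needed more than once, it is probably simplest to store the list once and reuse it.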

This answer is based on this page, which talks about filtering lists. It has a few more implementations that do the same thing, which might suit you better. Hope it helps.
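
As one possible alternative of that kind, a list comprehension does the same filtering without an intermediate iterator. This is only a sketch: the names pattern, matches, and first_match are assumptions made for illustration, and the list is the hard-coded one from above.

```python
import re

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

pattern = re.compile(r"base\.php\?id=", re.IGNORECASE)

# List comprehension: keep every string in which the pattern occurs.
matches = [s for s in x if pattern.search(s)]

# If only the first hit is wanted, next() stops at the first match;
# the second argument is the fallback used when nothing matches.
first_match = next((s for s in x if pattern.search(s)), None)

print(matches)
print(first_match)
```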

You can pass a regex directly to find_all, which will do the filtering for you based on the href, using href=re.compile(...):

import re

with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
# Keep only the <a> tags whose href contains the literal text 'base.php?id='
d = {link: link["href"] for link in soup.find_all('a', href=re.compile(re.escape('base.php?id=')))}

find_all will only return the a tags that have an href attribute that matches the regex.
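
The re.escape call matters here because both "." and "?" are regex metacharacters. The short sketch below, which uses only the standard library and a hard-coded href taken from the question, compares the escaped and unescaped patterns; the variable names are illustrative.

```python
import re

href = "base.php?id=5314"

# Unescaped: "." matches any character and "p?" makes the "p" optional,
# so the literal "?" in the href is never matched.
unescaped = re.compile("base.php?id=")

# re.escape() backslash-escapes the regex metacharacters, so the
# pattern matches the text literally.
escaped = re.compile(re.escape("base.php?id="))

print(unescaped.search(href))        # no match object is returned
print(escaped.search(href).group())  # the literal text that matched
```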

which gives you:

In [21]: d = {link: link["href"] for link in soup.find_all('a', href=re.compile(re.escape('base.php?id=')))}

In [22]: d
Out[22]: {<a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>: 'base.php?id=5314'}

Since you only seem to be looking for one link, it would make more sense to use find:

In [36]: link = soup.find('a', href=re.compile(re.escape('base.php?id=')))

In [37]: link
Out[37]: <a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>

In [38]: link["href"]
Out[38]: 'base.php?id=5314'
