Regex to find string in list in Python 3

Question

How do I get base.php?id=5314 out of the list x built by the code below?

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.fansubs.ru/search.php'
# Form fields only; Content-Type is an HTTP header, not a form value
values = {'query': 'Boku dake ga Inai Machi'}
headers = {'Content-Type': 'application/x-www-form-urlencoded'}
d = {}
data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data, headers=headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
# Map each <a> tag to its href (None if the tag has no href)
for link in soup.find_all('a'):
    d[link] = link.get('href')
x = list(d.values())

Answer 1

You can use the built-in function filter in combination with a regex. Example:

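import re

# ... your code here ...

# Drop None values (anchors without an href); pattern.search() needs a string
x = [href for href in d.values() if href is not None]
test = re.compile(r"base\.php\?id=", re.IGNORECASE)
results = filter(test.search, x)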

Update based on a comment: in Python 3, filter returns a lazy iterator rather than a list, so convert the results to a list to see them:

print(list(results))

For example, with the following hard-coded list:

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

You get:

['asd/asd/base.php?id=5314', 'something/else/here/base.php?id=666']
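A self-contained version of the steps above, using the same hard-coded list plus a None entry (an assumption, standing in for an <a> tag without an href) so it runs without the network. It also shows that a filter object is a one-shot iterator: once list() has consumed it, it yields nothing more.

```python
import re

# Hard-coded stand-in for x; None mimics an <a> tag that has no href
x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666", None]

pattern = re.compile(r"base\.php\?id=", re.IGNORECASE)

# Skip None first, because pattern.search() only accepts strings
hrefs = [href for href in x if href is not None]
results = filter(pattern.search, hrefs)

matches = list(results)
print(matches)        # ['asd/asd/base.php?id=5314', 'something/else/here/base.php?id=666']
print(list(results))  # [] because the filter iterator is already exhausted
```

If you need the matches more than once, keep the list (matches here), not the filter object.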

This answer is based on a page that discusses filtering lists. It has a few more implementations of the same thing that might suit you better. Hope it helps.
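A few common alternatives to filter, sketched on the same hard-coded list (these are not necessarily the ones on the page mentioned above): a list comprehension with the regex, a plain substring test that needs no regex escaping when the text is fixed, and next() to grab only the first match. Note the substring test is case-sensitive, unlike the re.IGNORECASE pattern.

```python
import re

x = ["asd/asd/asd.py", "asd/asd/base.php?id=5314",
     "something/else/here/base.php?id=666"]

# 1. List comprehension with the same compiled regex
pattern = re.compile(r"base\.php\?id=", re.IGNORECASE)
by_regex = [s for s in x if pattern.search(s)]

# 2. Plain substring check: no escaping needed, but case-sensitive
by_substring = [s for s in x if "base.php?id=" in s]

# 3. Only the first match, or None if nothing matches
first = next((s for s in x if "base.php?id=" in s), None)

print(by_regex)
print(by_substring)
print(first)
```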

Answer 2

You can pass a compiled regex directly to find_all through the href keyword argument (href=re.compile(...)), and it will do the filtering on the href for you:

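import re

# req is the Request object built in the question's code
with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = BeautifulSoup(the_page, 'html.parser')
# re.escape makes "." and "?" match literally; only matching <a> tags are returned
d = {link: link["href"] for link in soup.find_all('a', href=re.compile(re.escape('base.php?id=')))}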

find_all only returns the <a> tags that have an href attribute matching the regex, so link["href"] is always present in the comprehension.
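A network-free sketch of the same find_all call, run against a small inline HTML snippet (hypothetical, standing in for the real search results page). The <a> without an href and the one pointing elsewhere are both skipped.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the fansubs.ru search results
html = """
<a href="index.php">Home</a>
<a name="top">No href here</a>
<a href="base.php?id=5314">Boku dake ga Inai Machi <small>(TV)</small></a>
"""
soup = BeautifulSoup(html, 'html.parser')

# re.escape turns the literal text "base.php?id=" into a safe regex
pattern = re.compile(re.escape('base.php?id='))
links = soup.find_all('a', href=pattern)

hrefs = [link["href"] for link in links]
print(hrefs)  # ['base.php?id=5314']
```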

In an interactive session this gives:

In [21]: d = {link:link["href"] for link in soup.findAll('a', href=re.compile(re.escape('base.php?id='))}

In [22]: d
Out[22]: {<a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>: 'base.php?id=5314'}

Since you only seem to be looking for one link, it makes more sense to use find, which returns just the first matching tag:

In [36]: link = soup.find('a', href=re.compile(re.escape('base.php?id='))

In [37]: link
Out[37]: <a href="base.php?id=5314">Boku dake ga Inai Machi <small>(ТВ)</small></a>

In [38]: link["href"]
Out[38]: 'base.php?id=5314'
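find returns None when nothing matches, so link["href"] would raise a TypeError on a page without such a link. A small guarded sketch, again on hypothetical inline HTML; the helper name first_base_link is illustrative, not part of BeautifulSoup.

```python
import re
from bs4 import BeautifulSoup

PATTERN = re.compile(re.escape('base.php?id='))

def first_base_link(html):
    """Hypothetical helper: href of the first matching <a>, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.find('a', href=PATTERN)
    # find() returns None when no <a> matches, so guard before indexing
    return link["href"] if link is not None else None

print(first_base_link('<a href="base.php?id=5314">Boku</a>'))  # base.php?id=5314
print(first_base_link('<a href="index.php">Home</a>'))         # None
```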


2 answers

Answer 1 (solution1): 1 vote, accepted, posted 2016-03-06 13:08:23

Answer 2 (solution2): 0 votes, posted 2016-03-06 17:26:15
