I have an HTML file like the following:
<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37">next_page</a>
<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" /> 1/3
</div>
</form>
How can I extract the href "/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37" of the next_page link?
This is only part of the HTML; I trimmed it to keep the question clear. When I use BeautifulSoup,
print(soup.find('a', href=re.compile('follow?page')))
it returns None. Why? I'm new to BeautifulSoup and I have read the documentation, but I'm still confused.
For now I use an ugly workaround:
urls = soup.findAll('a', href=True)
for url in urls:
    # compare against the href attribute, not the tag itself
    if 'follow?page' in url['href']:
        print(url['href'])
I need a clearer and more elegant way.
You need to escape the question mark. The regular expression w? means zero or one w, so the pattern follow?page matches "followpage" or "follopage" but never a literal "?". Try this:
print(soup.find('a', href=re.compile(r'.*follow\?page.*')))
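To see the difference in isolation, here is a minimal sketch that uses only the standard re module on the href string from the question. The variable names (href, unescaped, escaped, auto_escaped) are made up for illustration; re.escape is shown as one possible way to build the escaped pattern from a plain string.

```python
import re

# The href value taken from the question's HTML.
href = "/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37"

# Unescaped: "w?" means "zero or one w", so this pattern looks for
# "followpage" or "follopage" and never for a literal "?".
unescaped = re.search("follow?page", href)

# Escaped: "\?" matches a literal question mark.
escaped = re.search(r"follow\?page", href)

# re.escape() produces the escaped pattern from a plain string.
auto_escaped = re.search(re.escape("follow?page"), href)

print(unescaped)              # None
print(escaped.group(0))       # follow?page
print(auto_escaped.group(0))  # follow?page
```

The same escaped pattern can then be passed to re.compile for the href argument of soup.find.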
Below is my text.
<div id="mydiv_99288" class="cls_style2">
<a class="disabled pggreen pg-top-bton"> ** FIRST ** </a>
<a class="disabled pggreen pg-prev-bton"> << PREV </a>
<a class="pg-current">One</a>
<a class="pg-normal" href="/department/office/pg2">Two</a>
<a class="pg-normal" href="/department/office/pg3">Three</a>
<a class="pg-normal pg-bton" href="/department/office/pg2"> NEXT >> </a>
<a class="pg-normal pg-bton" href="/department/office/pg3"> LAST >> </a>
</div>
I want to get the link for NEXT >> in the above code, i.e. the link with href /department/office/pg2. Any help?
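One possible approach, sketched here under a few assumptions: the bs4 package (BeautifulSoup 4) is installed, the built-in "html.parser" backend is used, and the link is identified by its visible text containing "NEXT". The helper name find_next_href is hypothetical and not part of any library.

```python
from bs4 import BeautifulSoup

# The markup from the question, copied as-is.
html = """
<div id="mydiv_99288" class="cls_style2">
<a class="disabled pggreen pg-top-bton"> ** FIRST ** </a>
<a class="disabled pggreen pg-prev-bton"> << PREV </a>
<a class="pg-current">One</a>
<a class="pg-normal" href="/department/office/pg2">Two</a>
<a class="pg-normal" href="/department/office/pg3">Three</a>
<a class="pg-normal pg-bton" href="/department/office/pg2"> NEXT >> </a>
<a class="pg-normal pg-bton" href="/department/office/pg3"> LAST >> </a>
</div>
"""

def find_next_href(soup):
    """Return the href of the first <a> whose text contains 'NEXT', else None."""
    # href=True skips the disabled anchors that carry no href attribute.
    for a in soup.find_all("a", href=True):
        if "NEXT" in a.get_text():
            return a["href"]
    return None

soup = BeautifulSoup(html, "html.parser")
next_href = find_next_href(soup)
print(next_href)
```

Matching on the link text rather than on the class is a design choice for this sketch: both NEXT and LAST share the classes pg-normal and pg-bton, so the class alone does not tell them apart.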