I am working with a third-party application where I have read access to a web page's source. From it, I need to collect only those href values that match a pattern like /aems/file/filegetrevision.do?fileEntityId. Is this possible? My current code returns all the href values.
HTML (part of the page)
<td width="50%">
<a href="/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz">
screenshot.doc
</a>
</td>
CODE
for a in soup.find_all('a', {"style": "display:inline; position:relative;"}, href=True):
    href = a['href'].strip()
    # href already starts with "/", so the base URL must not end with one
    href = "https://xyz.test.com" + href
    print(href)
Thanks
Yes, just pass a suitable filter for the href attribute, for example a function:
def filter(href):
    # href is None for <a> tags that have no href attribute
    return href is not None and '/aems/file/filegetrevision' in href

soup.find_all('a', href=filter)
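To make the function-filter approach concrete, here is a minimal, self-contained sketch run against a trimmed copy of the HTML from the question. The function name is_revision_link, the shortened cs value, and the two extra <a> tags are made up for illustration; the guard against None is there because Beautiful Soup also calls the filter for tags that have no href at all.

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the question's HTML; the second and third links
# are hypothetical and only exist to show what gets filtered out.
html = """
<td width="50%">
  <a href="/aems/file/filegetrevision.do?fileEntityId=10597525&cs=abc.xyz">screenshot.doc</a>
  <a href="/aems/other/page.do">other page</a>
  <a name="no-href">anchor without href</a>
</td>
"""

def is_revision_link(href):
    # Beautiful Soup passes the attribute value, which is None when the
    # tag has no href attribute, so check for that first.
    return href is not None and "/aems/file/filegetrevision.do?fileEntityId" in href

soup = BeautifulSoup(html, "html.parser")

# Only <a> tags whose href passes the filter are returned.
matches = [a["href"] for a in soup.find_all("a", href=is_revision_link)]
print(matches)
```

Giving the function a name other than filter also avoids shadowing Python's built-in filter().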
Besides functions, you can also use compiled regular expression objects (as returned by re.compile) as filters:
filter = re.compile(some_regular_expression)
soup.find_all('a', href=filter)
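As a rough sketch of the regex variant, the snippet below fills in some_regular_expression with one possible pattern for the links in the question and builds absolute URLs with urllib.parse.urljoin. The exact pattern (a numeric fileEntityId), the shortened cs value, and the second link are assumptions; the host is the one used in the question's code.

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://xyz.test.com/"  # host taken from the question's code

# Trimmed sample based on the question's HTML; the second link is hypothetical.
html = """
<td width="50%">
  <a href="/aems/file/filegetrevision.do?fileEntityId=10597525&cs=abc.xyz">screenshot.doc</a>
  <a href="/aems/file/list.do">file list</a>
</td>
"""

# Assumed pattern: the path from the question followed by a numeric id.
# Beautiful Soup applies the pattern with search(), so it may match
# anywhere in the attribute value.
pattern = re.compile(r"/aems/file/filegetrevision\.do\?fileEntityId=\d+")

soup = BeautifulSoup(html, "html.parser")

# urljoin handles the slash between host and path, so no manual
# string concatenation is needed.
urls = [urljoin(BASE_URL, a["href"].strip()) for a in soup.find_all("a", href=pattern)]

for url in urls:
    print(url)
```

Tags without an href attribute simply do not match a regex filter, so no extra None check is needed here.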
See the Beautiful Soup docs: Kinds of filters