
How to Grab <a href="url"> Links With No Classes or IDs with BeautifulSoup4 (Python 2.7)

I am struggling to grab a tag that doesn't have a class or id. It is just an a tag with an href attribute holding the link.

HTML code (there is more, but this is a short excerpt). I'm trying to grab the a href="url is here" link, but I can't just select "a" because that would grab every link on the page.

<table>
<tbody>
<tr class="">
<td class="col1 align">
<a href="url is here">
1
</a>
</td>
<td class="col2">
<a href="www.example.com">
<img class="avatar" src="www.example.com" alt="le me">
le me
<img class="test" alt="test" title="test"    src="test-icon.png">
</a>
</td>
<td class="col3 align">
<a href="www.example.com">
2,715
</a>
</td>
<td class="col4 align">
<a href="www.example.com">
5,400,000,000
</a>
</td>
</tr>

My code:

source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text)
for link in soup.findAll():
    username = link.get()
    print(username)

I left the arguments blank because nothing I tried worked. I'm not sure what else to do.

You can select all a tags and use the has_attr function to skip any that have a class or id attribute:

for link in soup.findAll('a'):
    # Skip anchors that carry a class or id attribute.
    if link.has_attr('class') or link.has_attr('id'):
        continue
    # get() returns None when the tag has no href.
    username = link.get('href')
    print(username)
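
Note that in the excerpt from the question, none of the a tags has a class or id, so the filter above would still return every link in the row. If only the link in one column is wanted, one option is to go through the enclosing td, which does have a class. The following is a minimal sketch, not a definitive solution: SAMPLE_HTML is a trimmed copy of the question's row, links_in_column is a hypothetical helper name, and it assumes the real page uses the same col1/col2/col3 classes throughout.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table row from the question (assumption: the real
# page follows the same structure).
SAMPLE_HTML = """
<table>
<tbody>
<tr class="">
<td class="col1 align"><a href="url is here">1</a></td>
<td class="col2"><a href="www.example.com">le me</a></td>
<td class="col3 align"><a href="www.example.com">2,715</a></td>
</tr>
</tbody>
</table>
"""


def links_in_column(html, column_class):
    """Return the href of every <a> inside a <td> that has column_class.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    # class_ matches a td if column_class is any one of its classes,
    # so "col1" also matches class="col1 align".
    for cell in soup.find_all("td", class_=column_class):
        # href=True keeps only anchors that actually have an href.
        for link in cell.find_all("a", href=True):
            hrefs.append(link["href"])
    return hrefs


first_column_links = links_in_column(SAMPLE_HTML, "col1")
print(first_column_links)
```

Passing "col2" or "col3" instead of "col1" would pick the links from the other cells in the same way.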
