How To Grab <a href="url"> Links With No Classes Or IDs with BeautifulSoup4 (Python 2.7)

Question

I am struggling to grab a tag that has no class or id. It is just an <a> tag with an href attribute holding the link.

The HTML is below (there is more, but this is a short excerpt). I'm trying to grab the <a href="url is here"> link, but I can't just select "a" because that would grab every link on the page.

<table>
<tbody>
<tr class="">
<td class="col1 align">
<a href="url is here">
1
</a>
</td>
<td class="col2">
<a href="www.example.com">
<img class="avatar" src="www.example.com" alt="le me">
le me
<img class="test" alt="test" title="test"    src="test-icon.png">
</a>
</td>
<td class="col3 align">
<a href="www.example.com">
2,715
</a>
</td>
<td class="col4 align">
<a href="www.example.com">
5,400,000,000
</a>
</td>
</tr>

My code:

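import requests
from bs4 import BeautifulSoup

source_code = requests.get(url)  # url is the address of the page being scraped
plain_text = source_code.text
soup = BeautifulSoup(plain_text)
for link in soup.findAll():  # arguments not filled in yet
    username = link.get()  # arguments not filled in yet
    print(username)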

I left the arguments blank because nothing I tried worked. I'm not sure what else to do.

Answer 1

You can select all <a> tags and use the has_attr function to skip any that have a class or id attribute:

for link in soup.findAll('a'):
    if link.has_attr('class') or link.has_attr('id'):
        continue
    username = link.get('href')
    print(username)
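
Note that in the row shown in the question none of the <a> tags carries a class or id, so the loop above still prints every link in that row. If the link you want is the one inside the col1 cell (an assumption about the intent, based on the "url is here" placeholder), one option is to scope the search by the parent <td>, which does have a class. The following is a minimal sketch of that idea using a CSS selector; SAMPLE_HTML and links_in_cell are names made up for this illustration, and the sample is a trimmed, closed-off copy of the question's markup rather than the real page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question. The table is closed here so the
# snippet parses on its own; the real page contains more markup.
SAMPLE_HTML = """
<table>
<tbody>
<tr class="">
<td class="col1 align"><a href="url is here">1</a></td>
<td class="col2"><a href="www.example.com">le me</a></td>
<td class="col3 align"><a href="www.example.com">2,715</a></td>
</tr>
</tbody>
</table>
"""


def links_in_cell(html, cell_class):
    """Return the href of every <a> that is a direct child of a matching <td>.

    cell_class is a hypothetical parameter: the class on the parent cell,
    e.g. "col1". The anchors themselves need no class or id.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "td.col1 > a[href]" matches anchors directly inside the cell that have
    # an href attribute, whatever other attributes they do or do not carry.
    selector = "td.{0} > a[href]".format(cell_class)
    return [a["href"] for a in soup.select(selector)]


if __name__ == "__main__":
    print(links_in_cell(SAMPLE_HTML, "col1"))
```

With a fetched page you would pass plain_text in place of SAMPLE_HTML. If the page has several tables with col1 cells, the selector would need to be narrowed further, for example by an id or class on the table itself, if one exists.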
