I'm trying to scrape links from a Reddit table using Beautiful Soup. I can extract all of the table's contents except the URLs. I'm using item.find_all('a'), but it returns an empty list with this code:
import praw
import csv
import requests
from bs4 import BeautifulSoup

def Authorize():
    """Authorizes Reddit API"""
    reddit = praw.Reddit(client_id='',
                         client_secret='',
                         username='',
                         password='',
                         user_agent='user')

url = 'https://old.reddit.com/r/formattesting/comments/94nc49/will_it_work/'
headers = {'User-Agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
table_extract = soup.find_all('table')[0]
table_extract_items = table_extract.find_all('a')

for item in table_extract_items:
    letter_name = item.contents[0]
    links = item.find_all('a')
    print(letter_name)
    print(links)
This is what it returns:
6GB EVGA GTX 980 TI
[]
Intel i7-4790K
[]
Asus Z97-K Motherboard
[]
2x8 HyperX Fury DDR3 RAM
[]
Elagto HD 60 Pro Capture Card
[]
I would like the URL to appear where the empty list is, below each table row.
I'm not sure whether this affects the approach, but the end goal is to extract all of the table contents and links (keeping the association between the two) and save them to a CSV file as two columns. For now I'm just printing them to keep it simple.
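You were almost there. Your table_extract_items are already HTML anchors (<a> tags). From each one you need to extract the text (the tag's content) and the href attribute, which you read with the [] operator, e.g. item['href']. I suspect the choice of variable names confused you. The line inside the for loop, links = item.find_all('a'), is wrong: it searches for <a> tags nested inside each <a> tag, and since there are none, it always returns an empty list.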
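To see the difference concretely, here is a minimal sketch run against a small hand-written table instead of the live page. The markup is an assumption standing in for the Reddit table (it reuses two of the links from the post); the point is only to show that searching inside an <a> tag finds nothing, while the [] operator and get_text() return the URL and the label.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Reddit table; not the real page markup
html = """
<table>
  <tr><td><a href="https://imgur.com/a/Y1WlDiK">6GB EVGA GTX 980 TI</a></td></tr>
  <tr><td><a href="https://imgur.com/gallery/yxkPF3g">Intel i7-4790K</a></td></tr>
  <tr><td>No link in this cell</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table_extract = soup.find("table")

rows = []
for item in table_extract.find_all("a"):
    # item is already an <a> tag, so looking for <a> tags *inside* it finds none
    nested = item.find_all("a")
    # The link target is an attribute, read with the [] operator
    href = item["href"]
    # The visible label is the tag's text content
    text = item.get_text()
    rows.append((text, href, len(nested)))
    print(text, "--", href)

# When nothing matches, find_all() returns an empty list and find() returns None
missing_find_all = soup.find_all("img")
missing_find = soup.find("img")
```

Note that item["href"] raises KeyError on an anchor without an href; item.get("href") returns None instead if that case can occur.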
Here is my solution:
for anchor in table.find_all('a'):
    # Optional guard, only needed if you switch to .find(): find_all()
    # returns an empty list when nothing matches, whereas .find() returns None
    # if not anchor:
    #     continue
    href = anchor['href']
    print(href)
    print(anchor.text)
table in my code is what you named table_extract in your code.
Here it is in an IPython session:
In [40]: for anchor in table.find_all('a'):
    ...:     # if not anchor:
    ...:     #     continue
    ...:     href = anchor['href']
    ...:     text = anchor.text
    ...:     print(href, "--", text)
    ...:
https://imgur.com/a/Y1WlDiK -- 6GB EVGA GTX 980 TI
https://imgur.com/gallery/yxkPF3g -- Intel i7-4790K
https://imgur.com/gallery/nUKnya3 -- Asus Z97-K Motherboard
https://imgur.com/gallery/9YIU19P -- 2x8 HyperX Fury DDR3 RAM
https://imgur.com/gallery/pNqXC2z -- Elagto HD 60 Pro Capture Card
https://imgur.com/gallery/5K3bqMp -- Samsung EVO 250 GB SSD
https://imgur.com/FO8JoQO -- Corsair Scimtar MMO Mouse
https://imgur.com/C8PFsX0 -- Corsair K70 RGB Rapidfire Keyboard
https://imgur.com/hfCEzMA -- I messed up
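For the end goal in the question (two CSV columns that keep each name paired with its link), a minimal sketch using the standard csv module could look like this. It assumes the pairs are collected as (anchor.text, anchor['href']) tuples in the loop above; write_links_csv is a hypothetical helper name, and the sample pairs are copied from the output shown.

```python
import csv
import io

def write_links_csv(rows, out):
    """Write (name, url) pairs to a file-like object as two CSV columns.

    write_links_csv is a hypothetical helper for illustration only.
    """
    writer = csv.writer(out)
    writer.writerow(["name", "url"])  # header row
    writer.writerows(rows)            # one row per (name, url) pair

# Sample pairs taken from the printed output above
pairs = [
    ("6GB EVGA GTX 980 TI", "https://imgur.com/a/Y1WlDiK"),
    ("Intel i7-4790K", "https://imgur.com/gallery/yxkPF3g"),
]

# An in-memory buffer stands in for a real file here
buffer = io.StringIO()
write_links_csv(pairs, buffer)
print(buffer.getvalue())
```

To write a real file, pass open('links.csv', 'w', newline='', encoding='utf-8') instead of the buffer; the csv documentation recommends newline='' so the writer controls line endings.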