Link extraction from website
I am trying to extract some data from WebMD, but when I run my code I keep getting "None" as the return value. Any idea what I am doing wrong? The number of results matches the number of links, but I do not get the links themselves.
import bs4 as bs
import urllib.request
import pandas as pd
source = urllib.request.urlopen('https://messageboards.webmd.com/').read()
soup = bs.BeautifulSoup(source,'lxml')
for url in soup.find_all('div',class_="link"):
    print(url.get('href'))
Your url element is actually a div tag, not an a:
>>> x = soup.find_all('div', class_="link")
>>> x[0]
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
You need to select the child before getting the href attribute:
>>> x[0].a.get('href')
'https://messageboards.webmd.com/family-pregnancy/f/relationships/'
Just modify your for loop as follows:
for url in soup.find_all('div',class_="link"):
    print(url.a.get('href'))
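To see why the original loop printed None, here is a minimal sketch that runs against inline markup instead of the live page. SAMPLE_HTML and the helper extract_links are hypothetical names introduced for illustration, and the second div (with no a child) is an assumed edge case, not something observed on the WebMD page. It uses the built-in html.parser so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the live page.
SAMPLE_HTML = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link">No anchor here</div>
"""


def extract_links(html):
    """Return the href of the first <a> inside each div.link, skipping divs without one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="link"):
        anchor = div.a  # None when the div contains no <a> tag
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_div = soup.find("div", class_="link")

print(first_div.get("href"))    # the div itself has no href attribute
print(first_div.a.get("href"))  # the child <a> carries the URL
print(extract_links(SAMPLE_HTML))
```

The None check matters because div.a is None for a div with no a child, and calling .get on it would raise an AttributeError.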
soup.find_all('div',class_="link") returns all div elements with the class link. These elements wrap the a elements that contain the href attributes, so you need to get the href from the correct element, like so:
for div in soup.find_all('div',class_="link"):
    print(div.a.get('href'))
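As an alternative that neither answer uses, a CSS selector can target the a elements directly, so there is no need to step from the div to its child. The sketch below assumes the same div/a nesting described above. SAMPLE_HTML, its relative URLs, and the helper select_links are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one matching link, one <a> without href, one div of another class.
SAMPLE_HTML = """
<div class="link"><a href="/f/relationships/">Relationships</a></div>
<div class="link"><a>No href</a></div>
<div class="other"><a href="/ignored/">Ignored</a></div>
"""


def select_links(html):
    """Collect href values from <a> tags that are direct children of div.link."""
    soup = BeautifulSoup(html, "html.parser")
    # "a[href]" matches only anchors that actually have an href attribute.
    return [a["href"] for a in soup.select("div.link > a[href]")]


print(select_links(SAMPLE_HTML))
```

Because the selector already filters on the href attribute, indexing a["href"] cannot fail here.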