Extracting links from a website

Question

I am trying to extract some data from WebMD, but when I run my code I keep getting None as the return value. Any idea what I am doing wrong? I get as many results as there are links, but not the links themselves.

import bs4 as bs
import urllib.request
import pandas as pd


source = urllib.request.urlopen('https://messageboards.webmd.com/').read()

soup = bs.BeautifulSoup(source,'lxml')

for url in soup.find_all('div',class_="link"):
    print (url.get('href'))

Answer 1

Each url element in your loop is actually a div tag, not an a tag. The div itself has no href attribute, which is why get('href') returns None:

>>> x = soup.find_all('div', class_="link")
>>> x[0]
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>

You need to select the child a element before getting the href attribute:

>>> x[0].a.get('href')
'https://messageboards.webmd.com/family-pregnancy/f/relationships/'

Just modify your for loop as follows:

for url in soup.find_all('div',class_="link"):
    print (url.a.get('href'))
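The difference between the two loops can be reproduced offline. The following is a minimal sketch, assuming the markup has the shape shown above; SAMPLE_HTML and extract_links are hypothetical names introduced for illustration, the second div is invented to show the guard, and the built-in html.parser is used instead of lxml so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure shown above; nothing is fetched from WebMD.
SAMPLE_HTML = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link"><span>No anchor here</span></div>
"""


def extract_links(html):
    """Return the href of the first <a> inside each <div class="link">."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for div in soup.find_all("div", class_="link"):
        anchor = div.find("a")  # None when the div has no <a> descendant
        if anchor is not None and anchor.get("href"):
            links.append(anchor.get("href"))
    return links


# The original loop asks the div for href, which it does not have.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div_hrefs = [div.get("href") for div in soup.find_all("div", class_="link")]

links = extract_links(SAMPLE_HTML)
print(div_hrefs)
print(links)
```

The explicit None check matters because url.a.get('href') raises AttributeError on any div that happens to contain no a element.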

Answer 2

soup.find_all('div',class_="link") returns all div elements with the class link. These elements wrap the a elements that carry the href attributes, so you need to read the href from the a element instead:

for div in soup.find_all('div',class_="link"):
    print (div.a.get('href'))
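As an alternative to stepping from each div to its child, a CSS selector can target the a elements directly. This is only a sketch under the same assumption about the page structure; the sample markup and the example.com URL are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the second URL is a placeholder, not a real board.
sample_html = """
<div class="link"><a href="https://messageboards.webmd.com/family-pregnancy/f/relationships/">Relationships</a></div>
<div class="link"><a href="https://example.com/other-board/">Other board</a></div>
<div class="other"><a href="https://example.com/ignored/">Ignored</a></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "div.link a[href]" matches every <a> that has an href and sits inside a <div class="link">.
hrefs = [a["href"] for a in soup.select("div.link a[href]")]

for href in hrefs:
    print(href)
```

Because the selector only matches a elements that actually have an href, no None values can appear in the result.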

Answer 1: 0 2017-01-17 20:36:45
Answer 2: 0 2017-01-17 20:36:47