
How to call a specific anchor tag and pass it back to the url in a Python webscraper?

I'm working on a problem for an online class, where I'm supposed to use BeautifulSoup to build a simple web scraper.

Here is my progress so far:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

count = 4
position = 3

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'

html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

My question is this: how do I extract a particular anchor tag from the list in tags? Also, how can I make the for loop iterate only four times?

Assignment details: (screenshot in the original post)

Update:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

position = 3
count = 4

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')

for i in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    print(tags[position])

So I can call a tag at a position this way, but I need the loop to follow that link on each pass. As it is now, my program just prints the third link four times.

Got it!

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

position = 17
count = 7

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')

for i in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    url = soup('a')[position].get('href', None)
    print(url)
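The key to that fix is reassigning url inside the loop, so each pass fetches the page the previous pass linked to. The indexing step itself can be checked offline; here is a minimal sketch (the HTML snippet and names are made up for illustration, not part of the assignment):

```python
from bs4 import BeautifulSoup

# A tiny stand-in page; in the real program this HTML comes from urlopen().
html = """
<html><body>
<a href="known_by_Anne.html">Anne</a>
<a href="known_by_Bob.html">Bob</a>
<a href="known_by_Cora.html">Cora</a>
</body></html>
"""

position = 1  # zero-based: the second link on the page
soup = BeautifulSoup(html, "html.parser")

# soup('a') is shorthand for soup.find_all('a'); indexing picks one tag,
# and .get('href') reads its attribute (None if the attribute is missing).
url = soup('a')[position].get('href', None)
print(url)  # known_by_Bob.html
```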

As you already know, tags = soup('a') produces quite a long list of links.

You haven't said how you want to search for one of the links. I'll assume that you're selecting by name. Here's how to search for Montgomery:

>>> soup.find_all(string='Montgomery')
['Montgomery']

Once you've got that, you can get the link ('a') element that contains 'Montgomery' this way:

>>> soup.find_all(string='Montgomery')[0].findParent()
<a href="http://py4e-data.dr-chuck.net/known_by_Montgomery.html">Montgomery</a>

Then you can get the href attribute of the link element, which is the actual url for Montgomery:

>>> soup.find_all(string='Montgomery')[0].findParent().attrs['href']
'http://py4e-data.dr-chuck.net/known_by_Montgomery.html'
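Those three steps can be wrapped into one helper. A small sketch, using a made-up sample page (the function name href_for_name is mine, not from the library):

```python
from bs4 import BeautifulSoup

def href_for_name(soup, name):
    """Return the href of the first <a> whose text is exactly `name`, or None."""
    matches = soup.find_all(string=name)  # matching NavigableString objects
    if not matches:
        return None
    # The matched string's parent is the enclosing <a> element.
    return matches[0].find_parent().attrs.get('href')

html = '<p><a href="known_by_Montgomery.html">Montgomery</a></p>'
soup = BeautifulSoup(html, "html.parser")
print(href_for_name(soup, "Montgomery"))  # known_by_Montgomery.html
```

Note that find_parent is the modern spelling of findParent; both work in current BeautifulSoup.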

One way of going through a loop at most four times:

count = 0
for tag in tags:
    # do something with tag
    count += 1
    if count >= 4:
        break
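Slicing is a more compact alternative: tags[:4] yields at most the first four items, so no counter is needed. A quick sketch with a made-up page (the href values here are mine, for illustration):

```python
from bs4 import BeautifulSoup

html = "".join(f'<a href="page{i}.html">link {i}</a>' for i in range(10))
soup = BeautifulSoup(html, "html.parser")
tags = soup('a')

# Iterate over at most the first four tags; this also works
# if the page has fewer than four links.
first_four = [tag.get('href') for tag in tags[:4]]
print(first_four)  # ['page0.html', 'page1.html', 'page2.html', 'page3.html']
```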
