Python: Script not writing variable links

Question

My script is below.

I feel like I'm missing one line of code to make this work properly. I'm using Reddit as a test source to scrape sports links.

# import libraries
import bs4
from urllib2 import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.reddit.com/r/BoxingStreams/comments/6w2vdu/mayweather_vs_mcgregor_archive_footage/'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

hyperli = page_soup.findAll("form")


filename = "sportstreams.csv"
f = open(filename, "w")

headers = "Sport Links"

f.write(headers)

for containli in hyperli:
    link = containli.a["href"] 

    print(link)

    f.write(str(link)+'\n')

f.close()

Everything works, except that it only grabs the link from the first row [0]. If I don't use ["href"], it writes all the a href links, but it also writes the word NONE to the CSV file. I was hoping that using ["href"] would write only the http links and avoid the word NONE.

What am I missing here?

Answer 1

As explained in the "Navigating using tag names" section of the documentation:

Using a tag name as an attribute will give you only the first tag by that name
...
If you need to get all the <a> tags, or anything more complicated than the first tag with a certain name, you'll need to use one of the methods described in Searching the tree, such as find_all().
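To illustrate the behaviour quoted above, here is a minimal sketch on a small, made-up HTML snippet (the markup and the example.com URLs are assumptions, not taken from the Reddit page). It contrasts attribute access with find_all(), and it also suggests where the word None in the CSV likely comes from: an <a> tag without an href yields None when read with .get("href").

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one form on the page.
html = """
<form>
  <a href="http://example.com/one">one</a>
  <a name="anchor-without-href">two</a>
  <a href="http://example.com/three">three</a>
</form>
"""

form = BeautifulSoup(html, "html.parser").form

# Attribute access (form.a) returns only the first <a> inside the form.
first_href = form.a["href"]

# find_all() returns every <a>; .get() yields None when href is absent.
all_hrefs = [a.get("href") for a in form.find_all("a")]

# Passing href=True keeps only the tags that actually have an href.
real_hrefs = [a["href"] for a in form.find_all("a", href=True)]

print(first_href)
print(all_hrefs)
print(real_hrefs)
```

Writing str(None) to a file produces the text "None", so filtering on the presence of href before writing avoids that row.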

In your case, you could use page_soup.select("form a[href]") to find all the links inside forms that have an href attribute:

links = page_soup.select("form a[href]")
for link in links:
    href = link["href"]
    print(href)
    f.write(href + "\n")
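For completeness, here is one way the whole flow could look as a self-contained Python 3 sketch. It parses an inline sample string instead of fetching the Reddit page, and the helper names extract_form_links and write_links are hypothetical, introduced only for this illustration. It uses the csv module so the header lands on its own row; the original f.write(headers) has no trailing newline, so the first link would end up on the header line.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_form_links(page_html):
    """Return the href of every <a href=...> nested inside a <form>."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    return [a["href"] for a in page_soup.select("form a[href]")]


def write_links(links, out_file, header="Sport Links"):
    """Write a one-column CSV: a header row, then one link per row."""
    writer = csv.writer(out_file)
    writer.writerow([header])
    for link in links:
        writer.writerow([link])


# Made-up sample page: two forms, one anchor without an href.
sample_html = """
<form><a href="http://example.com/a">A</a><a href="http://example.com/b">B</a></form>
<form><a name="no-href">skip me</a><a href="http://example.com/c">C</a></form>
"""

links = extract_form_links(sample_html)

# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
write_links(links, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same helper can be called inside a with block, for example with open("sportstreams.csv", "w", newline="") as f: write_links(links, f), which also closes the file automatically.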

Answer 1
0 · accepted · 2017-09-22 23:42:46
