requests.get(url) in Python behaving differently when used in a loop

I'm new to Python programming and am trying to scrape every link listed in my Urls.txt file. The code I wrote is:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
user_agent = UserAgent()
fp = open("Urls.txt", "r")
values = fp.readlines()
fin = open("soup.html", "a")
for link in values:
    print( link )
    page = requests.get(link, headers={"user-agent": user_agent.chrome})
    html = page.content
    soup = BeautifulSoup(html, "html.parser")
    fin.write(str(soup))

The code works fine when the links are passed directly as string literals, but when they are read from the file into a variable, as in the code above, the output is different.
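To see what differs between a link typed as a literal and one read from the file, it can help to compare both with repr(), which makes otherwise invisible characters visible. This sketch uses io.StringIO as a stand-in for Urls.txt and placeholder example.com URLs; neither comes from the original question.

```python
import io

# io.StringIO stands in for Urls.txt; it supports readlines() like an open text file.
fake_file = io.StringIO("https://example.com/page1\nhttps://example.com/page2\n")
values = fake_file.readlines()

typed = "https://example.com/page1"  # the same link written as a string literal

print(repr(values[0]))     # 'https://example.com/page1\n'  (the newline is kept)
print(repr(typed))         # 'https://example.com/page1'
print(values[0] == typed)  # False
```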

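The strings you read from the file most likely end with a line break: readlines() keeps the "\n" at the end of each line it returns, so the links passed to requests.get carry a trailing newline. To remove it, use link.strip("\n"). Note that strip() returns a new string rather than changing link in place, so assign the result, e.g. link = link.strip("\n").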
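Applied to the loop in the question, the fix could look like the following minimal sketch. It assumes Urls.txt holds one URL per line, that blank lines should be skipped, and that the caller passes a fixed user-agent header instead of using fake_useragent. The helper names clean_links and scrape are hypothetical, and the timeout argument is an addition not present in the original code.

```python
def clean_links(lines):
    """Strip surrounding whitespace, including the trailing newline that
    readlines() keeps, and drop lines that are empty afterwards."""
    return [line.strip() for line in lines if line.strip()]


def scrape(links, out_path="soup.html", headers=None):
    """Fetch each link and append the parsed HTML to out_path."""
    # Imported here so clean_links() can be used without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    with open(out_path, "a", encoding="utf-8") as fin:
        for link in links:
            print(link)
            # timeout is an added safeguard, not part of the original code.
            page = requests.get(link, headers=headers, timeout=10)
            soup = BeautifulSoup(page.content, "html.parser")
            fin.write(str(soup))
```

To use it, open Urls.txt, call clean_links(fp.readlines()), and pass the result to scrape() together with a headers dict such as {"user-agent": "Mozilla/5.0"} (a placeholder value). Calling strip() with no argument is a deliberate choice here: besides "\n" it also removes a stray "\r" or trailing spaces, which strip("\n") would leave in the URL.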

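Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.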

 