
requests.get(url) in Python behaves differently when used in a loop

I'm new to Python programming and am trying to scrape every link listed in my Urls.txt file. The code I wrote is:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

user_agent = UserAgent()

# Read all lines of the URL list into memory.
with open("Urls.txt", "r") as fp:
    values = fp.readlines()

# Append the parsed HTML of every page to one output file.
with open("soup.html", "a") as fin:
    for link in values:
        print(link)
        page = requests.get(link, headers={"user-agent": user_agent.chrome})
        html = page.content
        soup = BeautifulSoup(html, "html.parser")
        fin.write(str(soup))

The code works fine when each link is passed directly as a string literal, but when the links are read from the file as shown above, the output is different.

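The strings you read from the file probably end with a line break: readlines() keeps the trailing "\n" on every line, so each URL passed to requests.get() has a newline at the end. To remove it, use link = link.strip("\n") before making the request. Strings are immutable, so strip() returns a new string and the result has to be assigned back.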
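As a minimal sketch of that point, the snippet below shows the trailing newline that readlines() leaves on each line and one way to clean it up. The helper name read_urls and the example.com URLs are made up for illustration, and io.StringIO stands in for Urls.txt so the snippet runs without a file or a network connection.

```python
import io


def read_urls(lines):
    """Return the non-empty lines with surrounding whitespace removed.

    Hypothetical helper, not part of the original question.
    """
    return [line.strip() for line in lines if line.strip()]


# io.StringIO plays the role of open("Urls.txt") here (an assumption
# made so the example is self-contained).
fake_file = io.StringIO("https://example.com/a\nhttps://example.com/b\n\n")

raw = fake_file.readlines()
print(repr(raw[0]))  # repr() makes the trailing newline visible

urls = read_urls(raw)
print(urls)  # cleaned URLs, safe to pass to requests.get()
```

Calling strip() with no argument also removes "\r" and spaces, which helps if the file was saved with Windows line endings. Skipping blank lines avoids calling requests.get() with an empty string.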

