
How do I loop through a list of URLs to print <P> with Beautifulsoup

I just found out about beautifulsoup(4). I have a lot of links and I want to print the <p> tag of multiple websites at once, but as a beginner I don't know how to do it. I also can't find anything on Stack Overflow that fits my case.
Something like this doesn't work:

from bs4 import BeautifulSoup
import requests
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
url = ["http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://google.com=text&format=text", "http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://example.com&format=text&format=text"]

# add header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "lxml")
print( soup.find('p').text )

The error I get (I didn't expect it to work anyway; pointing me to a possible duplicate for this error won't help — read the question in the title first):

Traceback (most recent call last):
  File "C:\Users\Gebruiker\Desktop\apitoshortened.py", line 10, in <module>
    r = requests.get(url, headers=headers)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 640, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Users\Gebruiker\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 731, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://google.com=text&format=text', 'http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://example.com&format=text&format=text']'

I didn't really expect it to be that simple though; any help would be appreciated!

If you have a list, then use a for loop:

for item in url:
    r = requests.get(item, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")
    print(soup.find('p').text)

By the way: your URL doesn't return any HTML, just some text containing a link — so the code can't find a <p>.

See the returned text:

for item in url:
    r = requests.get(item, headers=headers)
    print(r.text)    

Result:

https://fc.lc/C4FNiXbY
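Since the API replies with a plain-text body rather than HTML, the short link can be taken straight from the response text instead of being parsed — a minimal sketch, with the response body simulated as a string (in real use this would be `r.text` from `requests`):

```python
# The fc.lc API replies with a plain-text body that holds the short link,
# so no HTML parsing is needed -- just strip the surrounding whitespace.
api_reply = "https://fc.lc/C4FNiXbY\n"   # simulated r.text
short_link = api_reply.strip()
print(short_link)   # -> https://fc.lc/C4FNiXbY
```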

Use a for loop, then check whether a <p> tag is present. If so, print its text.

from bs4 import BeautifulSoup
import requests
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
urls = ["http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://google.com=text&format=text", "http://fc.lc/api?api=9053290fd05b5e5eb091b550078fa1e30935c92c&url=https://wow-ht.ml?s=https://cutlinks.pro/api?api=e6a8809e51daedcf30d9d6270fd0bfeba73c1dcb&url=https://example.com&format=text&format=text"]

# add header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}

for url in urls:
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")
    if soup.find('p'):
        print(soup.find('p').text)
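The loop-and-check pattern above can be sketched self-containedly without hitting the real API endpoints. The response bodies below are simulated strings (one HTML page, one plain-text API reply), and the stdlib `html.parser` backend is used so no `lxml` install is required:

```python
from bs4 import BeautifulSoup

def first_paragraph(html):
    """Return the text of the first <p> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    p = soup.find('p')
    return p.text if p else None

# Simulated response bodies: one HTML page, one plain-text API reply.
pages = [
    "<html><body><p>Hello from page one</p></body></html>",
    "https://fc.lc/C4FNiXbY",
]

for content in pages:
    text = first_paragraph(content)
    print(text if text is not None else "no <p> tag found")
```

Checking the result of `soup.find('p')` before calling `.text` avoids an `AttributeError` on pages (or plain-text replies) that contain no `<p>` tag at all.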
