Read all <p> tags into an array with Python BS4

Question

Hey, I am trying to read all <p> tags into an array.

HTML Example:

<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>

This should result in an array like:

scraped = ["To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:","It's obvious that not every web publisher pays much attention to validity of his HTML code."]

My current code is:

import requests
from bs4 import BeautifulSoup

class Webscraper:

    def fullscrape(self, url):
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'lxml')
        # getText() returns all text on the page as one string, not a list
        content = soup.getText()
        print(content)

But this does not seem to work properly.

Answer 1

You need to call find_all('p'), then iterate over the results and store the text of each one in a list.

content = [item.text for item in soup.find_all('p')]

Code:

def fullscrape(self, url):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    content =[item.text for item in soup.find_all('p')]
    print(content)
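
To check the list comprehension without making a network request, here is a minimal sketch that parses the sample markup from the question directly. The helper name paragraphs is hypothetical, and the built-in "html.parser" backend is used in place of 'lxml' only so that no extra parser needs to be installed; get_text(strip=True) is an optional variation that trims surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; no HTTP request is made here.
html = """
<p>To test web scrapers against invalid markup we suggest scraping this page that contains the following markup mistakes:</p>
<p>It's obvious that not every web publisher pays much attention to validity of his HTML code.</p>
"""


def paragraphs(markup):
    """Return the text of every <p> element as a list of strings.

    `paragraphs` is a hypothetical helper name used only for this sketch.
    """
    soup = BeautifulSoup(markup, "html.parser")
    # find_all('p') yields one Tag per paragraph; get_text() extracts its text.
    return [p.get_text(strip=True) for p in soup.find_all("p")]


scraped = paragraphs(html)
print(scraped)
```

The same comprehension should work on the soup built from page.content in fullscrape; returning the list instead of printing it would make the method easier to reuse.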

Answer 2

Try this:

for c in content:
   print(c)
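
One caveat about this loop: it only prints one paragraph per iteration if content is already a list of strings. If content is the single string returned by getText(), as in the question's code, the loop walks through it character by character. A small sketch with made-up values (as_list and as_string are hypothetical stand-ins, not data from the page):

```python
# Hypothetical stand-ins for the two kinds of `content`.
as_list = ["first paragraph", "second paragraph"]  # e.g. built from find_all('p')
as_string = "first paragraph"                      # e.g. returned by getText()

# Iterating a list yields whole paragraphs.
list_items = [c for c in as_list]

# Iterating a string yields single characters.
string_items = [c for c in as_string]

for c in as_list:
    print(c)
```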

Answer 1
0 2020-04-03 13:25:05

Answer 2
-1 2020-04-03 13:09:07