How do I scrape all the elements from a table on a web page?

Question

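I'm playing with the code below. I just want to get all the elements from the table, and I thought this code would do that, but all I get back is a message that says "None".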

import requests
from bs4 import BeautifulSoup

website_url = requests.get('https://google_cloud_platform.html').text
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

My_table = soup.find('table',{'class':'p6n-table-full-width p6n-space-above-large p6n-table'})
print(My_table)

Here is a screenshot of the element I am trying to pull in.

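Maybe I need to look for a different kind of identifier. I'm not sure what to look for. Is there a way to list all the table names? Perhaps the table actually has a different name, ID, or something similar.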

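I'm starting to think it isn't really a table at all. When I run the script below, I get this message: 'IndexError: list index out of range'. That makes me think the page doesn't contain even a single table. But according to the screenshot I posted, there is something called a "table class".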

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("https://google_cloud_platform.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))
print(df[0].to_json(orient='records'))
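
The IndexError above is what `soup.find_all('table')[0]` raises when `find_all` returns an empty list, that is, when the HTML that `requests` downloaded contains no `<table>` tag at all. One possible cause, which the post does not confirm, is that the browser builds the table with JavaScript after the page loads, so it shows up in the inspector but not in the raw response. The sketch below checks for that case before indexing. The sample markup and the function names are hypothetical, and `html.parser` is used only so the example has no extra dependency.

```python
from bs4 import BeautifulSoup


def find_tables(html):
    """Return every <table> element in the given HTML (possibly an empty list)."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("table")


def first_table_rows(html):
    """Return the first table's rows as lists of cell text, or None if no table exists."""
    tables = find_tables(html)
    if not tables:
        # find_all() returned []; indexing [0] on it is what raises IndexError
        return None
    rows = []
    for tr in tables[0].find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


# Hypothetical sample markup standing in for the downloaded page.
WITH_TABLE = "<table class='p6n-table'><tr><th>Name</th></tr><tr><td>alpha</td></tr></table>"
WITHOUT_TABLE = "<div id='app'></div><script src='app.js'></script>"

print(first_table_rows(WITH_TABLE))
print(first_table_rows(WITHOUT_TABLE))
```

If the second case is what happens with the real page, printing `len(res.text)` and searching the response for `<table` is a quick way to confirm that the markup never arrived.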

Answer 1

Try this:

import requests
from bs4 import BeautifulSoup

res = requests.get("http://127.0.0.1:1234")
soup = BeautifulSoup(res.text, features="lxml")
table = soup.find_all('table')
for t in table:
    print(t.contents)
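
To answer the "list all the table names" part of the question, the loop above can be extended to report each table's `id` and `class` attributes instead of dumping its contents. This is only a sketch: the sample markup and the `describe_tables` helper are made up for illustration, and the real page may use different attributes.

```python
from bs4 import BeautifulSoup


def describe_tables(html):
    """Return one dict per <table> with its position, id, class list, and row count."""
    soup = BeautifulSoup(html, "html.parser")
    summaries = []
    for index, table in enumerate(soup.find_all("table")):
        summaries.append({
            "index": index,
            "id": table.get("id"),              # None when the tag has no id
            "classes": table.get("class", []),  # bs4 exposes class as a list
            "rows": len(table.find_all("tr")),
        })
    return summaries


# Hypothetical sample markup; the class names are borrowed from the question.
SAMPLE = """
<table id="quotas" class="p6n-table p6n-table-full-width">
  <tr><th>Service</th></tr>
  <tr><td>Compute</td></tr>
</table>
<table><tr><td>x</td></tr></table>
"""

for summary in describe_tables(SAMPLE):
    print(summary)
```

Once the class names are known, `soup.select("table.p6n-table")` matches on a single class regardless of what other classes the tag carries. Passing the full multi-class string to `find`, as in the question, only matches when the attribute value is exactly that string in that order.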

Solution 1
0 2018-08-31 20:11:31
