
How to scrape a table from a webpage into a DataFrame

I am trying to scrape a table from a single page into a DataFrame.

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("https://www.viewbase.com/funding")
soup = BeautifulSoup(res.content,'lxml')

table1 = soup.find_all('tr')

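The table is populated by a JavaScript script, so BS4 never sees it: `requests` only downloads the initial HTML and does not run scripts. However, you can run Selenium in headless mode and get what you need from the rendered page.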
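A quick way to check this is to count `<tr>` rows in the HTML before and after JavaScript runs. The sketch below uses only the standard library's `html.parser`; the two HTML strings are hypothetical stand-ins for what `requests` receives (an empty table shell) and what a browser ends up rendering. They are not the real markup of viewbase.com.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup: the server sends an empty table body...
static_html = '<table id="inverse_swap"><tbody></tbody></table>'
# ...and a script fills it in after the page loads in a browser.
rendered_html = (
    '<table id="inverse_swap"><tbody>'
    "<tr><td>BTC</td><td>0.0100%</td></tr>"
    "<tr><td>ETH</td><td>0.0100%</td></tr>"
    "</tbody></table>"
)

print(count_rows(static_html))    # 0
print(count_rows(rendered_html))  # 2
```

Running `count_rows(res.text)` on the downloaded page and getting zero (or only header rows) while the browser shows data is a strong hint that the rows are added client-side.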

Here's how to do it:

import time

import pandas as pd

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)

driver.get("https://www.viewbase.com/funding")
time.sleep(5)  # give the page's JavaScript time to fill in the table

# The second header row holds the exchange names
headers = driver.find_elements(By.XPATH, '//*[@class="tablesorter-headerRow"][2]/th/div')
table = driver.find_element(By.XPATH, '//*[@id="inverse_swap"]')

columns = [i.text for i in headers]
# One table row per line of text; cells are separated by whitespace
data = [r.split() for r in table.text.split('\n')]
driver.quit()

df = pd.DataFrame(data, columns=columns)
df.to_csv("data.csv", index=False)
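The line `data = [r.split() for r in table.text.split('\n')]` relies on each rendered row arriving as one line of text with cells separated by whitespace. That works here because the cells (symbols, rates and `-` placeholders) contain no spaces, but a cell containing a space would silently shift the columns. A minimal sketch of the same parsing as a pure-Python function, with a hypothetical `expected_cols` check added so a misaligned row is reported with its line number:

```python
def text_to_rows(text, expected_cols):
    """Split rendered table text into rows of whitespace-separated cells.

    Assumes one table row per line and no spaces inside cells.
    Raises ValueError on a row with the wrong number of cells.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        cells = line.split()
        if len(cells) != expected_cols:
            raise ValueError(
                f"line {line_no}: expected {expected_cols} cells, got {len(cells)}"
            )
        rows.append(cells)
    return rows


sample = "BTC 0.0100% 0.0020%\nETH 0.0100% -\n"
print(text_to_rows(sample, expected_cols=3))
# [['BTC', '0.0100%', '0.0020%'], ['ETH', '0.0100%', '-']]
```

With the answer's variables this would be called as `text_to_rows(table.text, len(columns))`.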

Output: (image not available)

Do you want the headers as well?

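# Uses the imports listed at the end of this answer, plus By:
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.viewbase.com/funding")
tbl = driver.find_element(By.TAG_NAME, 'table')
# Exchange names are in the second header row; the leading "." keeps the search inside tbl
headers = tbl.find_elements(By.XPATH, './/thead/tr[2]/th')
headers = [item.text.strip() for item in headers]
trs = tbl.find_elements(By.XPATH, 'tbody/tr')
lst = []

for tr in trs:
    tds = tr.find_elements(By.TAG_NAME, 'td')
    tds = [item.text.strip() for item in tds]
    # Keep only non-empty cells (empty spacer cells are dropped)
    lst.append([item for item in tds if item])

# print(lst)
print(headers)
df = pd.DataFrame(lst)
print(df)
driver.quit()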
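When run, this prints an 11-entry `headers` list with blank strings at both ends, while each data row has 10 cells after empty cells are dropped, and several exchange names (Binance, Okex, Bybit) appear twice. That mismatch is why the DataFrame ends up with numeric column labels. One way to turn the scraped headers into usable, unique column names is sketched below; calling the first column `Symbol` and suffixing repeated names with `_2` are my own choices, not anything the page defines.

```python
from collections import Counter


def make_columns(headers, first_name="Symbol"):
    """Build unique column names from the scraped header texts.

    Assumes the leading blank header belongs to the symbol column and that
    other blank headers belong to columns whose cells are empty (and were
    dropped from each row). Repeated names get a numeric suffix.
    """
    names = [first_name] + [h for h in headers[1:] if h]
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        unique.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return unique


headers = ['', 'Binance', 'FTX', 'Okex', 'Bybit', 'Binance',
           'Huobi', 'Okex', 'Bitmex', 'Bybit', '']
print(make_columns(headers))
# ['Symbol', 'Binance', 'FTX', 'Okex', 'Bybit', 'Binance_2', 'Huobi', 'Okex_2', 'Bitmex', 'Bybit_2']
```

With the answer's variables: `df = pd.DataFrame(lst, columns=make_columns(headers))`.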

Output so far:

['', 'Binance', 'FTX', 'Okex', 'Bybit', 'Binance', 'Huobi', 'Okex', 'Bitmex', 'Bybit', '']
        0         1         2         3  ...         6         7        8        9
0     BTC   0.0100%   0.0020%  -0.0062%  ...   0.0034%  -0.0047%  0.0100%  0.0100%
1     ETH   0.0100%   0.0010%  -0.0034%  ...   0.0100%   0.0111%  0.0193%  0.0100%
2     XRP   0.0100%   0.0013%   0.0003%  ...  -0.0096%  -0.0089%  0.0100%  0.0100%
3     EOS   0.0100%   0.0017%  -0.0098%  ...   0.0100%   0.0186%        -  0.0100%
4     BCH   0.0100%   0.0006%   0.0227%  ...   0.0100%   0.0135%  0.0110%        -
5     LTC   0.0100%  -0.0012%   0.0180%  ...   0.0100%   0.0006%  0.0100%        -
6    LINK   0.0031%  -0.0004%  -0.0272%  ...  -0.0321%  -0.0221%        -        -
7     BSV         -  -0.0020%  -0.0280%  ...   0.0105%  -0.0468%        -        -
8     BNB  -0.1475%  -0.0046%         -  ...         -         -        -        -

[59 rows x 10 columns]
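Every cell in this output is still a string such as `0.0100%`, with `-` where an exchange shows no value. If you want to compute with the funding rates, a small converter like the following turns them into fractions. It is a sketch; treating `-` as missing data is an assumption based on the output above.

```python
def parse_rate(cell):
    """Convert a funding-rate string like '0.0100%' to a float fraction.

    Returns None for the '-' placeholder (assumed to mean "no data").
    """
    cell = cell.strip()
    if cell == "-":
        return None
    if not cell.endswith("%"):
        raise ValueError(f"unexpected cell: {cell!r}")
    return float(cell[:-1]) / 100


print(parse_rate("0.0100%"))
print(parse_rate("-0.1475%"))
print(parse_rate("-"))  # None
```

Applied to the DataFrame, this could be done per rate column, e.g. `for col in df.columns[1:]: df[col] = df[col].map(parse_rate)`.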

Imports:

from selenium import webdriver
import pandas as pd
