How to scrape a table from a webpage into a DataFrame
I am trying to scrape a table from a page into a DataFrame.
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.viewbase.com/funding")
soup = BeautifulSoup(res.content,'lxml')
table1 = soup.find_all('tr')
The table is populated by a JS script, so BS4 won't see it. However, you can run selenium in headless mode and get what you need.
Here's how to do it:
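import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Run Chrome without opening a window.
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://www.viewbase.com/funding")
time.sleep(5)  # crude wait for the JS to fill in the table

# The find_element(s)_by_* helpers were removed in Selenium 4.3; use By locators.
headers = driver.find_elements(By.XPATH, '//*[@class="tablesorter-headerRow"][2]/th/div')
table = driver.find_element(By.XPATH, '//*[@id="inverse_swap"]')
columns = [i.text for i in headers]
# One list of whitespace-separated tokens per table row.
data = [r.split() for r in table.text.split('\n')]
driver.quit()

df = pd.DataFrame(data, columns=columns)
df.to_csv("data.csv", index=False)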
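To illustrate why the requests + BS4 attempt finds no rows, here is a small offline sketch. The HTML strings are made up for the demonstration (they are not taken from the viewbase page), and `count_body_rows` is a hypothetical helper: the "raw" string stands for what the server sends, and the "rendered" string stands for what the DOM looks like after a browser has run the script.

```python
from bs4 import BeautifulSoup

# Hypothetical page as sent by the server: the table body is empty and a
# script would fill it in later, inside a real browser.
raw_html = """
<table id="demo">
  <thead><tr><th>Coin</th><th>Rate</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* rows would be added here by JavaScript at load time */</script>
"""

# The same table after a browser has executed the script (also made up).
rendered_html = raw_html.replace(
    "<tbody></tbody>",
    "<tbody><tr><td>AAA</td><td>0.01%</td></tr></tbody>",
)


def count_body_rows(html):
    """Return the number of <tr> elements inside the first <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find("tbody").find_all("tr"))


raw_rows = count_body_rows(raw_html)            # what requests would hand to BS4
rendered_rows = count_body_rows(rendered_html)  # what a browser ends up with
print(raw_rows, rendered_rows)
```

BS4 only parses the text it is given and never runs the script, which is why a real browser (here driven by selenium) is needed. With selenium, `driver.page_source` holds the rendered HTML, so it can be passed to BeautifulSoup in the same way if you prefer BS4 for the parsing step.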
Do you want the headers as well?
from selenium.webdriver.common.by import By

driver.get("https://www.viewbase.com/funding")
tbl = driver.find_element(By.TAG_NAME, 'table')
# Second header row; note that a leading '//' searches the whole document, not just tbl.
headers = tbl.find_elements(By.XPATH, '//thead/tr[2]/th')
headers = [item.text.strip() for item in headers]
trs = tbl.find_elements(By.XPATH, 'tbody/tr')
lst = []
for tr in trs:
    tds = tr.find_elements(By.TAG_NAME, 'td')
    tds = [item.text.strip() for item in tds]
    # Keep only the non-empty cells of each row.
    lst.append([item for item in tds if item])
#print(lst)
print(headers)
df = pd.DataFrame(lst)
print(df)
Output so far:
['', 'Binance', 'FTX', 'Okex', 'Bybit', 'Binance', 'Huobi', 'Okex', 'Bitmex', 'Bybit', '']
0 1 2 3 ... 6 7 8 9
0 BTC 0.0100% 0.0020% -0.0062% ... 0.0034% -0.0047% 0.0100% 0.0100%
1 ETH 0.0100% 0.0010% -0.0034% ... 0.0100% 0.0111% 0.0193% 0.0100%
2 XRP 0.0100% 0.0013% 0.0003% ... -0.0096% -0.0089% 0.0100% 0.0100%
3 EOS 0.0100% 0.0017% -0.0098% ... 0.0100% 0.0186% - 0.0100%
4 BCH 0.0100% 0.0006% 0.0227% ... 0.0100% 0.0135% 0.0110% -
5 LTC 0.0100% -0.0012% 0.0180% ... 0.0100% 0.0006% 0.0100% -
6 LINK 0.0031% -0.0004% -0.0272% ... -0.0321% -0.0221% - -
7 BSV - -0.0020% -0.0280% ... 0.0105% -0.0468% - -
8 BNB -0.1475% -0.0046% - ... - - - -
[59 rows x 10 columns]
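The header list printed above has eleven entries, two of them empty, while the frame has ten columns and no names. A minimal sketch of one way to join the two, assuming the first column holds the coin symbol: the name `Symbol`, the helpers `label_columns` and `to_rate`, and the two sample rows are all hypothetical and only serve the illustration. Repeated exchange names get a numeric suffix so that each column can be selected unambiguously, and the percentage strings are turned into floats, with `-` becoming NaN.

```python
import pandas as pd

# Header row exactly as printed in the output above (empty strings included).
headers = ['', 'Binance', 'FTX', 'Okex', 'Bybit', 'Binance',
           'Huobi', 'Okex', 'Bitmex', 'Bybit', '']

# Hypothetical rows for illustration only: one symbol plus nine rate strings.
rows = [
    ['AAA', '0.0100%', '0.0020%', '-0.0062%', '0.0010%', '0.0100%',
     '0.0034%', '-0.0047%', '0.0100%', '0.0100%'],
    ['BBB', '-', '-0.0020%', '-0.0280%', '-', '0.0100%',
     '0.0105%', '-0.0468%', '-', '-'],
]


def label_columns(headers, first_column="Symbol"):
    """Drop empty headers, suffix repeated names, prepend a name for column 0."""
    seen = {}
    unique = []
    for name in (h for h in headers if h):
        seen[name] = seen.get(name, 0) + 1
        unique.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return [first_column] + unique


def to_rate(series):
    """Convert strings like '0.0100%' to floats (still in percent); '-' becomes NaN."""
    return pd.to_numeric(series.str.rstrip('%'), errors='coerce')


columns = label_columns(headers)
df = pd.DataFrame(rows, columns=columns)
df[columns[1:]] = df[columns[1:]].apply(to_rate)
print(df)
```

In the scraped version, `rows` would be the `lst` built in the loop. The sketch relies on every row having exactly ten non-empty cells, which matches the output shown but is worth checking on the live page.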
Imports:
from selenium import webdriver
import pandas as pd