Beautifulsoup scraping table from website with requests for pandas
I am trying to download the data on this website https://coinmunity.co/ in order to manipulate it later in Python or Pandas. I tried to read it directly into Pandas via Requests, but it did not work, using this code:
import requests
import pandas as pd
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
dfm = pd.read_html(str(table), header=0)
dfm = dfm[0].dropna(axis=0, thresh=4)
dfm.head()
In most of the things I tried, I could only get to the info in the headers, which seems to be the only table the code can see on this page.
Seeing that this did not work, I tried the same scraping with Requests and BeautifulSoup, but it did not work either. This is my code:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
#table = soup.find_all('table')[0]
#table = soup.find_all('div', {'class':'inner-container'})
#table = soup.find_all('tbody', {'class':'_ngcontent-c0'})
#table = soup.find_all('table')[0].findAll('tr')
#table = soup.find_all('table')[0].find('tbody')#.find_all('tbody _ngcontent-c3=""')
table = soup.find_all('p', {'class':'stats change positiveSubscribers'})
You can see in the commented lines all the things I have tried, but nothing worked. Is there any way to easily download that table for use in Pandas/Python, in the tidiest, easiest, and quickest way possible?

Thank you
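As a quick sanity check (my addition, not part of the original question), you can confirm that the data rows are rendered client-side by counting the `<tr>` elements inside `<tbody>` in the static HTML. The sketch below uses a stand-in snippet for what a JavaScript-rendered page typically serves: the table shell is present, but the body is empty until the scripts run.

```python
from bs4 import BeautifulSoup

# Stand-in for the static HTML that requests would see on a
# JavaScript-rendered page: the table shell exists, but no data rows.
static_html = """
<table>
  <thead><tr><th>#</th><th>Coin</th><th>Subscribers</th></tr></thead>
  <tbody></tbody>
</table>
"""

# Stdlib parser used here so the sketch runs without lxml installed
soup = BeautifulSoup(static_html, 'html.parser')
rows = soup.find_all('tbody')[0].find_all('tr')
print(len(rows))  # 0 -- the data rows only appear after JavaScript runs
```

If this count is zero while the browser shows a populated table, the content is loaded dynamically and plain Requests will never see it.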
Since the content is loaded dynamically after the initial request is made, you won't be able to scrape this data with Requests alone. Here's what I would do instead:
from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.implicitly_wait(10)
driver.get("https://coinmunity.co/")
html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, 'lxml')
driver.quit()

results = []
for row in soup.find_all('tr')[2:]:
    data = row.find_all('td')
    name = data[1].find('a').text
    value = data[2].find('p').text
    # get the rest of the data you need about each coin here,
    # then add it to the dictionary that you append to results
    results.append({'name': name, 'value': value})

df = pd.DataFrame(results)
df.head()
name value
0 NULS 14,005
1 VEN 84,486
2 EDO 20,052
3 CLUB 1,996
4 HSR 8,433
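As an alternative (my suggestion, not part of the original answer), once the rendered HTML is in hand you can hand it straight to `pandas.read_html` instead of looping over rows yourself. The sketch below uses a stand-in HTML string in place of `driver.page_source`; note that `read_html` needs an HTML parser backend such as lxml installed.

```python
from io import StringIO
import pandas as pd

# Stand-in for driver.page_source after the page has rendered
rendered_html = """
<table>
  <thead><tr><th>name</th><th>value</th></tr></thead>
  <tbody>
    <tr><td>NULS</td><td>14,005</td></tr>
    <tr><td>VEN</td><td>84,486</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found;
# thousands=',' parses '14,005' directly into the integer 14005
df = pd.read_html(StringIO(rendered_html), thousands=',')[0]
print(df)
```

This gets you typed numeric columns in one call, at the cost of less control over which cells are extracted.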
You will need to make sure that geckodriver is installed and that it is on your PATH. I only scraped the name and value of each coin, but getting the rest of the information should be easy.
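One follow-up worth noting (my addition, assuming the column layout shown above): the values scraped with BeautifulSoup come back as comma-formatted strings like `'14,005'`, so a small cleanup step is needed before doing any arithmetic on them.

```python
import pandas as pd

# Stand-in for the DataFrame built in the answer, where the scraped
# values are comma-formatted strings rather than numbers
df = pd.DataFrame([
    {'name': 'NULS', 'value': '14,005'},
    {'name': 'VEN', 'value': '84,486'},
])

# Strip the thousands separators and convert to integers
df['value'] = df['value'].str.replace(',', '', regex=False).astype(int)
print(df['value'].sum())  # 98491
```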