
Beautifulsoup scraping table from website with requests for pandas

I am trying to download the data on this website https://coinmunity.co/ in order to manipulate it later in Python or pandas. I tried to read it directly into pandas via requests, but it did not work, using this code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
dfm = pd.read_html(str(table), header=0)
dfm = dfm[0].dropna(axis=0, thresh=4)
dfm.head()

In most of the things I tried, I could only get to the info in the headers, which seems to be the only part of the table that the code can see on this page.
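(For reference, `pd.read_html` can only parse rows that are already present in the HTML string it is given; rows injected later by JavaScript are invisible to it. A minimal offline sketch with made-up rows shows what it returns when the data *is* static:)

```python
from io import StringIO
import pandas as pd

# Dummy static table, standing in for markup that requests would receive.
# When the rows are actually in the HTML, read_html parses them fine; the
# default thousands=',' even converts "14,005" to the integer 14005.
html = """
<table>
  <tr><th>Coin</th><th>Subscribers</th></tr>
  <tr><td>NULS</td><td>14,005</td></tr>
  <tr><td>VEN</td><td>84,486</td></tr>
</table>
"""
df = pd.read_html(StringIO(html))[0]
print(df)
```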

Seeing that this did not work, I tried to do the same scraping with requests and BeautifulSoup, but it did not work either. This is my code:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
#table = soup.find_all('table')[0]
#table = soup.find_all('div', {'class':'inner-container'})
#table = soup.find_all('tbody', {'class':'_ngcontent-c0'})
#table = soup.find_all('table')[0].findAll('tr')
#table = soup.find_all('table')[0].find('tbody')#.find_all('tbody _ngcontent-c3=""')
table = soup.find_all('p', {'class':'stats change positiveSubscribers'})

You can see in the commented lines all the things I have tried, but nothing worked. Is there any way to download that table for use in pandas/Python, in the tidiest, easiest, and quickest way possible? Thank you.

Since the content is loaded dynamically after the initial request is made, you won't be able to scrape this data with requests alone. Here's what I would do instead:
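You can confirm this diagnosis without Selenium: in the HTML that requests receives, the table body is empty, so there is nothing for BeautifulSoup to find. A small offline sketch, with a stand-in for that served skeleton:

```python
from bs4 import BeautifulSoup

# Stand-in for the markup requests receives: the table skeleton is served,
# but the rows are only filled in by JavaScript after the page loads.
served_html = """
<table>
  <thead><tr><th>Coin</th><th>Subscribers</th></tr></thead>
  <tbody></tbody>
</table>
"""
soup = BeautifulSoup(served_html, "html.parser")
rows = soup.find("tbody").find_all("tr")
print(len(rows))  # no rows: this is why every find_all attempt came up empty
```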

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # wait up to 10 s for elements to appear
driver.get("https://coinmunity.co/")

html = driver.page_source.encode('utf-8')
driver.quit()  # close the browser once the rendered HTML is captured

soup = BeautifulSoup(html, 'lxml')

results = []
for row in soup.find_all('tr')[2:]:
    data = row.find_all('td')
    name = data[1].find('a').text
    value = data[2].find('p').text
    # get the rest of the data you need about each coin here, then add it to the dictionary that you append to results
    results.append({'name':name, 'value':value})

df = pd.DataFrame(results)

df.head()

name    value
0   NULS    14,005
1   VEN 84,486
2   EDO 20,052
3   CLUB    1,996
4   HSR 8,433
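The scraped values come back as strings with thousands separators; before doing any arithmetic on them you would likely want to convert them to numbers. A sketch on a hand-built frame mirroring the `head()` output above:

```python
import pandas as pd

# Hand-built frame mirroring the scraped output above.
df = pd.DataFrame({'name': ['NULS', 'VEN', 'EDO'],
                   'value': ['14,005', '84,486', '20,052']})

# Strip the thousands separators, then convert the column to integers.
df['value'] = pd.to_numeric(df['value'].str.replace(',', '', regex=False))
print(df['value'].sum())
```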

You will need to make sure that geckodriver is installed and that it is on your PATH. I just scraped the name and value of each coin, but getting the rest of the information should be easy.
