Converting from HTML to CSV using Python

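I am trying to convert a table on a website (full details and a screenshot below) to CSV. I started with the code below, but the table lookup returns nothing. I think it must be because I don't know the correct way to refer to the table, but any additional help toward my end goal would be greatly appreciated.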

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

url = 'https://www.privateequityinternational.com/database/#/pei-300'

page = requests.get(url) #gets info from page
soup = BeautifulSoup(page.content,'html.parser') #parses information
table = soup.findAll('table',{'class':'au-target pux--responsive-table'}) #collecting blocks of info inside of table
table

Output: []

Besides the URL given in the code above, this is the table (found on the website) that I am trying to convert to a CSV file: [image]

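The data is loaded via Ajax from an external URL, so it is not in the HTML that requests downloads. You can use the requests / json modules to fetch it directly: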

import json
import requests


url = 'https://ra.pei.blaize.io/api/v1/institutions/pei-300s?count=25&start=0'
data = requests.get(url).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for item in data['data']:
    print('{:<5} {:<30} {}'.format(item['id'], item['name'], item['headquarters']))

Prints:

5611  Blackstone                     New York, United States
5579  The Carlyle Group              Washington DC, United States
5586  KKR                            New York, United States
6701  TPG                            Fort Worth, United States
5591  Warburg Pincus                 New York, United States
1801  NB Alternatives                New York, United States
6457  CVC Capital Partners           Luxembourg, Luxembourg
6477  EQT                            Stockholm, Sweden
6361  Advent International           Boston, United States
8411  Vista Equity Partners          Austin, United States
6571  Leonard Green & Partners       Los Angeles, United States
6782  Cinven                         London, United Kingdom
6389  Bain Capital                   Boston, United States
8096  Apollo Global Management       New York, United States
8759  Thoma Bravo                    San Francisco, United States
7597  Insight Partners               New York, United States
867   BlackRock                      New York, United States
5471  General Atlantic               New York, United States
6639  Permira Advisers               London, United Kingdom
5903  Brookfield Asset Management    Toronto, Canada
6473  EnCap Investments              Houston, United States
6497  Francisco Partners             San Francisco, United States
6960  Platinum Equity                Beverly Hills, United States
16331 Hillhouse Capital Group        Hong Kong, Hong Kong
5595  Partners Group                 Baar-Zug, Switzerland
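
The snippet above only prints the rows, while the question asks for a CSV file. Here is a minimal sketch of that last step using the standard csv module. The helper names (page_url, items_to_csv, fetch_all) are hypothetical, and the paging logic rests on two assumptions that the API does not confirm here: that start is a zero-based row offset, and that the list has 300 entries (guessed from the name "pei-300").

```python
import csv
import io
from urllib.parse import urlencode

API_URL = 'https://ra.pei.blaize.io/api/v1/institutions/pei-300s'  # from the answer above
FIELDS = ['id', 'name', 'headquarters']  # the three keys printed above


def page_url(start, count=25):
    # Assumption: 'start' is a zero-based row offset and 'count' the page size.
    return API_URL + '?' + urlencode({'count': count, 'start': start})


def items_to_csv(items, out, fields=FIELDS):
    # Write a header plus one CSV row per item; keys not in `fields` are ignored.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(items)


def fetch_all(total=300, count=25):
    # Hypothetical helper: requests each page in turn and collects the items.
    # Assumption: the list holds `total` entries.
    import requests  # imported here so the CSV helpers work without it
    items = []
    for start in range(0, total, count):
        items.extend(requests.get(page_url(start, count)).json()['data'])
    return items


# Offline demonstration with two rows copied from the output above.
sample = [
    {'id': 5611, 'name': 'Blackstone', 'headquarters': 'New York, United States'},
    {'id': 5586, 'name': 'KKR', 'headquarters': 'New York, United States'},
]
buf = io.StringIO()
items_to_csv(sample, buf)
print(buf.getvalue())
```

To write a real file, open it with newline='' as the csv module recommends, for example: with open('pei300.csv', 'w', newline='') as f: items_to_csv(fetch_all(), f). The headquarters values contain commas, so the writer quotes them automatically.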

And a selenium version:

from io import StringIO
import time

from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd

# Selenium 3 style; Selenium 4 takes the driver path through a Service object instead
driver = webdriver.Firefox(executable_path='c:/program/geckodriver.exe')
url = 'https://www.privateequityinternational.com/database/#/pei-300'

driver.get(url) #loads the page in a real browser so the JavaScript runs
time.sleep(5) #crude wait for the Ajax-loaded table to render
page = driver.page_source
driver.quit() #ends the browser session
soup = BeautifulSoup(page,'html.parser') #parses the rendered HTML
table = soup.select_one('table.au-target.pux--responsive-table') #the table that holds the data
dfs = pd.read_html(StringIO(table.prettify())) #StringIO because newer pandas deprecates raw HTML strings
df = pd.concat(dfs)
df.to_csv('file.csv')
print(df.head(25))

Prints:

    Ranking                         Name            City, Country (HQ)
0         1                   Blackstone       New York, United States
1         2            The Carlyle Group  Washington DC, United States
2         3                          KKR       New York, United States
3         4                          TPG     Fort Worth, United States
4         5               Warburg Pincus       New York, United States
5         6              NB Alternatives       New York, United States
6         7         CVC Capital Partners        Luxembourg, Luxembourg
7         8                          EQT             Stockholm, Sweden
8         9         Advent International         Boston, United States
9        10        Vista Equity Partners         Austin, United States
10       11     Leonard Green & Partners    Los Angeles, United States
11       12                       Cinven        London, United Kingdom
12       13                 Bain Capital         Boston, United States
13       14     Apollo Global Management       New York, United States
14       15                  Thoma Bravo  San Francisco, United States
15       16             Insight Partners       New York, United States
16       17                    BlackRock       New York, United States
17       18             General Atlantic       New York, United States
18       19             Permira Advisers        London, United Kingdom
19       20  Brookfield Asset Management               Toronto, Canada
20       21            EnCap Investments        Houston, United States
21       22           Francisco Partners  San Francisco, United States
22       23              Platinum Equity  Beverly Hills, United States
23       24      Hillhouse Capital Group          Hong Kong, Hong Kong
24       25               Partners Group         Baar-Zug, Switzerland

It also saves the data to file.csv.

Note that you need selenium and geckodriver, and that this code expects geckodriver at c:/program/geckodriver.exe.
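
If pandas is not available, the HTML-to-CSV step can also be done with the standard library alone. This is only a sketch: TableRows and html_table_to_csv are hypothetical names, and the sample markup is an assumed simplification of the rendered table, not a copy of the real page source. It collects every row of every table in the input and does not handle nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text pieces of the cell being read, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join the pieces and collapse runs of whitespace to single spaces.
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html, out):
    # Parse the HTML, write the rows as CSV to `out`, and return the rows.
    parser = TableRows()
    parser.feed(html)
    parser.close()
    csv.writer(out).writerows(parser.rows)
    return parser.rows


# Assumed, simplified markup using the class name from the question.
sample = """
<table class="au-target pux--responsive-table">
  <tr><th>Ranking</th><th>Name</th><th>City, Country (HQ)</th></tr>
  <tr><td>1</td><td>Blackstone</td><td>New York, United States</td></tr>
  <tr><td>11</td><td>Leonard Green &amp; Partners</td><td>Los Angeles, United States</td></tr>
</table>
"""
buf = io.StringIO()
rows = html_table_to_csv(sample, buf)
print(buf.getvalue())
```

In the selenium script, the string passed as html would be driver.page_source (or str(table) after the BeautifulSoup lookup), and out would be a file opened with newline=''.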

