
Extract a specific table from a web page

I want to extract the first table on this page:

https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=1318605

For the second table on the page, I use the table's id:

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

url = 'https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=1318605'
response = requests.get(url)
web = response.content
soup = BeautifulSoup(web, 'html.parser')
transaction = soup.find('table', {'id': 'transaction-report'})
report = pd.read_html(StringIO(str(transaction)))[0]

For the first table, I don't see an id that I can easily use. How can I extract the first one?

Try:

import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup


url = "https://www.sec.gov/cgi-bin/own-disp?action=getissuer&CIK=1318605"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# select the innermost table (one with no nested <table>) that contains
# a link whose text includes "Owner", i.e. the owners table
table = soup.select_one("table:not(:has(table)):has(a:-soup-contains(Owner))")

# make first row header (to have column names in Pandas)
for td in table.tr.select("td"):
    td.name = "th"

# wrap the HTML in StringIO: newer pandas versions deprecate passing literal HTML
df = pd.read_html(StringIO(str(table)))[0]
print(df)

Prints:

                         Owner  Filings Transaction Date                                                     Type of Owner
0                Gebbia Joseph  1834171       2022-09-25                                                          director
1              DENHOLM ROBYN M  1242782       2022-05-02                                                          director
2               Taneja Vaibhav  1771340       2022-01-05                                 officer: Chief Accounting Officer
3                    Musk Elon  1494730       2021-11-08                          director, 10 percent owner, officer: CEO
4       Ehrenpreis Ira Matthew  1412598       2021-10-27                                                          director
5             Baglino Andrew D  1790565       2020-06-05                           officer: SVP Powertrain and Energy Eng.
6             Mizuno Hiromichi  1811230       2020-04-23                                                          director
7      ELLISON LAWRENCE JOSEPH   901999       2020-02-14                                                          director
8             Kirkhorn Zachary  1771364       2019-03-13                                  officer: Chief Financial Officer
9     Wilson-Thompson Kathleen  1331680       2018-12-27                                                          director
10           MORTON DAVID H JR  1476070       2018-08-06                                 officer: Chief Accounting Officer
11          FIELD JOHN DOUGLAS  1650649       2017-11-02                                   officer: Senior VP, Engineering
12                 McNeill Jon  1670512       2017-08-14                              officer: President, WW Sales/Service
13          RICE LINDA JOHNSON  1188735       2017-07-17                                                          director
14             MURDOCH JAMES R  1420590       2017-07-17                                                          director
15              Branderiz Eric  1352816       2016-10-24                             officer: VP, Chief Accounting Officer
16             Jason Wheeler S  1660228       2015-11-30  officer: Chief Financial Officer, other: Chief Financial Officer
17             Reichow Gregory  1584531       2015-11-06                                         officer: VP Manufacturing
18            Guillen Jerome M  1584518       2015-07-15                                 officer: VP Service and Sales Ops
19              Kroeger Harald  1565080       2012-12-12                                                          director
20             WHITAKER ERIC S  1234046       2010-10-28                                          officer: General Counsel
21          Blankenship George  1503210       2010-10-06                                       other: VP Sales and Service
22         Jurvetson Stephen T  1314917       2010-06-28                                                          director
23                 Buss Brad W  1336664       2010-06-28                                                          director
24          Straubel Jeffrey B  1494727       2010-06-28                                 officer: Chief Technology Officer
25               Walker John K  1494729       2010-06-28                               officer: VP, No. Amer. Sales & Mktg
26                 Musk Kimbal  1494731       2010-06-28                                                          director
27                Ahuja Deepak  1494732       2010-06-28                                  officer: Chief Financial Officer
28              Passin Gilbert  1494806       2010-06-28                            officer: Vice President, Manufacturing
29              Kohler Herbert  1495013       2010-06-28                                                          director
30          Gracias Antonio J.  1495158       2010-06-28                                                          director
31  Al Darmaki H.E. Ahmed Saif  1495205       2010-06-28                                                          director
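Another way to reach the same table, offered as an untested sketch rather than a replacement for the selector above: find the "Owner" header link and call BeautifulSoup's `find_parent("table")`, which returns the closest enclosing `<table>`, so outer layout tables are skipped much like `:not(:has(table))` does. The `HTML` string below is a hypothetical, trimmed stand-in for the SEC page (assuming the header cells are `<td>` elements containing `<a>` links, as the selector above implies), and `owner_table` is a hypothetical helper name. It builds the DataFrame from cell text with `pd.DataFrame` instead of `pd.read_html`, so it does not need lxml or html5lib.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the SEC page: an outer layout table wrapping the
# owners table, whose header cells are <td> elements containing links.
HTML = """
<table><tr><td>
  <table>
    <tr><td><a href="#">Owner</a></td><td><a href="#">Filings</a></td>
        <td><a href="#">Transaction Date</a></td><td><a href="#">Type of Owner</a></td></tr>
    <tr><td>Musk Elon</td><td>1494730</td><td>2021-11-08</td>
        <td>director, 10 percent owner, officer: CEO</td></tr>
    <tr><td>Gebbia Joseph</td><td>1834171</td><td>2022-09-25</td><td>director</td></tr>
  </table>
</td></tr></table>
"""


def owner_table(html: str) -> pd.DataFrame:
    """Return the table that holds the 'Owner' header link as a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumes the header link text is exactly "Owner" (no extra whitespace).
    link = soup.find("a", string="Owner")
    if link is None:
        raise ValueError("no 'Owner' header link found")
    # find_parent walks upward, so the first <table> hit is the innermost one.
    table = link.find_parent("table")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]
    header, *body = rows  # first row holds the column names
    return pd.DataFrame(body, columns=header)


df = owner_table(HTML)
print(df)
```

Against the live page you would pass `requests.get(url).content` instead of `HTML`. All cells come back as strings, so a numeric column such as `Filings` may need `pd.to_numeric`.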
